bpo-29659: Expose copyfileobj() length arg for public use#328
bpo-29659: Expose copyfileobj() length arg for public use#328goodboy wants to merge 3 commits intopython:masterfrom
length arg for public use#328Conversation
|
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow these steps to rectify the issue:
Thanks again to your contribution and we look forward to looking at it! |
length arg for public uselength arg for public use
|
Hello,
I have completed the above steps :)
Thanks!
Tyler Goodlet
…On Sun, Feb 26, 2017 at 8:19 PM, the-knights-who-say-ni < ***@***.***> wrote:
Hello, and thanks for your contribution!
I'm a bot set up to make sure that the project can legally accept your
contribution by verifying you have signed the PSF contributor agreement
<https://www.python.org/psf/contrib/contrib-form/> (CLA).
Unfortunately our records indicate you have not signed the CLA. For legal
reasons we need you to sign this before we can look at your contribution.
Please follow these steps to rectify the issue:
1. Sign the PSF contributor agreement
<https://www.python.org/psf/contrib/contrib-form/>. The "
bugs.python.org username" requested by the form is the "Login name"
field in "Your Details" at b.p.o
<https://cloud.githubusercontent.com/assets/2680980/23276970/d14a380c-f9d1-11e6-883d-e13b6b211239.png>
2. *Wait at least one US business day* and then check the "Contributor
form received entry under "Your Details" on bugs.python.org to see if
your account has been marked as having signed the CLA (the delay is due to
a person having to manually check your signed CLA)
3. Reply here saying you have completed the above steps
Thanks again to your contribution and we look forward to looking at it!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#328 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AARzZYMi41HaBN6XwUx3iPTjPSu_myyrks5rgiSogaJpZM4MMmcV>
.
|
vstinner
left a comment
There was a problem hiding this comment.
You must update Doc/library/shutil.rst: document the new parameter, but also add a ".. versionchanged:: 3.7" section to document the change.
Lib/shutil.py
Outdated
There was a problem hiding this comment.
I suggest:
if not length:
length = 16 * 1024
Lib/shutil.py
Outdated
There was a problem hiding this comment.
You should document the default size: 16 kB.
Lib/shutil.py
Outdated
4feede0 to
39a9b83
Compare
|
@Haypo I believe I've addressed everything! |
Doc/library/shutil.rst
Outdated
There was a problem hiding this comment.
I wouldn'd say "expose ..." but just "Added length parameter", and versionadded is not the right place to document "default is ...", put it in the documentation body.
You didn't update copyfile() signature. Same remark for the 2 other functions.
Moreover, you should also document the length in the function document, not only in versionadded. The "documentation" can just ask to refer to copyfileobj().
There was a problem hiding this comment.
Shoot yeah not sure how I missed the signatures...
Lib/shutil.py
Outdated
There was a problem hiding this comment.
please mention the unit, "size in bytes"
There was a problem hiding this comment.
@Haypo to clarify, do you prefer:
'An in-memory buffer size can be set with length; the default size in bytes is 16*1024',
or
"An in-memory buffer size in bytes can be set with length; the default is 16 KB.
?
There was a problem hiding this comment.
"An in-memory buffer size in bytes can be set withlength; the default is 16 KB."
There was a problem hiding this comment.
I encourage you to use KiB, which other parts of the documentation already use. If you write 16 KB it could easily be interpreted as 16,000 B (lower-case k means 1000, as in km for kilometre).
39a9b83 to
158a71b
Compare
|
@Haypo hopefully 3rd times the charm ;) If you'd like me to squash the history to one commit let me know! |
terryjreedy
left a comment
There was a problem hiding this comment.
Suggestion: leave API of copyfileobj should alone as 'length=16*1024' to document default. For other functions, say something like 'If length not passed, copyfileobj uses its default.' We might change it someday.
Required: new or revised tests that use the expanded api (and fail without patch).
|
@terryjreedy the only problem with that is that we'll need to have the Yes I totally agree about the test. I guess it goes here? |
|
Yes I totally agree about the test. I guess it goes here?
|
Doc/library/shutil.rst
Outdated
There was a problem hiding this comment.
Indentation should use three spaces here:
.. versionchanged:: 3.7
Added the *length* parameter.There was a problem hiding this comment.
`length` -> *length*
(and please finish the sentence with a full stop.)
When copying large files the copy time can be sufficiently shortened by increasing the memory buffer used in the `copyfileobj()` routine. Expose the `length` argument from `copyfileobj()` upwards for use throughout the module.
158a71b to
bfeaee8
Compare
Doc/library/shutil.rst
Outdated
There was a problem hiding this comment.
IMO “Note that” is redundant. In fact the whole first line could be dropped, and “Only the contents from the current file position . . .” could be put back in the first paragraph.
vstinner
left a comment
There was a problem hiding this comment.
LGTM, but please address pending comments before I can approve the change ;-)
Lib/shutil.py
Outdated
Lib/shutil.py
Outdated
There was a problem hiding this comment.
PEP 8: add spaces around operators, "16 * 1024"
Lib/shutil.py
Outdated
|
Note: I created PR #4293 to replace KB with KiB in the Python code base ;-) |
bfeaee8 to
0043566
Compare
|
I think everything has been addressed except for @terryjreedy's:
I don't know how to keep the default value in Whatever you all think is best I'll do 👍 |
Lib/test/test_shutil.py
Outdated
There was a problem hiding this comment.
I suggest to use a smaller file: write('x' * 100) and then length=20.
By the way, I suggest to use tempfile.NamedTemporaryFile rather than temporary files:
with tempfile.NamedTemporaryFile() as src:
with tempfile.NamedTemporaryFile() as dst:
write_file(src.name, b'x' * 100, binary=True)
fn(src.name, dst.name, length=20)
0043566 to
8daf7a8
Compare
|
|
||
| def copyfileobj(fsrc, fdst, length=16*1024): | ||
| """copy data from file-like object fsrc to file-like object fdst""" | ||
| def copyfileobj(fsrc, fdst, length=None): |
There was a problem hiding this comment.
def copyfileobj(fsrc, fdst, length=16*1024):I personally prefer this way in my code to concentrate 2 lines worth of code into one.
if not length:
length = 16 * 1024It looks a little nicer and not to mention saves lines of code that basically do the same thing.
And yes I am also guilty of doing it this way and then people started hating me for it and calling my a bad programmer for it.
|
Just to clear this up since it seems my original reasoning isn't showing well and the drive-by comments are piling up. Afaict there are 3 ways to deal with the default value of
If one of these solutions is preferred over the others after acknowledging the trade-offs I'm fine to do it, but just simply saying |
|
@vstinner looks like the appveyor build failed due to the |
Sorry, I don't know why the test failed. |
|
Looks to me like an normal permission error as python seems to not have permissions to write to the temp folder? Maybe somehow have python write to somewhere that does not require administrator permissions? It would be nice if python had an nice function that you provide the path and it checks if you have permissions to write to there to avoid having to try / except permission errors. 🤔 That function would be the perfect thing to fix this test on Windows. |
|
Ok so besides the outstanding:
Is there anything else in the patch that needs to be addressed? |
|
I dont think so. |
vstinner
left a comment
There was a problem hiding this comment.
I dislike length=-1 "feature".
| """ | ||
| if not length: | ||
| length = 16 * 1024 | ||
| while 1: |
There was a problem hiding this comment.
I'm not sure that it's ok to use a loop if the length is negative. I suggest to have a special case for negative value calling read() (no parameter) only once.
| The integer *length*, if given, is the buffer size; the default value | ||
| in bytes is 16 KiB. A negative *length* value means to copy the data without | ||
| looping over the source data in chunks; by default the data is read in | ||
| chunks to avoid uncontrolled memory consumption. |
There was a problem hiding this comment.
"A negative length value means to copy the data without looping over the source data in chunks"
I dislike this definition. In practice, negative means "unlimited" buffer size: the whole input file is loaded into memory.
I'm not sure that it's a good practice to try to load files of unknown size into memory.
I suggest to remove this feature which seems more like a side effect than a carefully designed API.
If you want to get fast copy, pass a very large length like 1 GB. But if Python starts to load 1 TB into memory, it's likely to crash the system... At least, to slow down the system, a lot.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
Fixed by https://bugs.python.org/issue33671 |
Bumps [gidgethub](https://github.com/brettcannon/gidgethub) from 4.0.0 to 4.1.0. - [Release notes](https://github.com/brettcannon/gidgethub/releases) - [Changelog](https://github.com/brettcannon/gidgethub/blob/master/docs/changelog.rst) - [Commits](https://github.com/brettcannon/gidgethub/commits/v4.1.0) Signed-off-by: dependabot-preview[bot] <support@dependabot.com> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com> Co-authored-by: Mariatta <Mariatta@users.noreply.github.com>
Corresponds to issue bpo-29659
When copying large files the copy time can be sufficiently
shortened by increasing the memory buffer used in the
copyfileobj()routine. Expose the
lengthargument fromcopyfileobj()upwardsfor use throughout the module.
https://bugs.python.org/issue29659