bpo-45466: Add download feature to urllib.request module #29217
pohlt wants to merge 7 commits into python:main from pohlt:bpo-45466
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username: We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames. This might simply be due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!
Similar to http.server, urllib.request offers a download functionality:

    python -m urllib.request https://python.org/ --output file.html
Tests are needed.

Yep, tests are next. I just found
Lib/urllib/request.py
Outdated

    out = stdout.buffer if args.output is None else open(args.output, "wb")

    with urlopen(args.URL) as response:
        while data := response.read(1024 * 1024):
1 MB is a bit large. curl uses a buffer size of 32768.
The code will also do lots of allocation and deallocations. It's possible to avoid allocations with memoryview(bytearray(32768)) and readinto().
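The suggestion can be sketched as follows; `copy_stream` is a hypothetical helper name, and any object implementing `readinto()` (including `http.client.HTTPResponse`) will work:

```python
import io

def copy_stream(response, out, bufsize=32768):
    # One reusable buffer: readinto() fills it in place, so no new
    # bytes object is allocated per chunk.
    buf = memoryview(bytearray(bufsize))
    while n := response.readinto(buf):  # returns 0 at EOF
        out.write(buf[:n])              # write only the filled slice
```

With urlopen this would be used as `with urlopen(url) as response: copy_stream(response, out)`.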
And some housekeeping.
I looked into the doc building issue, but couldn't figure out what went wrong. Could someone please give me a hint?

Fixed some non-critical typos and reformatted the news blurb to remove newlines from within code parts. Maybe this fixes the docs issue.

I'm sure the docs are an unrelated problem. I think there's an open issue with the Sphinx version being used, but now I can't find the mention of that problem.

I guess it is unlikely for a core developer to look at the PR as long as there are open basic issues like breaking the docs. Is there anything I can do to re-run the pipeline to see if the docs issue has been fixed?
Closing and re-opening to trigger the doc build step. |
ericvsmith
left a comment
I'd prefer if we use f-strings for new code.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
tiran
left a comment
I still don't think it is a good idea to add a download feature at all.
While the PR implements enough functionality to address your use case, other users will open issues to request more features. I expect that users will ask for "take filename from remote", "custom HTTP headers" and "POST requests" next. curl has over 200 command line options for a reason. Soon we'll end up with a poor clone of curl and a new source of security bugs. :)
Lib/urllib/request.py
Outdated
| out = stdout.buffer if args.output is None else open(args.output, "wb") | ||
|
|
||
| with urlopen(args.URL) as response: | ||
| while data := response.read(1024 * 1024): |
There was a problem hiding this comment.
1 MB is a bit large. curl uses a buffer size of 32768.
The code will also do lots of allocation and deallocations. It's possible to avoid allocations with memoryview(bytearray(32768)) and readinto().
Lib/urllib/request.py
Outdated

    args = parser.parse_args()
    out = stdout.buffer if args.output is None else open(args.output, "wb")

    with urlopen(args.URL) as response:
This will print a raw exception traceback. You should add error checking and a nice error output in case the connection or download fails.
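A minimal sketch of such error handling (the `fetch` helper and its message text are illustrative, not part of the PR); `URLError.reason` carries the underlying cause:

```python
import sys
from urllib.error import URLError
from urllib.request import urlopen

def fetch(url):
    """Open a URL, printing a short message instead of a traceback on failure."""
    try:
        return urlopen(url)
    except URLError as exc:  # also catches HTTPError, its subclass
        print(f"error: could not retrieve {url}: {exc.reason}", file=sys.stderr)
        return None
```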
Lib/urllib/request.py
Outdated

    parser.add_argument(
        "-o",
        "--output",
        type=str,
argparse has builtin file handling:

    - type=str,
    + type=argparse.FileType('wb'), default=sys.stdout.buffer
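Applied to the parser above, the suggestion looks roughly like this (parser setup abbreviated): `FileType("wb")` makes argparse open the given path itself, and `sys.stdout.buffer` is the binary stdout stream used when `-o` is omitted.

```python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("URL")
parser.add_argument(
    "-o", "--output",
    type=argparse.FileType("wb"),  # argparse opens the path in binary write mode
    default=sys.stdout.buffer,     # fall back to binary stdout
)
```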
Lib/urllib/request.py
Outdated

    if __name__ == "__main__":
        from argparse import ArgumentParser
Please turn this into a helper method, e.g. def main().
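Pulling the review points together, a sketch of what such a `main()` could look like (names, messages, and the `argv` parameter are illustrative, not the PR's actual code):

```python
import sys
from argparse import ArgumentParser, FileType
from urllib.error import URLError
from urllib.request import urlopen

def main(argv=None):
    parser = ArgumentParser(description="Fetch a URL and write it to a file or stdout.")
    parser.add_argument("URL")
    parser.add_argument("-o", "--output",
                        type=FileType("wb"), default=sys.stdout.buffer)
    args = parser.parse_args(argv)
    buf = memoryview(bytearray(32768))  # reviewer-suggested buffer size
    try:
        with urlopen(args.URL) as response:
            while n := response.readinto(buf):
                args.output.write(buf[:n])
    except URLError as exc:
        print(f"error: {exc.reason}", file=sys.stderr)
        return 1
    args.output.flush()  # don't leave data in the writer's buffer
    return 0

# The entry point would then be:
# if __name__ == "__main__":
#     sys.exit(main())
```

Taking `argv` as a parameter (defaulting to `sys.argv[1:]` inside argparse) keeps the helper testable without touching global state.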
Is it common practice (or impolite) to resolve a fixed issue?
* use buffer to avoid buffer reallocations
* catch URLError and print error output
* use argparse file handling
Sorry if I screwed up the review process. I've never done this on GitHub...

The default answer for those requests could be: if you need more functionality than that, install curl/wget and use it instead. Anyway, thanks for the reviews.
After discussing this among the core devs, we've decided not to accept this patch. Sorry, @pohlt. I hope you at least gained some experience in working with the code and our processes. I'll comment on the issue about why we're not accepting it.
Thanks, @ericvsmith, for your support. |
Similar to http.server, urllib.request could offer a download functionality:

    python -m urllib.request https://python.org/ --output file.html

To keep the code lean, output is the only optional parameter. A typical use case could be downloading some installation scripts or other data from within a container where curl/wget is not available.
https://bugs.python.org/issue45466