-
-
Notifications
You must be signed in to change notification settings - Fork 34.2k
Description
Bug report
The zipfile.ZipInfo.from_file method takes a native filename, and an optional name to be placed in the archive. However, it uses a converted form of the native filename to the ZipInfo() constructor without turning it into the correct form for an archive name.
The code in question is:
Lines 526 to 541 in 7db1d2e
| if isinstance(filename, os.PathLike): | |
| filename = os.fspath(filename) | |
| st = os.stat(filename) | |
| isdir = stat.S_ISDIR(st.st_mode) | |
| mtime = time.localtime(st.st_mtime) | |
| date_time = mtime[0:6] | |
| if not strict_timestamps and date_time[0] < 1980: | |
| date_time = (1980, 1, 1, 0, 0, 0) | |
| elif not strict_timestamps and date_time[0] > 2107: | |
| date_time = (2107, 12, 31, 23, 59, 59) | |
| # Create ZipInfo instance to store file information | |
| if arcname is None: | |
| arcname = filename | |
| arcname = os.path.normpath(os.path.splitdrive(arcname)[1]) | |
| while arcname[0] in (os.sep, os.altsep): | |
| arcname = arcname[1:] |
I believe that code is assuming that:
ZipInfo()will know that a native filename can be converted to an archive name (this is only true on POSIX and Windows systems, see ZipInfo filename is mangled when os.sep is not '/' #94529), and so is a bad assumption- A native filename is equivalent to a name that can be put into an archive. This is clearly not true on all systems.
- The archive name supplied will conform to the filesystem normalisation rules for the system it is running on. Again, this will not be true on filesystems that do not honour the same normalisation rules.
I believe the code should explicitly convert the filename to a form which is suitable for use as an archive name, rather than assuming that the name given is suitable.
A real world example on a system which uses a different scheme for filename separators is RISC OS, which has an os.sep of .. In such a case you might make a call to place a file in the archive with something like:
zi = ZipInfo.from_file("Application.Directory.Filename/txt")The expectation would be that this would create a ZipInfo object with an arcname of Application/Directory/Filename.txt. However, this will actually use the arcname Application.Directory.Filename/txt - which isn't at all what was intended or useful.
I believe the code would be better handled as something like this...
# Create ZipInfo instance to store file information
if arcname is None:
# The filename is still in native filename format, so we must convert to
# the form used by the archive.
arcname = os.path.normpath(os.path.splitdrive(filename)[1])
# Normalise the separator to os.sep
if os.altsep:
arcname = arcname.replace(os.altsep, os.sep)
# Ensure that the directory and extension separators are in arcname format
if os.sep != '/' or os.extsep != '.':
parts = arcname.split(os.sep)
if os.extsep != '.':
parts = [part.replace(os.extsep, '.') for part in parts]
arcname = '/'.join(parts)
while arcname and arcname[0] == '/':
arcname = arcname[1:]This would...
- Only perform the normalisation through the filesystem functions when a filename is given. That is, if you supply an arcname that's what you get, unaffected by the native platform's filesystem semantics.
- Converts the filesystem semantics (from
os.sep,os.altsepandos.extsep) to arcname semantics. We normalise fromaltseptosepto simplify things, and then construct a name which uses the correct arcname convention from the known parts of the name. - Only performs the conversion if they would not be identity operations, so we're efficient on systems where the name is a pass through.
- Retain the stripping of leading
/characters in the archive, so that we don't create them in an odd place (although it's arguable this should be in the ZipInfo constructor to prevent overwriting system files, etc). - Ensure that ZipInfo is always called with a name which is an arcname, as that's what it expects anyhow (it has a little allowance for names in native form but this only ever worked on Windows and should not be relied on because on other systems it can never work, see ZipInfo filename is mangled when os.sep is not '/' #94529)
- Retain as much of the existing behaviour as was actually reliable and deterministic as possible.
Your environment
- CPython versions tested on: Python 3.9, Python 3.10
- Operating system and architecture: OSX, RISC OS
Metadata
Metadata
Assignees
Labels
Projects
Status
Status