
Commit fdf239a

Close python#17839: support bytes-like objects in base64 module
This mostly affected the encodebytes and decodebytes functions (which are used by base64_codec). Also added a test to ensure all bytes-bytes codecs can handle memoryview input, and tests for handling of multidimensional and non-bytes format input in the modern base64 API.
1 parent 73c6ee0 commit fdf239a
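
As a rough illustration of the behaviour the commit message describes, the sketch below feeds several bytes-like objects to the legacy `encodebytes` and `decodebytes` functions. It assumes Python 3.4 or later; the names `payload`, `inputs`, `encoded`, and `decoded` are illustrative only and do not appear in the commit.

```python
import base64
from array import array

payload = b"abc"

# Each object exposes the same three bytes through the buffer protocol.
inputs = [payload, bytearray(payload), memoryview(payload), array("B", payload)]

# The legacy encoder appends a trailing newline to its output.
encoded = [base64.encodebytes(obj) for obj in inputs]

# Decoding also accepts a bytes-like object, here a memoryview.
decoded = [base64.decodebytes(memoryview(enc)) for enc in encoded]

print(encoded)
print(decoded)
```

All four inputs should produce the same encoded value, matching the `b'YWJj\n'` expectation used in the commit's own tests.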

6 files changed, +172 -69 lines changed
Doc/library/base64.rst

Lines changed: 4 additions & 0 deletions
@@ -27,6 +27,10 @@ byte strings, but only using the Base64 standard alphabet.
    ASCII-only Unicode strings are now accepted by the decoding functions of
    the modern interface.
 
+.. versionchanged:: 3.4
+   Any :term:`bytes-like object`\ s are now accepted by all
+   encoding and decoding functions in this module.
+
 The modern interface provides:
 
 .. function:: b64encode(s, altchars=None)
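
To make the documented change concrete, here is a minimal sketch, assuming Python 3.4 or later, that passes the same data to the modern encoders as a `bytearray`, a `memoryview`, and an `array`. The variable names are illustrative and not part of the commit.

```python
import base64
from array import array

data = b"abcd"

# Three different bytes-like wrappers around the same four bytes.
views = (bytearray(data), memoryview(data), array("B", data))

# Collect results in sets: one element per set means every wrapper
# produced the same encoded output.
b64_results = {base64.b64encode(v) for v in views}
b32_results = {base64.b32encode(v) for v in views}
b16_results = {base64.b16encode(v) for v in views}

print(b64_results, b32_results, b16_results)
```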

Doc/library/codecs.rst

Lines changed: 35 additions & 30 deletions
@@ -1208,36 +1208,41 @@ mappings.
 
 .. tabularcolumns:: |l|L|L|
 
-+----------------------+---------------------------+------------------------------+
-| Codec                | Purpose                   | Encoder/decoder              |
-+======================+===========================+==============================+
-| base64_codec [#b64]_ | Convert operand to MIME   | :meth:`base64.b64encode`,    |
-|                      | base64 (the result always | :meth:`base64.b64decode`     |
-|                      | includes a trailing       |                              |
-|                      | ``'\n'``)                 |                              |
-+----------------------+---------------------------+------------------------------+
-| bz2_codec            | Compress the operand      | :meth:`bz2.compress`,        |
-|                      | using bz2                 | :meth:`bz2.decompress`       |
-+----------------------+---------------------------+------------------------------+
-| hex_codec            | Convert operand to        | :meth:`base64.b16encode`,    |
-|                      | hexadecimal               | :meth:`base64.b16decode`     |
-|                      | representation, with two  |                              |
-|                      | digits per byte           |                              |
-+----------------------+---------------------------+------------------------------+
-| quopri_codec         | Convert operand to MIME   | :meth:`quopri.encodestring`, |
-|                      | quoted printable          | :meth:`quopri.decodestring`  |
-+----------------------+---------------------------+------------------------------+
-| uu_codec             | Convert the operand using | :meth:`uu.encode`,           |
-|                      | uuencode                  | :meth:`uu.decode`            |
-+----------------------+---------------------------+------------------------------+
-| zlib_codec           | Compress the operand      | :meth:`zlib.compress`,       |
-|                      | using gzip                | :meth:`zlib.decompress`      |
-+----------------------+---------------------------+------------------------------+
-
-.. [#b64] Rather than accepting any :term:`bytes-like object`,
-   ``'base64_codec'`` accepts only :class:`bytes` and :class:`bytearray` for
-   encoding and only :class:`bytes`, :class:`bytearray`, and ASCII-only
-   instances of :class:`str` for decoding
++----------------------+------------------------------+------------------------------+
+| Codec                | Purpose                      | Encoder / decoder            |
++======================+==============================+==============================+
+| base64_codec [#b64]_ | Convert operand to MIME      | :meth:`base64.b64encode` /   |
+|                      | base64 (the result always    | :meth:`base64.b64decode`     |
+|                      | includes a trailing          |                              |
+|                      | ``'\n'``)                    |                              |
+|                      |                              |                              |
+|                      | .. versionchanged:: 3.4      |                              |
+|                      |    accepts any               |                              |
+|                      |    :term:`bytes-like object` |                              |
+|                      |    as input for encoding and |                              |
+|                      |    decoding                  |                              |
++----------------------+------------------------------+------------------------------+
+| bz2_codec            | Compress the operand         | :meth:`bz2.compress` /       |
+|                      | using bz2                    | :meth:`bz2.decompress`       |
++----------------------+------------------------------+------------------------------+
+| hex_codec            | Convert operand to           | :meth:`base64.b16encode` /   |
+|                      | hexadecimal                  | :meth:`base64.b16decode`     |
+|                      | representation, with two     |                              |
+|                      | digits per byte              |                              |
++----------------------+------------------------------+------------------------------+
+| quopri_codec         | Convert operand to MIME      | :meth:`quopri.encodestring` /|
+|                      | quoted printable             | :meth:`quopri.decodestring`  |
++----------------------+------------------------------+------------------------------+
+| uu_codec             | Convert the operand using    | :meth:`uu.encode` /          |
+|                      | uuencode                     | :meth:`uu.decode`            |
++----------------------+------------------------------+------------------------------+
+| zlib_codec           | Compress the operand         | :meth:`zlib.compress` /      |
+|                      | using gzip                   | :meth:`zlib.decompress`      |
++----------------------+------------------------------+------------------------------+
+
+.. [#b64] In addition to :term:`bytes-like objects <bytes-like object>`,
+   ``'base64_codec'`` also accepts ASCII-only instances of :class:`str` for
+   decoding
 
 
 The following codecs provide :class:`str` to :class:`str` mappings.
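
The codec-level effect of this change can be sketched as follows, assuming Python 3.4 or later. The example only exercises `memoryview` input through `codecs.encode` and `codecs.decode`; the variable names are illustrative.

```python
import codecs

# A memoryview is a bytes-like object but neither bytes nor bytearray.
raw = memoryview(b"abc")

# Encode through the bytes-to-bytes codec machinery.
encoded = codecs.encode(raw, "base64_codec")

# Round-trip, again handing the codec a memoryview.
decoded = codecs.decode(memoryview(encoded), "base64_codec")

print(encoded, decoded)
```

The encoded value carries the trailing newline mentioned in the table above.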

Lib/base64.py

Lines changed: 24 additions & 16 deletions
@@ -35,11 +35,13 @@ def _bytes_from_decode_data(s):
             return s.encode('ascii')
         except UnicodeEncodeError:
             raise ValueError('string argument should contain only ASCII characters')
-    elif isinstance(s, bytes_types):
+    if isinstance(s, bytes_types):
         return s
-    else:
-        raise TypeError("argument should be bytes or ASCII string, not %s" % s.__class__.__name__)
-
+    try:
+        return memoryview(s).tobytes()
+    except TypeError:
+        raise TypeError("argument should be a bytes-like object or ASCII "
+                        "string, not %r" % s.__class__.__name__) from None
 
 
 # Base64 encoding/decoding uses binascii
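
The hunk above changes how decoder input is normalised. The following sketch observes that behaviour from the outside through `base64.b64decode`, assuming Python 3.4 or later; `classify` is a hypothetical helper written for this illustration, not part of the module.

```python
import base64

def classify(value):
    """Return the decoded bytes, or the name of the exception class raised."""
    try:
        return base64.b64decode(value)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

ascii_str_result = classify("YWJj")              # ASCII str is encoded first
memoryview_result = classify(memoryview(b"YWJj"))  # converted via tobytes()
non_ascii_result = classify("\u00e9")            # non-ASCII str is rejected
list_result = classify([])                       # no buffer protocol support

print(ascii_str_result, memoryview_result, non_ascii_result, list_result)
```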
@@ -54,14 +56,9 @@ def b64encode(s, altchars=None):
 
     The encoded byte string is returned.
     """
-    if not isinstance(s, bytes_types):
-        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
     # Strip off the trailing newline
     encoded = binascii.b2a_base64(s)[:-1]
     if altchars is not None:
-        if not isinstance(altchars, bytes_types):
-            raise TypeError("expected bytes, not %s"
-                            % altchars.__class__.__name__)
         assert len(altchars) == 2, repr(altchars)
         return encoded.translate(bytes.maketrans(b'+/', altchars))
     return encoded
@@ -149,7 +146,7 @@ def b32encode(s):
     s is the byte string to encode. The encoded byte string is returned.
     """
     if not isinstance(s, bytes_types):
-        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
+        s = memoryview(s).tobytes()
     leftover = len(s) % 5
     # Pad the last quantum with zero bits if necessary
     if leftover:
@@ -250,8 +247,6 @@ def b16encode(s):
 
     s is the byte string to encode. The encoded byte string is returned.
     """
-    if not isinstance(s, bytes_types):
-        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
     return binascii.hexlify(s).upper()
 
 
@@ -306,12 +301,26 @@ def decode(input, output):
         s = binascii.a2b_base64(line)
         output.write(s)
 
+def _input_type_check(s):
+    try:
+        m = memoryview(s)
+    except TypeError as err:
+        msg = "expected bytes-like object, not %s" % s.__class__.__name__
+        raise TypeError(msg) from err
+    if m.format not in ('c', 'b', 'B'):
+        msg = ("expected single byte elements, not %r from %s" %
+                                          (m.format, s.__class__.__name__))
+        raise TypeError(msg)
+    if m.ndim != 1:
+        msg = ("expected 1-D data, not %d-D data from %s" %
+                                          (m.ndim, s.__class__.__name__))
+        raise TypeError(msg)
+
 
 def encodebytes(s):
     """Encode a bytestring into a bytestring containing multiple lines
     of base-64 data."""
-    if not isinstance(s, bytes_types):
-        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
+    _input_type_check(s)
     pieces = []
     for i in range(0, len(s), MAXBINSIZE):
         chunk = s[i : i + MAXBINSIZE]
@@ -328,8 +337,7 @@ def encodestring(s):
 
 def decodebytes(s):
     """Decode a bytestring of base-64 data into a bytestring."""
-    if not isinstance(s, bytes_types):
-        raise TypeError("expected bytes, not %s" % s.__class__.__name__)
+    _input_type_check(s)
     return binascii.a2b_base64(s)
 
 def decodestring(s):
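
The checks added in `_input_type_check` can be observed through the legacy functions. The sketch below assumes Python 3.4 or later and a platform where the `'I'` format is four bytes wide; `rejected` is a hypothetical helper used only for this illustration.

```python
import base64

flat = memoryview(b"1234")
two_d = flat.cast("B", (2, 2))   # same bytes, exported as a 2x2 array
as_ints = flat.cast("I")         # same bytes, exported as one unsigned int

def rejected(func, value):
    """Return True if func(value) raises TypeError, False otherwise."""
    try:
        func(value)
    except TypeError:
        return True
    return False

# One-dimensional single-byte data is accepted; the other views are not.
flat_rejected = rejected(base64.encodebytes, flat)
two_d_rejected = rejected(base64.encodebytes, two_d)
ints_rejected = rejected(base64.decodebytes, as_ints)

print(flat_rejected, two_d_rejected, ints_rejected)
```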

Lib/test/test_base64.py

Lines changed: 87 additions & 23 deletions
@@ -5,10 +5,21 @@
 import os
 import sys
 import subprocess
-
+import struct
+from array import array
 
 
 class LegacyBase64TestCase(unittest.TestCase):
+
+    # Legacy API is not as permissive as the modern API
+    def check_type_errors(self, f):
+        self.assertRaises(TypeError, f, "")
+        self.assertRaises(TypeError, f, [])
+        multidimensional = memoryview(b"1234").cast('B', (2, 2))
+        self.assertRaises(TypeError, f, multidimensional)
+        int_data = memoryview(b"1234").cast('I')
+        self.assertRaises(TypeError, f, int_data)
+
     def test_encodebytes(self):
         eq = self.assertEqual
         eq(base64.encodebytes(b"www.python.org"), b"d3d3LnB5dGhvbi5vcmc=\n")
@@ -24,7 +35,9 @@ def test_encodebytes(self):
            b"Y3ODkhQCMwXiYqKCk7Ojw+LC4gW117fQ==\n")
         # Non-bytes
         eq(base64.encodebytes(bytearray(b'abc')), b'YWJj\n')
-        self.assertRaises(TypeError, base64.encodebytes, "")
+        eq(base64.encodebytes(memoryview(b'abc')), b'YWJj\n')
+        eq(base64.encodebytes(array('B', b'abc')), b'YWJj\n')
+        self.check_type_errors(base64.encodebytes)
 
     def test_decodebytes(self):
         eq = self.assertEqual
@@ -41,7 +54,9 @@ def test_decodebytes(self):
         eq(base64.decodebytes(b''), b'')
         # Non-bytes
         eq(base64.decodebytes(bytearray(b'YWJj\n')), b'abc')
-        self.assertRaises(TypeError, base64.decodebytes, "")
+        eq(base64.decodebytes(memoryview(b'YWJj\n')), b'abc')
+        eq(base64.decodebytes(array('B', b'YWJj\n')), b'abc')
+        self.check_type_errors(base64.decodebytes)
 
     def test_encode(self):
         eq = self.assertEqual
@@ -73,6 +88,38 @@ def test_decode(self):
 
 
 class BaseXYTestCase(unittest.TestCase):
+
+    # Modern API completely ignores exported dimension and format data and
+    # treats any buffer as a stream of bytes
+    def check_encode_type_errors(self, f):
+        self.assertRaises(TypeError, f, "")
+        self.assertRaises(TypeError, f, [])
+
+    def check_decode_type_errors(self, f):
+        self.assertRaises(TypeError, f, [])
+
+    def check_other_types(self, f, bytes_data, expected):
+        eq = self.assertEqual
+        eq(f(bytearray(bytes_data)), expected)
+        eq(f(memoryview(bytes_data)), expected)
+        eq(f(array('B', bytes_data)), expected)
+        self.check_nonbyte_element_format(base64.b64encode, bytes_data)
+        self.check_multidimensional(base64.b64encode, bytes_data)
+
+    def check_multidimensional(self, f, data):
+        padding = b"\x00" if len(data) % 2 else b""
+        bytes_data = data + padding # Make sure cast works
+        shape = (len(bytes_data) // 2, 2)
+        multidimensional = memoryview(bytes_data).cast('B', shape)
+        self.assertEqual(f(multidimensional), f(bytes_data))
+
+    def check_nonbyte_element_format(self, f, data):
+        padding = b"\x00" * ((4 - len(data)) % 4)
+        bytes_data = data + padding # Make sure cast works
+        int_data = memoryview(bytes_data).cast('I')
+        self.assertEqual(f(int_data), f(bytes_data))
+
+
     def test_b64encode(self):
         eq = self.assertEqual
         # Test default alphabet
@@ -90,13 +137,16 @@ def test_b64encode(self):
            b"Y3ODkhQCMwXiYqKCk7Ojw+LC4gW117fQ==")
         # Test with arbitrary alternative characters
         eq(base64.b64encode(b'\xd3V\xbeo\xf7\x1d', altchars=b'*$'), b'01a*b$cd')
-        # Non-bytes
-        eq(base64.b64encode(bytearray(b'abcd')), b'YWJjZA==')
         eq(base64.b64encode(b'\xd3V\xbeo\xf7\x1d', altchars=bytearray(b'*$')),
            b'01a*b$cd')
-        # Check if passing a str object raises an error
-        self.assertRaises(TypeError, base64.b64encode, "")
-        self.assertRaises(TypeError, base64.b64encode, b"", altchars="")
+        eq(base64.b64encode(b'\xd3V\xbeo\xf7\x1d', altchars=memoryview(b'*$')),
+           b'01a*b$cd')
+        eq(base64.b64encode(b'\xd3V\xbeo\xf7\x1d', altchars=array('B', b'*$')),
+           b'01a*b$cd')
+        # Non-bytes
+        self.check_other_types(base64.b64encode, b'abcd', b'YWJjZA==')
+        self.check_encode_type_errors(base64.b64encode)
+        self.assertRaises(TypeError, base64.b64encode, b"", altchars="*$")
         # Test standard alphabet
         eq(base64.standard_b64encode(b"www.python.org"), b"d3d3LnB5dGhvbi5vcmc=")
         eq(base64.standard_b64encode(b"a"), b"YQ==")
@@ -110,15 +160,15 @@ def test_b64encode(self):
            b"RUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NT"
            b"Y3ODkhQCMwXiYqKCk7Ojw+LC4gW117fQ==")
         # Non-bytes
-        eq(base64.standard_b64encode(bytearray(b'abcd')), b'YWJjZA==')
-        # Check if passing a str object raises an error
-        self.assertRaises(TypeError, base64.standard_b64encode, "")
+        self.check_other_types(base64.standard_b64encode,
+                               b'abcd', b'YWJjZA==')
+        self.check_encode_type_errors(base64.standard_b64encode)
         # Test with 'URL safe' alternative characters
         eq(base64.urlsafe_b64encode(b'\xd3V\xbeo\xf7\x1d'), b'01a-b_cd')
         # Non-bytes
-        eq(base64.urlsafe_b64encode(bytearray(b'\xd3V\xbeo\xf7\x1d')), b'01a-b_cd')
-        # Check if passing a str object raises an error
-        self.assertRaises(TypeError, base64.urlsafe_b64encode, "")
+        self.check_other_types(base64.urlsafe_b64encode,
+                               b'\xd3V\xbeo\xf7\x1d', b'01a-b_cd')
+        self.check_encode_type_errors(base64.urlsafe_b64encode)
 
     def test_b64decode(self):
         eq = self.assertEqual
@@ -141,7 +191,8 @@ def test_b64decode(self):
             eq(base64.b64decode(data), res)
             eq(base64.b64decode(data.decode('ascii')), res)
         # Non-bytes
-        eq(base64.b64decode(bytearray(b"YWJj")), b"abc")
+        self.check_other_types(base64.b64decode, b"YWJj", b"abc")
+        self.check_decode_type_errors(base64.b64decode)
 
         # Test with arbitrary alternative characters
         tests_altchars = {(b'01a*b$cd', b'*$'): b'\xd3V\xbeo\xf7\x1d',
@@ -160,7 +211,8 @@ def test_b64decode(self):
             eq(base64.standard_b64decode(data), res)
             eq(base64.standard_b64decode(data.decode('ascii')), res)
         # Non-bytes
-        eq(base64.standard_b64decode(bytearray(b"YWJj")), b"abc")
+        self.check_other_types(base64.standard_b64decode, b"YWJj", b"abc")
+        self.check_decode_type_errors(base64.standard_b64decode)
 
         # Test with 'URL safe' alternative characters
         tests_urlsafe = {b'01a-b_cd': b'\xd3V\xbeo\xf7\x1d',
@@ -170,7 +222,9 @@ def test_b64decode(self):
             eq(base64.urlsafe_b64decode(data), res)
             eq(base64.urlsafe_b64decode(data.decode('ascii')), res)
         # Non-bytes
-        eq(base64.urlsafe_b64decode(bytearray(b'01a-b_cd')), b'\xd3V\xbeo\xf7\x1d')
+        self.check_other_types(base64.urlsafe_b64decode, b'01a-b_cd',
+                               b'\xd3V\xbeo\xf7\x1d')
+        self.check_decode_type_errors(base64.urlsafe_b64decode)
 
     def test_b64decode_padding_error(self):
         self.assertRaises(binascii.Error, base64.b64decode, b'abc')
@@ -205,8 +259,8 @@ def test_b32encode(self):
         eq(base64.b32encode(b'abcd'), b'MFRGGZA=')
         eq(base64.b32encode(b'abcde'), b'MFRGGZDF')
         # Non-bytes
-        eq(base64.b32encode(bytearray(b'abcd')), b'MFRGGZA=')
-        self.assertRaises(TypeError, base64.b32encode, "")
+        self.check_other_types(base64.b32encode, b'abcd', b'MFRGGZA=')
+        self.check_encode_type_errors(base64.b32encode)
 
     def test_b32decode(self):
         eq = self.assertEqual
@@ -222,7 +276,8 @@ def test_b32decode(self):
             eq(base64.b32decode(data), res)
             eq(base64.b32decode(data.decode('ascii')), res)
         # Non-bytes
-        eq(base64.b32decode(bytearray(b'MFRGG===')), b'abc')
+        self.check_other_types(base64.b32decode, b'MFRGG===', b"abc")
+        self.check_decode_type_errors(base64.b32decode)
 
     def test_b32decode_casefold(self):
         eq = self.assertEqual
@@ -277,8 +332,9 @@ def test_b16encode(self):
         eq(base64.b16encode(b'\x01\x02\xab\xcd\xef'), b'0102ABCDEF')
         eq(base64.b16encode(b'\x00'), b'00')
         # Non-bytes
-        eq(base64.b16encode(bytearray(b'\x01\x02\xab\xcd\xef')), b'0102ABCDEF')
-        self.assertRaises(TypeError, base64.b16encode, "")
+        self.check_other_types(base64.b16encode, b'\x01\x02\xab\xcd\xef',
+                               b'0102ABCDEF')
+        self.check_encode_type_errors(base64.b16encode)
 
     def test_b16decode(self):
         eq = self.assertEqual
@@ -293,7 +349,15 @@ def test_b16decode(self):
         eq(base64.b16decode(b'0102abcdef', True), b'\x01\x02\xab\xcd\xef')
         eq(base64.b16decode('0102abcdef', True), b'\x01\x02\xab\xcd\xef')
         # Non-bytes
-        eq(base64.b16decode(bytearray(b"0102ABCDEF")), b'\x01\x02\xab\xcd\xef')
+        self.check_other_types(base64.b16decode, b"0102ABCDEF",
+                               b'\x01\x02\xab\xcd\xef')
+        self.check_decode_type_errors(base64.b16decode)
+        eq(base64.b16decode(bytearray(b"0102abcdef"), True),
+           b'\x01\x02\xab\xcd\xef')
+        eq(base64.b16decode(memoryview(b"0102abcdef"), True),
+           b'\x01\x02\xab\xcd\xef')
+        eq(base64.b16decode(array('B', b"0102abcdef"), True),
+           b'\x01\x02\xab\xcd\xef')
 
     def test_decode_nonascii_str(self):
         decode_funcs = (base64.b64decode,
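
The tests above assert that the modern API treats any buffer as a flat stream of bytes. A standalone sketch of the same idea follows, assuming Python 3.4 or later and a four-byte `'I'` format; the variable names are illustrative only.

```python
import base64

data = b"abcd"

# Re-export the same four bytes with a different shape and element format.
two_d = memoryview(data).cast("B", (2, 2))
as_ints = memoryview(data).cast("I")

# The modern encoders ignore shape and format and see only the raw bytes.
same_for_2d = base64.b64encode(two_d) == base64.b64encode(data)
same_for_ints = base64.b64encode(as_ints) == base64.b64encode(data)
b32_from_ints = base64.b32encode(as_ints)

print(same_for_2d, same_for_ints, b32_from_ints)
```

This contrasts with the legacy `encodebytes` and `decodebytes` functions, which the tests in `LegacyBase64TestCase` expect to reject such views with `TypeError`.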
