|
`-=[]โจโฉ\;',./~!@#$%^&*()_+{}|:"<>? ๐๐๐๐๐๐๐โ๐๐๐๐๐๐๐๐๐๐๐ ๐ก๐ข๐ฃ๐ค๐ฅ๐ฆ๐ง
ร
โโโรโโ
โยฑโ๊๏นฆโโ โฏ ๐ธ๐นโ๐ป๐ผ๐ฝ๐พโ๐๐๐๐๐โ๐โโโ๐๐๐๐๐๐๐โค๐ด๐ต๐ถ๐ท๐ธ๐น๐บ๐ป๐ผ๐ฝ๐พ๐ฟ๐๐๐๐๐๐
๐๐๐๐๐๐๐๐
โผโฝโพโโโโโ
โโโโโโโ โก โคโฅโฆโงโจโฉโชโซ
โโโโโโ โโโโ
โโ ๐ผ๐ฝ๐พ๐ฟ๐๐๐๐๐๐
๐๐๐๐๐๐๐๐๐๐๐๐๐๐
โโโโ
โฆฐโโโโโโดโต โโโโโโโ โงโจโฉโช
โซโฌโญโฎโฏโฐโฑโฒโณ โฅโฎโฏโฐโฑ โ โฒ โณ โด โ โ สน สบ โต โถ โท
๏น ๏น ๏น ๏น ๏ธน ๏ธบ ๏ธป ๏ธผ ๏ธ ๏ธ ๏ธฟ ๏น ๏ธฝ ๏ธพ ๏น ๏น ๏ธท ๏ธธ โ โ โด โต โ โ โ โก
โโโโโคโฆโฅโงโโโโโโโฒโผโโถโบโปโฒโณ โผโฝโพโฟโโโโโโ
โโ โโโโโโโโโโโโโโโณโฅขโฅฃโฅคโฅฅโฅฆโฅงโฅจโฅฉโฅชโฅซโฅฌโฅญโฅฎโฅฏ
Draft for Information Only
Content
Python Binary Sequence Types โPython Bytes Objects โโPython Bytes Object Properties โโโASCII Charactes โโโSequence of Integer โโPython Bytes Literals โโPython Bytes Constructor โโโParameters โโโRemarks โโPython Bytes Methods โPython Bytearray Objects โโPython Bytearray Constructor โโโParameters โโโRemarks โโPython Bytearray Methods โBytes and Bytearray Operations โSource and Reference
Python Binary Sequence Types
Python binary data is implemented as Python binary sequence types which are handled by predefined bytes, bytearray, and memoryview iteration objects.
Both bytes, and bytearray are used to manipulate binary data and are supported by memoryview. memoryview is used to access the binary data of a binary object stored in the memory through the buffer protocol directly without making a copy. Besides, bytearray is also supported by the array module with efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values.
Python Bytes Objects
Bytes objects are immutable sequences of single bytes.
Python Bytes Object Properties
ASCII Charactes
Many major binary protocols are based on ASCII text encoding, several methods are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
Only ASCII characters are permitted in bytes literals (regardless of the declared source code encoding). Any binary values over 127 must be entered into bytes literals using the appropriate escape sequence.
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError). This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption).
Sequence of Integer
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
The representation of bytes objects uses the literal format (b'...') since it is often more useful than e.g. bytes([46, 46, 46]). You can always convert a bytes object into a list of integers using list(b).
Note
For Python 2.x users: In the Python 2.x series, a variety of implicit conversions between 8-bit strings (the closest thing 2.x offers to a built-in binary data type) and Unicode strings were permitted. This was a backwards compatibility workaround to account for the fact that Python originally only supported 8-bit text, and Unicode text was a later addition. In Python 3.x, those implicit conversions are gone - conversions between 8-bit binary data and Unicode text must be explicit, and bytes and string objects will always compare unequal.
Python Bytes Literals
Python bytes may be constructed by bytes literals.
Bytes LiteralsSingle quotes: For allowing embedded double quotes. e.g. b'allows embedded "double" quotes'
Double quotes: For allowing embedded single quotes. e.g. b"allows embedded 'single' quotes"
Triple quoted: For allowing embedded single and double quotes. Besides triple quoted strings may span multiple lines and all associated whitespace will be included in the bytes literal. e.g. b'''Three single quotes''', b"""Three double quotes"""
Besides, a single expression of several textual literals separated by whitespace only will be implicitly converted to a single textual literal.
Python Bytes Constructor
The Python bytes constructor is class bytes
class bytes([source[, encoding[, errors]]])
Parameters
class bytesto return a bytes object
sourceoptional argument used to specify the binary data source
encodingoptional argument, valid only if source is given, used to the encoding of source
errorsoptional argument, valid only if encoding is given, used to specify the error handling method
Remarks
- bytes literal is usually used as
source
- if
source is not given, bytes() return an empty bytes object b''
source of special value can be used to create some typical bytes object:
- zero-filled bytes object of a specified length. e.g.
bytes(10)
- bytes object of a specified range. e.g.
bytes(range(20))
- bytes object by copying existing binary data via the buffer protocol. e.g.
bytes(obj)
Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytes type has an additional class method to read data in that format:
Python Bytes Methods
classmethod bytes.fromhex(string)a bytes-classmethod to return a bytes object by decoding the given string object. Form of string is two hexadecimal digits per byte and ASCII whitespace will be ignored.
๐๐ฆ๐ก๐๐ .hex([sep[, bytes_per_sep]])to return a string object of form two hexadecimal digits per byte from ๐๐ฆ๐ก๐๐ with output pattern according to the optional specified delimiter sep and optional specified length per byte bytes_per_sep. bytes_per_sep with positive values calculate the separator position from the right, while negative values from the left.
Python Bytearray Objects
Bytearray objects are a mutable counterpart to bytes objects. There is no dedicated literal syntax for bytearray objects, instead they are always created by calling the constructor
Python Bytearray Constructor
The Python bytearray constructor is class bytearray
class bytearray([source[, encoding[, errors]]])
Parameters
class bytearrayto return a bytearray object
sourceoptional argument used to specify the binary data source
encodingoptional argument, valid only if source is given, used to the encoding of source
errorsoptional argument, valid only if encoding is given, used to specify the error handling method
Remarks
- There is no dedicated literal syntax for bytearray objects. Bytes literal is also usually used as
source
- if
source is not given, bytes() return an empty bytes object b''
source of special value can be used to create some typical bytes object:
- zero-filled bytes object of a specified length. e.g.
bytes(10)
- bytes object of a specified range. e.g.
bytes(range(20))
- bytes object by copying existing binary data via the buffer protocol. e.g.
bytes(obj)
As bytearray objects are mutable, they support the mutable sequence operations in addition to the common bytes and bytearray operations described in Bytes and Bytearray Operations.
Also see the bytearray built-in.
Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytearray type has an additional class method to read data in that format:
Python Bytearray Methods
classmethod bytearray.fromhex(string)a bytearray-classmethod to return a bytearray object by decoding the given string object. Form of string is two hexadecimal digits per byte and ASCII whitespace will be ignored.
hex([sep[, bytes_per_sep]])Return a string object containing two hexadecimal digits for each byte in the instance.
>>>
>>> bytearray(b'\xf0\xf1\xf2').hex()
'f0f1f2'
New in version 3.5.
Changed in version 3.8: Similar to bytes.hex(), ๐๐ฆ๐ก๐๐๐๐๐๐ฆ.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
Since bytearray objects are sequences of integers (akin to a list), for a bytearray object b, b[0] will be an integer, while b[0:1] will be a bytearray object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
The representation of bytearray objects uses the bytes literal format (bytearray(b'...')) since it is often more useful than e.g. bytearray([46, 46, 46]). You can always convert a bytearray object into a list of integers using list(b).
Bytes and Bytearray Operations
Both bytes and bytearray objects support the common sequence operations. They interoperate not just with operands of the same type, but with any bytes-like object. Due to this flexibility, they can be freely mixed in operations without causing errors. However, the return type of the result may depend on the order of operands.
Note
The methods on bytes and bytearray objects donโt accept strings as their arguments, just as the methods on strings donโt accept bytes as their arguments. For example, you have to write:
a = "abc"
b = a.replace("a", "f")
and:
a = b"abc"
b = a.replace(b"a", b"f")
Some bytes and bytearray operations assume the use of ASCII compatible binary formats, and hence should be avoided when working with arbitrary binary data. These restrictions are covered below.
Note
Using these ASCII based operations to manipulate binary data that is not stored in an ASCII based format may lead to data corruption.
The following methods on bytes and bytearray objects can be used with arbitrary binary data.
๐๐ฆ๐ก๐๐ .count(sub[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.count(sub[, start[, end]])to return the number of non-overlapping occurrences of the specified subsequence sub in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end. The subsequence sub may be any bytes-like object or an integer in the range 0 to 255.
๐๐ฆ๐ก๐๐ .decode(encoding="utf-8", errors="strict")๐๐ฆ๐ก๐๐๐๐๐๐ฆ.decode(encoding="utf-8", errors="strict")to return a string from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by decoding according to the specified encoding encoding and error handling scheme errors. Default encoding is 'utf-8' and default errors is 'strict, meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace' and any other name registered via codecs.register_error(), etc. By passing the encoding argument to ๐ ๐ก๐ allows decoding any bytes-like object directly, without needing to make a temporary bytes or bytearray object.
๐๐ฆ๐ก๐๐ .endswith(suffix[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.endswith(suffix[, start[, end]])to return a True if the speicified suffix is found at the end of ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end, otherwise return a False. The specified suffix can also be a tuple of suffixes being look for. The suffix(es) to search for may be any bytes-like object.
๐๐ฆ๐ก๐๐ .find(sub[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.find(sub[, start[, end]])to return the lowest index of subsequence sub in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end. -1 will be returned if sub is not found. The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.
๐๐ฆ๐ก๐๐ .index(sub[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.index(sub[, start[, end]])to return the lowest index of subsequence sub in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end. But will raise ValueError instead of -1 while using ๐๐ฆ๐ก๐๐ .find(sub[, start[, end]])) or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ.find(sub[, start[, end]])) if subsequence sub is not found. The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.
๐๐ฆ๐ก๐๐ .join(iterable)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.join(iterable)to return a concatenated bytes or bytearray object by concatenating the binary data sequences in iterable with delimiter ๐ ๐ก๐. A TypeError will be raised if there are any non-bytes-like objects in iterable, including string objects. The delimiter between elements is the contents of the bytes or bytearray object providing this method.
static ๐๐ฆ๐ก๐๐ .maketrans(from, to)static bytearray.maketrans(from, to)a static method used to return a translation table usable for ๐๐ฆ๐ก๐๐ .translate() that will map each character in from into the character at the same position in to; from and to must both be bytes-like objects and have the same length.
๐๐ฆ๐ก๐๐ .partition(sep)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.partition(sep)to return a 3-tuple from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by spliting the sequence into three parts, with the first tuple is the part before the first occurrence of specified sep, the second tuple is the separator sep, and the third tuple is the part after the first occurrence of specified sep. If the separator is not found, then a 3-tuple with the string itself, followed by two empty bytes or bytearray will be returned. The separator to search for may be any bytes-like object.
๐๐ฆ๐ก๐๐ .replace(old, new[, count])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.replace(old, new[, count])to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the specified subsequence old is replaced by the specified subsequence new according to the optional specified count occurrences from the beginnine of sequence. By default, all occurrences will be replaced if argument count is not given. The subsequence to search for and its replacement may be any bytes-like object. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .rfind(sub[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rfind(sub[, start[, end]])to return the highest index of sub in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end. -1 will be returned if sub is not found. The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.
๐๐ฆ๐ก๐๐ .rindex(sub[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rindex(sub[, start[, end]])to return the lowest index of sub in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ within the optional range specified by one argument start, or two arguments start and end. But will raise ValueError instead of -1 while using ๐ ๐ก๐.find(sub[, start[, end]])) if sub is not found. The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.
๐๐ฆ๐ก๐๐ .rpartition(sep)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rpartition(sep)to return a 3-tuple from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by spliting the string into three parts, with the first tuple is the part before the last occurrence of specified sep, the second tuple is the separator sep, and the third tuple is the part after the last occurrence of specified sep. If the separator is not found, then a 3-tuple with two empty bytes or bytearray objects, followed by the original sequence copy will be returned. The separator to search for may be any bytes-like object.
๐๐ฆ๐ก๐๐ .startswith(prefix[, start[, end]])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.startswith(prefix[, start[, end]])to return a True if ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ starts with the given prefix within the optional range specified by one argument start, or two arguments start and end, otherwise return False. The prefix(es) to search for may be any bytes-like object. prefix can also be a tuple of prefixes to look for.
๐๐ฆ๐ก๐๐ .translate(table, /, delete=b'')๐๐ฆ๐ก๐๐๐๐๐๐ฆ.translate(table, /, delete=b'')to return a bytes or bytearray object from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by mapping each character to a given translation table table
Return a copy of the bytes or bytearray object where all bytes occurring in the optional argument delete are removed, and the remaining bytes have been mapped through the given translation table, which must be a bytes object of length 256.
You can use the ๐๐ฆ๐ก๐๐ .maketrans() method to create a translation table.
Set the table argument to None for translations that only delete characters:
>>>
>>> b'read this short text'.translate(None, b'aeiou')
b'rd ths shrt txt'
Changed in version 3.6: delete is now supported as a keyword argument.
The following methods on bytes and bytearray objects have default behaviours that assume the use of ASCII compatible binary formats, but can still be used with arbitrary binary data by passing appropriate arguments. Note that all of the bytearray methods in this section do not operate in place, and instead produce new objects.
๐๐ฆ๐ก๐๐ .center(width[, fillbyte])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.center(width[, fillbyte])to return a bytes or bytearray object from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the whole sequence is centered within the specified length width by padding with the optional specified fillbyte. The default fillbyte is an ASCII space. For bytes objects, if the specified width is less than or equal to len(๐ ), then the orginal sequence s is returned. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .ljust(width[, fillbyte])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.ljust(width[, fillbyte])to return a bytes or bytearray object from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the whole sequence is left justified within the specified length width by padding with the optional specified fillbyte. The default fillbyte is an ASCII space. For bytes objects, if the specified width is less than or equal to len(๐ ), then the orginal sequence ๐ is returned. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .lstrip([chars])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.lstrip([chars])to return a bytes or bytearray object from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the optional specified leading binary sequence chars are removed until reaching a byte that is not contained in the set of byte values in chars. The default chars is whitespace, if argument chars is omitted or None. Argument chars is used to specify all combinations of the set of byte values to be stripped from, not a specific prefix byte values to be removed. The binary sequence of byte values to remove may be any bytes-like object. he bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .rjust(width[, fillbyte])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rjust(width[, fillbyte])to return aa bytes or bytearray object from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the whole sequence is right justified within the specified length width by padding with the optional specified fillbyte. The default fillbyte is an ASCII space. For bytes objects, if the specified width is less than or equal to len(๐ ), then the orginal sequence ๐ is returned. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .rsplit(sep=None, maxsplit=-1)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rsplit(sep=None, maxsplit=-1)to return a list of the subsequences from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by spliting with the specified delimiter sep according to the given count maxsplit. In other words, the list will have at most maxsplit+1 elements, the rightmost ones. If maxsplit is not specified or equal to -1, then all possible splits will be carried out. Splitting an empty sequence will always returns a list with an empty string. If delimiter sep is given, spliting two consecutive delimiters in ๐ ๐ก๐ will return an empty ssequence. The delimiter argument sep may be multiple characters.
๐๐ฆ๐ก๐๐ .rstrip([chars])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.rstrip([chars])to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the optional specified trailing characters chars are removed until reaching a byte value that is not contained in the set of byte values in chars. The default chars is ASCII whitespace, if argument chars is omitted or None. Argument chars is used to specify all combinations of the set of byte values to be stripped from, not a specific prefix bytes to be removed. The binary sequence of byte values to remove may be any bytes-like object. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .split(sep=None, maxsplit=-1)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.split(sep=None, maxsplit=-1)to return a list of the subsequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by spliting with the specified delimiter sep according to the given count maxsplit. In other words, the list will have at most maxsplit+1 elements. If maxsplit is not specified or equal to -1, then all possible splits will be carried out. Splitting an empty sequence will always returns a list with an empty sequence. If delimiter sep is given, spliting two consecutive delimiters in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ will return an empty sequence. The delimiter argument sep may be multiple characters.
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty subsequences (for example, b'1,,2'.split(b',') returns [b'1', b'', b'2']). The sep argument may consist of a multibyte sequence (for example, b'1<>2<>3'.split(b'<>') returns [b'1', b'2', b'3']). Splitting an empty sequence with a specified separator returns [b''] or [bytearray(b'')] depending on the type of object being split. The sep argument may be any bytes-like object.
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive ASCII whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the sequence has leading or trailing whitespace. Consequently, splitting an empty sequence or a sequence consisting solely of ASCII whitespace without a specified separator returns [].
For example:
>>>
>>> b'1 2 3'.split()
[b'1', b'2', b'3']
>>> b'1 2 3'.split(maxsplit=1)
[b'1', b'2 3']
>>> b' 1 2 3 '.split()
[b'1', b'2', b'3']
๐๐ฆ๐ก๐๐ .strip([chars])๐๐ฆ๐ก๐๐๐๐๐๐ฆ.strip([chars])to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with the optional specified leading and trailing byte values chars are removed until reaching a byte value that is not contained in the set of byte values in chars. The default chars is ASCII whitespace, if argument chars is omitted or None. Argument chars is used to specify all combinations of the set of byte values to be stripped from, not a specific prefix or suffix byte values to be removed. The binary sequence of byte values to remove may be any bytes-like object. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
The following methods on bytes and bytearray objects assume the use of ASCII compatible binary formats and should not be applied to arbitrary binary data. Note that all of the bytearray methods in this section do not operate in place, and instead produce new objects.
๐๐ฆ๐ก๐๐ .capitalize()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.capitalize()to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with each byte interpreted as an ASCII character, and the first byte capitalized and the rest lowercased. Non-ASCII byte values are passed through unchanged. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .expandtabs(tabsize=8)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.expandtabs(tabsize=8)to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with all ASCII tab characters are replaced by one or more ASCII spaces according to the specified tabsize and the column position of the tab character. The default tabsize is 8. In other words, tab column positions are at 0, 8, โฏ. To expand the sequence, the current column is set to zero and the sequence is examined byte by byte. A tab character will always be will replaced by one to tabsize spaces such that the last added space will always be the space be next tab column position. If the byte is an ASCII tab character (b'\t'), one or more space characters are inserted in the result until the current column is equal to the next tab position. (The tab character itself is not copied.) If the current byte is an ASCII newline (b'\n') or carriage return (b'\r'), it is copied and the current column is reset to zero. Any other byte value is copied unchanged and the current column is incremented by one regardless of how the byte value is represented when printed. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .isalnum()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isalnum()to return a True if all bytes in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, which is not empty, are alphabetical ASCII characters or ASCII decimal digits, otherwise return a False. Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzโABCDEFGHIJKLMNOPQRSTUVWXYZ'. ASCII decimal digits are those byte values in the sequence b'0123456789'.
๐๐ฆ๐ก๐๐ .isalpha()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isalpha()to return a True if all bytes in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, which is not empty, are alphabetic ASCII characters, otherwise return a False. Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.
๐๐ฆ๐ก๐๐ .isascii()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isascii()to return a True if ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ is empty or all bytes in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, are ASCII characters, otherwise return a False. ASCII characters have code points in the range U+0000-U+007F (0-0x7F).
๐๐ฆ๐ก๐๐ .isdigit()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isdigit()to return a True if all bytes in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, which is not empty, are ASCII decimal digits, otherwise return a False. ASCII decimal digits are those byte values in the sequence b'0123456789'.
๐๐ฆ๐ก๐๐ .islower()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.islower()to return a True if there is at least one lowercase ASCII character in the sequence in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ and no uppercase ASCII characters, otherwise return a False. Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
๐๐ฆ๐ก๐๐ .isspace()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isspace()to return a True if all bytes in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, which is not empty, are ASCII whitespace characters only, otherwise return a False. ASCII whitespace characters are those byte values in the sequence b' \t\n\r\x0b\f' (space, tab, newline, carriage return, vertical tab, form feed).
๐๐ฆ๐ก๐๐ .istitle()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.istitle()to return a True if the ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ is not empty and is ASCII titlecased, otherwise return a False.
๐๐ฆ๐ก๐๐ .isupper()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.isupper()to return a True if there is at least one uppercase alphabetic ASCII character in ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ, and no lowercase ASCII characters, otherwise return a False. Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
๐๐ฆ๐ก๐๐ .lower()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.lower()to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with all ASCII cased character are converted to corresponding lowercase counterpart. Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .splitlines(keepends=False)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.splitlines(keepends=False)to return a list of the lines from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by breaking at ASCII line boundaries. Line boundaries is a superset of universal boundaries, e.g. \n Line Feed, \r Carriage Return, \r\n Carriage Return + Line Feed, \v or \x0b Line Tabulation, \f or \x0c Form Feed, \x1c File Separator, \x1d Group Separator, \x1e Record Separator, \x85 Next Line (C1 Control Code), \u2028 Line Separator, \u2029 Paragraph Separator etc. Line breask are not included in the resulting list unless argument keepends is given and true. Unlike split() splitlines returns an empty list for the empty string, and a terminal line break does not result in an extra line.
๐๐ฆ๐ก๐๐ .swapcase()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.swapcase()to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with uppercase characters converted to corresponding lowercase counterpart and vice versa. Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. Unlike ๐ ๐ก๐.swapcase(), it is always the case that bin.swapcase().swapcase() == bin for the binary versions. Case conversions are symmetrical in ASCII, even though that is not generally true for arbitrary Unicode code points. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .title()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.title()to return a binary sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by converting to a titlecased version with all words are of first uppercase ASCII character and others lowercase characters. Uncased byte values are left unmodified. Lowercase ASCII characters are those byte values in sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. All other byte values are uncased. The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
>>>
>>> b"they're bill's friends from the UK".title()
b"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions:
>>>
>>> import re
>>> def titlecase(s):
... return re.sub(rb"[A-Za-z]+('[A-Za-z]+)?",
... lambda mo: mo.group(0)[0:1].upper() +
... mo.group(0)[1:].lower(),
... s)
...
>>> titlecase(b"they're bill's friends.")
b"They're Bill's Friends."
Note
๐๐ฆ๐ก๐๐ .upper()๐๐ฆ๐ก๐๐๐๐๐๐ฆ.upper()to return a sequence from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ with all lowercased ASCII characters are converted to corresponding uppercase counterpart. Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. But for bytearray object, this method does not operate in place - it always produces a new object, even if no changes were made.
๐๐ฆ๐ก๐๐ .zfill(width)๐๐ฆ๐ก๐๐๐๐๐๐ฆ.zfill(width)to return a sequence of the specified length width from ๐๐ฆ๐ก๐๐ or ๐๐ฆ๐ก๐๐๐๐๐๐ฆ by left filling with ASCII b'0' digits. A leading sign prefix (b'+'/b'-') is handled by inserting the padding after the sign character rather than before. For bytes object, if the specified width is less than or equal to len(๐๐ฆ๐ก๐๐ ), then the orginal sequence ๐๐ฆ๐ก๐๐ is returned. But for bytearray object, this method does not operate in place - it always produces a new object, even if no changes were made.
Source and Reference
ยฉsideway
ID: 210100013 Last Updated: 1/13/2021 Revision: 0
|
 |