Draft for Information Only
Content
Python Text Sequence Types
  Python Text Strings
  Python String Literals
  Python String Constructor
    class str
    Parameter
    Remarks
  Python Strings Methods
  Source and Reference
Python Text Sequence Types
Textual data in Python is handled by the text sequence type, which is implemented by the built-in str objects, called strings.
Python Text Strings
Python strings are stored internally as immutable sequences of Unicode code points. Although the str type constructor is designed to handle all kinds of textual data, covering both printable strings and bytes-like objects, the methods of str are used to manipulate text sequences only.
Python has no separate character type, so indexing a string produces a string of length 1; e.g. for a non-empty string s, s[0] == s[0:1]. Python also has no mutable string type, but various methods are provided to construct new strings from a given string.
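The two properties above can be checked directly. This is a minimal sketch; the variable names are illustrative only.

```python
# Indexing a str yields another str of length 1, not a separate character type.
s = "python"
first = s[0]
same_as_slice = s[0] == s[0:1]
kind = type(first)

# Strings are immutable, so item assignment raises TypeError.
try:
    s[0] = "P"
    mutated = True
except TypeError:
    mutated = False

# A changed string is built as a new object from the original.
rebuilt = "P" + s[1:]
print(first, kind, same_as_slice, mutated, rebuilt)
```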
Python String Literals
Python strings may be constructed by string literals.
String Literals
Single quotes: Allow embedded double quotes, e.g. 'allows embedded "double" quotes'
Double quotes: Allow embedded single quotes, e.g. "allows embedded 'single' quotes"
Triple quoted: Allow embedded single and double quotes. Triple-quoted strings may also span multiple lines, and all associated whitespace is included in the string literal, e.g. '''Three single quotes''', """Three double quotes"""
Besides, string literals that are part of a single expression and separated only by whitespace are implicitly converted to a single string literal.
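As a short illustration of the literal forms listed above (a sketch with made-up sample text):

```python
single = 'allows embedded "double" quotes'
double = "allows embedded 'single' quotes"

# A triple-quoted literal may span lines; the line break becomes part of the string.
triple = """line one
line two"""

# Adjacent literals separated only by whitespace form one literal.
joined = ("spam "
          "eggs")
print(repr(triple), repr(joined))
```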
Python String Constructor
class str
class str(object='')
Parameter
class str: Returns a string version of object. The behavior of str() depends on whether either of the arguments encoding or errors is given.
object='': Specifies the object to convert. The default is an empty string.
Remarks
- When str() is called with neither the encoding nor the errors argument, str(object) returns object.__str__(), the informal, readable string representation of object. If object is a string, the string itself is returned. If object does not have a __str__() method, str() falls back to returning repr(object), the formal string representation of the object.
- When str() is called with neither the encoding nor the errors argument for a bytes-like object, it likewise returns the informal string representation rather than decoding the data.
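The remarks above can be sketched as follows. The classes Point and Label are hypothetical and exist only for this example; the call with encoding="utf-8" uses the encoding argument mentioned in the parameter description.

```python
class Point:
    """Hypothetical class that defines only __repr__()."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


class Label(Point):
    """Hypothetical subclass that also defines __str__()."""

    def __str__(self):
        return f"({self.x}, {self.y})"


from_repr = str(Point(1, 2))   # no __str__() of its own, so repr() is used
from_str = str(Label(1, 2))    # __str__() takes precedence
unchanged = str("text")        # a string is returned as it is

data = b"abc"
informal = str(data)                    # no encoding: informal representation
decoded = str(data, encoding="utf-8")   # encoding given: the bytes are decoded
print(from_repr, from_str, informal, decoded)
```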
Python Strings Methods
Strings support all of the common sequence operations. Additional methods are provided for manipulating the typical properties of text.
str.capitalize(): Returns a copy of the string with its first character titlecased (capitalized) and the rest lowercased. For a character made up of several letters, such as the digraph Dz, only the first letter is capitalized instead of the whole character.
str.casefold(): Returns a casefolded copy of the string. Casefolded strings may be used for caseless matching, because casefolding is intended to remove all case distinctions in a string; e.g. the Latin small letter sharp s, ß, is casefolded to "ss". The casefolding algorithm follows section 3.13 of the Unicode Standard.
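A small sketch of how casefold() differs from lower() for caseless matching, together with capitalize(). The sample word is an assumption chosen because it contains the sharp s.

```python
word = "Stra\u00dfe"                 # "Strasse" written with the sharp s (U+00DF)

lowered = word.lower()               # the sharp s is already lowercase and is kept
folded = word.casefold()             # the sharp s is folded to "ss"
caseless_equal = folded == "STRASSE".casefold()

capitalized = "hello WORLD".capitalize()
print(lowered, folded, caseless_equal, capitalized)
```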
str.center(width[, fillchar]): Returns the string centered in a string of length width, padded with the optional fillchar. The default fillchar is an ASCII space. If width is less than or equal to the length of the string, the original string is returned.
str.count(sub[, start[, end]]): Returns the number of non-overlapping occurrences of the substring sub in the string, optionally restricted to the range given by start, or by start and end.
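The phrase "non-overlapping" matters when the substring can overlap itself. A minimal sketch with arbitrary sample strings:

```python
overlapping = "aaaa".count("aa")        # matches at 0 and 2, not at 1
plain = "banana".count("an")
from_start = "banana".count("a", 2)     # counts within "nana"
in_range = "banana".count("a", 2, 4)    # counts within "na"
print(overlapping, plain, from_start, in_range)
```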
str.encode(encoding="utf-8", errors="strict"): Returns a bytes object containing the string encoded according to the specified encoding and errors. The default encoding is utf-8; other encodings are available. The default errors value is strict, which means that encoding errors raise a UnicodeError. Other possible errors values are ignore, replace, xmlcharrefreplace, backslashreplace, and any other name registered via codecs.register_error().
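The effect of the errors argument can be sketched by encoding a non-ASCII character to ASCII. The sample text is an assumption made for this example.

```python
text = "caf\u00e9"                      # ends with U+00E9, which ASCII cannot represent

utf8 = text.encode()                    # default encoding is utf-8

try:
    text.encode("ascii")                # errors="strict" is the default
    strict_failed = False
except UnicodeError:                    # UnicodeEncodeError is a subclass
    strict_failed = True

ignored = text.encode("ascii", errors="ignore")
replaced = text.encode("ascii", errors="replace")
xml = text.encode("ascii", errors="xmlcharrefreplace")
escaped = text.encode("ascii", errors="backslashreplace")
print(utf8, ignored, replaced, xml, escaped)
```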
str.endswith(suffix[, start[, end]]): Returns True if the string ends with the specified suffix, optionally tested within the range given by start, or by start and end; otherwise returns False. The suffix can also be a tuple of suffixes to look for.
str.expandtabs(tabsize=8): Returns a copy of the string with every tab character replaced by one or more spaces, depending on the current column and the specified tabsize. The default tabsize is 8, which puts tab column positions at 0, 8, 16, and so on. Each tab character is replaced by between one and tabsize spaces, just enough for the text that follows to start at the next tab column position.
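To make the column arithmetic concrete, here is a sketch with an arbitrary sample string; the expected results are written out as explicit space counts.

```python
text = "a\tbc\tdef"

# Default tabsize 8: "a" ends at column 1, so 7 spaces reach column 8;
# "bc" ends at column 10, so 6 spaces reach column 16.
expanded_default = text.expandtabs()

# tabsize 4: 3 spaces reach column 4, then 2 spaces reach column 8.
expanded_four = text.expandtabs(4)
print(repr(expanded_default), repr(expanded_four))
```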
str.find(sub[, start[, end]]): Returns the lowest index at which sub is found in the string, optionally restricted to the range given by start, or by start and end. Returns -1 if sub is not found.
str.format(*args, **kwargs): Returns a copy of the string produced by a string formatting operation. The string may contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument or the name of a keyword argument, and is replaced in the result by the string value of that argument.
When formatting a number (int, float, complex, decimal.Decimal and subclasses) with the n type (e.g. '{:n}'.format(1234)), the function temporarily sets the LC_CTYPE locale to the LC_NUMERIC locale to decode the decimal_point and thousands_sep fields of localeconv() if they are non-ASCII or longer than 1 byte, and the LC_NUMERIC locale is different from the LC_CTYPE locale. This temporary change affects other threads.
Changed in version 3.7: When formatting a number with the n type, the function temporarily sets the LC_CTYPE locale to the LC_NUMERIC locale in some cases.
str.format_map(mapping): Returns a copy of the string produced by a string formatting operation that takes its values from mapping. The operation is similar to str.format(**mapping), except that mapping is used directly and not copied to a dict.
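Because format_map() uses the mapping directly, a dict subclass can supply values for missing keys. The sketch below assumes a hypothetical Default class for that purpose.

```python
positional = "{0} + {1} = {2}".format(1, 2, 3)
keyword = "{name} is {age}".format(name="Ada", age=36)


class Default(dict):
    """Hypothetical dict subclass that returns the key itself when it is missing."""

    def __missing__(self, key):
        return key


# format(**mapping) would copy into a plain dict and lose __missing__();
# format_map() keeps the subclass behavior.
mapped = "{name} was born in {country}".format_map(Default(name="Guido"))
print(positional, keyword, mapped)
```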
str.index(sub[, start[, end]]): Like str.find(sub[, start[, end]]), returns the lowest index at which sub is found in the string, optionally restricted to the range given by start, or by start and end, but raises ValueError instead of returning -1 if sub is not found.
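The difference between find() and index() shows only when the substring is missing. A minimal sketch with an arbitrary sample string:

```python
s = "banana"

found = s.find("an")
found_later = s.find("an", 2)      # search starts at index 2
missing = s.find("xy")             # not found: -1

same_index = s.index("an")
try:
    s.index("xy")                  # not found: raises instead of returning -1
    raised = False
except ValueError:
    raised = True
print(found, found_later, missing, same_index, raised)
```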
str.isalnum(): Returns True if all characters in the string are alphanumeric and there is at least one character, otherwise False. A character c is alphanumeric if one of c.isalpha(), c.isdecimal(), c.isdigit(), or c.isnumeric() returns True.
str.isalpha(): Returns True if all characters in the string are alphabetic and there is at least one character, otherwise False.
Alphabetic characters are those characters defined in the Unicode character database as "Letter", i.e., those with general category property being one of "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this is different from the "Alphabetic" property defined in the Unicode Standard.
str.isascii(): Returns True if the string is empty or all characters in the string are ASCII, otherwise False. ASCII characters have code points in the range U+0000-U+007F.
str.isdecimal(): Returns True if all characters in the string are decimal characters and there is at least one character, otherwise False.
Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally, a decimal character is a character in the Unicode General Category "Nd".
str.isdigit(): Returns True if all characters in the string are digits and there is at least one character, otherwise False.
Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
str.isidentifier(): Returns True if the string is a valid identifier according to the language definition, section "Identifiers and keywords", otherwise False.
str.islower(): Returns True if all cased characters in the string are lowercase and there is at least one cased character, otherwise False.
str.isnumeric(): Returns True if all characters in the string are numeric characters and there is at least one character, otherwise False.
Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.
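The three numeric predicates form nested sets: every decimal character is a digit, and every digit is numeric. The sketch below checks one sample character per category; the choice of samples is an assumption, written as escapes to keep the source ASCII.

```python
samples = {
    "5": "ASCII digit five",
    "\u0660": "ARABIC-INDIC DIGIT ZERO",
    "\u00b2": "SUPERSCRIPT TWO",
    "\u2155": "VULGAR FRACTION ONE FIFTH",
    "a": "Latin small letter a",
}

# Map each character to (isdecimal, isdigit, isnumeric).
results = {
    ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in samples
}

for ch, flags in results.items():
    print(f"{samples[ch]:<28} {flags}")
```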
str.isprintable(): Returns True if all characters in the string are printable or the string is empty, otherwise False.
Nonprintable characters are those characters defined in the Unicode character database as "Other" or "Separator", excepting the ASCII space (0x20), which is considered printable. (Note that printable characters in this context are those which should not be escaped when repr() is invoked on a string. It has no bearing on the handling of strings written to sys.stdout or sys.stderr.)
str.isspace(): Returns True if there are only whitespace characters in the string and there is at least one character, otherwise False.
A character is whitespace if in the Unicode character database (see unicodedata), either its general category is Zs ("Separator, space"), or its bidirectional class is one of WS, B, or S.
str.istitle(): Returns True if the string is a titlecased string and there is at least one character, otherwise False.
Note: in a titlecased string, uppercase characters may only follow uncased characters, and lowercase characters may only follow cased ones.
str.isupper(): Returns True if all cased characters in the string are uppercase and there is at least one cased character, otherwise False.
str.join(iterable): Returns the concatenation of the strings in iterable, with the string on which the method is called used as the delimiter between elements. A TypeError is raised if iterable contains any non-string values, including bytes objects.
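A short sketch of join(), including the TypeError for non-string items and one common way around it (converting each item with str first). The sample lists are illustrative.

```python
words = ["alpha", "beta", "gamma"]
csv_line = ", ".join(words)
no_delimiter = "".join(words)

numbers = [1, 2, 3]
try:
    "-".join(numbers)                        # integers are not strings
    join_failed = False
except TypeError:
    join_failed = True

dashed = "-".join(str(n) for n in numbers)   # convert each item before joining
print(csv_line, no_delimiter, join_failed, dashed)
```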
str.ljust(width[, fillchar]): Returns the string left justified in a string of length width, padded with the optional fillchar. The default fillchar is an ASCII space. If width is less than or equal to the length of the string, the original string is returned.
str.lower(): Returns a copy of the string with all cased characters converted to lowercase.
The lowercasing algorithm used is described in section 3.13 of the Unicode Standard.
str.lstrip([chars]): Returns a copy of the string with leading characters removed, stopping at the first character that is not contained in chars. If chars is omitted or None, whitespace is removed. The chars argument is a set of characters, all combinations of which are stripped; it is not a prefix string to be removed.
static str.maketrans(x[, y[, z]]): Returns a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or None. Character keys will then be converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
str.partition(sep): Splits the string at the first occurrence of sep and returns a 3-tuple whose first element is the part before the separator, whose second element is the separator itself, and whose third element is the part after the separator. If the separator is not found, the 3-tuple contains the string itself followed by two empty strings.
str.replace(old, new[, count]): Returns a copy of the string with occurrences of the substring old replaced by new. If the optional count is given, only the first count occurrences, counted from the beginning of the string, are replaced. By default, all occurrences are replaced.
str.rfind(sub[, start[, end]]): Returns the highest index at which sub is found in the string, optionally restricted to the range given by start, or by start and end. Returns -1 if sub is not found.
str.rindex(sub[, start[, end]]): Like str.rfind(sub[, start[, end]]), returns the highest index at which sub is found in the string, optionally restricted to the range given by start, or by start and end, but raises ValueError instead of returning -1 if sub is not found.
str.rjust(width[, fillchar]): Returns the string right justified in a string of length width, padded with the optional fillchar. The default fillchar is an ASCII space. If width is less than or equal to the length of the string, the original string is returned.
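The three padding methods center(), ljust() and rjust() differ only in where the fill characters go. A minimal sketch with arbitrary widths and fill characters:

```python
s = "abc"

centered = s.center(7, "-")     # two fill characters on each side
left = s.ljust(6, ".")          # padding is appended on the right
right = s.rjust(6)              # default fill character is a space
too_narrow = s.center(2)        # width <= len(s): string is returned unchanged
print(centered, left, repr(right), too_narrow)
```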
str.rpartition(sep): Splits the string at the last occurrence of sep and returns a 3-tuple whose first element is the part before the separator, whose second element is the separator itself, and whose third element is the part after the separator. If the separator is not found, the 3-tuple contains two empty strings followed by the string itself.
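The following sketch contrasts partition() and rpartition() on a string with two separators, and shows where the original string lands when the separator is absent. The sample strings are illustrative.

```python
s = "key=value=extra"

first_split = s.partition("=")       # splits at the first "="
last_split = s.rpartition("=")       # splits at the last "="

not_found = "plain".partition("=")   # string first, then two empty strings
r_not_found = "plain".rpartition("=")  # two empty strings, then the string
print(first_split, last_split, not_found, r_not_found)
```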
str.rsplit(sep=None, maxsplit=-1): Returns a list of the words in the string, using sep as the delimiter. If maxsplit is given, at most maxsplit splits are done, the rightmost ones, so the list has at most maxsplit+1 elements. If maxsplit is not specified or is -1, all possible splits are made. When sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings, and splitting an empty string returns a list containing one empty string. The sep argument may consist of multiple characters.
If sep is not specified or is None, any whitespace string is a separator. A different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
str.split(sep=None, maxsplit=-1): Returns a list of the words in the string, using sep as the delimiter. If maxsplit is given, at most maxsplit splits are done, starting from the left, so the list has at most maxsplit+1 elements. If maxsplit is not specified or is -1, all possible splits are made. When sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings, and splitting an empty string returns a list containing one empty string. The sep argument may consist of multiple characters.
If sep is not specified or is None, any whitespace string is a separator. A different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
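The two splitting algorithms described above behave differently on empty fields and empty strings. A sketch with arbitrary sample input:

```python
# Explicit separator: consecutive delimiters delimit empty strings.
with_empty = "1,2,,3".split(",")
multi_char = "1<>2<>3".split("<>")
empty_with_sep = "".split(",")

# maxsplit limits the number of splits; rsplit() counts from the right.
left_limited = "1,2,3".split(",", maxsplit=1)
right_limited = "1,2,3".rsplit(",", maxsplit=1)

# No separator: runs of whitespace act as one separator, no empty strings.
whitespace = "  1   2 3 ".split()
empty_no_sep = "".split()
print(with_empty, left_limited, right_limited, whitespace)
```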
str.splitlines([keepends]): Returns a list of the lines in the string, breaking at line boundaries. Line boundaries are a superset of universal newlines: \n Line Feed, \r Carriage Return, \r\n Carriage Return + Line Feed, \v or \x0b Line Tabulation, \f or \x0c Form Feed, \x1c File Separator, \x1d Group Separator, \x1e Record Separator, \x85 Next Line (C1 Control Code), \u2028 Line Separator, and \u2029 Paragraph Separator. Line breaks are not included in the resulting list unless keepends is given and true. Unlike split() with a given separator, splitlines() returns an empty list for the empty string, and a terminal line break does not result in an extra line.
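A brief sketch of splitlines() next to split("\n"), using a sample string that mixes several line endings:

```python
text = "ab c\n\nde fg\rkl\r\n"

lines = text.splitlines()
kept = text.splitlines(keepends=True)     # line breaks are retained

# Differences from split("\n"): empty input and a trailing line break.
empty_lines = "".splitlines()
one_line = "One line\n".splitlines()
one_line_split = "One line\n".split("\n")
print(lines, kept, one_line, one_line_split)
```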
str.startswith(prefix[, start[, end]]): Returns True if the string starts with the given prefix, optionally tested within the range given by start, or by start and end; otherwise returns False. The prefix can also be a tuple of prefixes to look for.
str.strip([chars]): Returns a copy of the string with leading and trailing characters removed, stopping at each end at the first character that is not contained in chars. If chars is omitted or None, whitespace is removed. The chars argument is a set of characters, all combinations of which are stripped; it is not a prefix or suffix string to be removed.
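Since chars is treated as a set of characters rather than a prefix or suffix, the result can be surprising. A minimal sketch with illustrative sample strings:

```python
default = "   spacious   ".strip()               # whitespace by default

url = "www.example.com"
both_ends = url.strip("cmowz.")                  # any of c, m, o, w, z, "." at either end
leading = url.lstrip("cmowz.")
trailing = "mississippi".rstrip("ipz")           # stops at the first "s" from the right
print(default, both_ends, leading, trailing)
```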
str.swapcase(): Returns a copy of the string with uppercase characters converted to lowercase and vice versa. However, it is not necessarily true that s.swapcase().swapcase() == s.
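One case where swapping twice does not restore the original is the sharp s, whose uppercase form is two letters. This sketch uses that character as an assumed example.

```python
simple = "Hello World".swapcase()

s = "\u00df"                        # Latin small letter sharp s
once = s.swapcase()                 # uppercases to two letters
twice = once.swapcase()             # lowercases each letter separately
round_trip_ok = twice == s
print(simple, once, twice, round_trip_ok)
```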
str.title(): Returns a titlecased version of the string, in which each word starts with an uppercase character and the remaining characters are lowercase.
The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions:
>>> import re
>>> def titlecase(s):
...     return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0).capitalize(),
...                   s)
...
>>> titlecase("they're bill's friends.")
"They're Bill's Friends."
str.translate(table): Returns a copy of the string in which each character has been mapped through the given translation table.
The table must be an object that implements indexing via __getitem__(), typically a mapping or sequence. When indexed by a Unicode ordinal (an integer), the table object can do any of the following: return a Unicode ordinal or a string, to map the character to one or more other characters; return None, to delete the character from the return string; or raise a LookupError exception, to map the character to itself.
You can use str.maketrans() to create a translation map from character-to-character mappings in different formats.
See also the codecs module for a more flexible approach to custom character mappings.
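The two ways of building a table with str.maketrans() and applying it with translate() can be sketched as follows; the mappings are arbitrary examples.

```python
# Three-argument form: map a->x, b->y, c->z and delete "!".
table = str.maketrans("abc", "xyz", "!")
translated = "a!b!c".translate(table)

# One-argument form: character keys are converted to ordinals.
dict_table = str.maketrans({"a": "1", "b": None})
from_dict = "abc".translate(dict_table)   # "a" -> "1", "b" deleted, "c" unchanged
print(table, translated, dict_table, from_dict)
```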
str.upper(): Returns a copy of the string with all cased characters converted to uppercase.
Note that s.upper().isupper() might be False if s contains uncased characters or if the Unicode category of the resulting character(s) is not "Lu" (Letter, uppercase), but e.g. "Lt" (Letter, titlecase).
The uppercasing algorithm used is described in section 3.13 of the Unicode Standard.
str.zfill(width): Returns a copy of the string left filled with ASCII '0' digits to make a string of length width. A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before. If width is less than or equal to the length of the string, the original string is returned.
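A minimal sketch of the sign handling in zfill(), with arbitrary sample values:

```python
plain = "42".zfill(5)
negative = "-42".zfill(5)       # zeros go after the sign
positive = "+7".zfill(4)
too_narrow = "42".zfill(1)      # width <= len: string is returned unchanged
print(plain, negative, positive, too_narrow)
```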
Source and Reference
©sideway
ID: 210100017 Last Updated: 1/17/2021 Revision: 0