Python Strings

By: Ashley J  

Python string objects (byte strings as well as text, AKA Unicode, strings) are immutable: attempting to rebind or delete an item or slice of a string raises an exception. The items of a text string (one per character) are themselves strings of the same kind, each of length 1. In v3, indexing a bytes object instead yields an int, while slicing it still yields bytes.
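As a minimal sketch of what immutability means in practice (the variable names are illustrative only):

```python
s = 'hello'

# Each item of a text string is itself a string of length 1.
first = s[0]
item_is_str = isinstance(first, str) and len(first) == 1

# Rebinding an item raises TypeError and leaves the string unchanged.
try:
    s[0] = 'J'
except TypeError as exc:
    error_name = type(exc).__name__
else:
    error_name = None

# To "change" a string, build a new one instead.
t = 'J' + s[1:]
print(item_is_str, error_name, s, t)
```

The original object bound to s is never modified; t is a separate, newly built string.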

Unicode str and bytes objects are immutable sequences. All immutable-sequence operations (repetition, concatenation, indexing, and slicing) apply to them, returning an object of the same type. Listed below are the methods available for string handling in Python.

capitalize

s.capitalize()

Returns a copy of s where the first character, if a letter, is uppercase, and all other letters, if any, are lowercase.

casefold

s.casefold()

str only, v3 only. Returns a string processed by the algorithm described in section 3.13 of the Unicode standard. This is similar to s.lower (described later in this list) but also takes into account broader equivalences, such as that between the German lowercase 'ß' and 'ss', and is thus better suited to case-insensitive matching.
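A short sketch of the difference between lower and casefold for case-insensitive comparison, assuming v3 (the sample words are just an illustration):

```python
a = 'Stra\u00dfe'   # 'Straße', written with an escape for the sharp s
b = 'STRASSE'

# lower() keeps the sharp s, so the two spellings still differ.
same_with_lower = a.lower() == b.lower()

# casefold() maps the sharp s to 'ss', so the two spellings match.
same_with_casefold = a.casefold() == b.casefold()

print(same_with_lower, same_with_casefold)
```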

center

s.center(n,fillchar=' ')

Returns a string of length max(len(s), n), with a copy of s in the central part, surrounded by copies of character fillchar on both sides, in equal numbers when the padding can be split evenly (e.g., 'ciao'.center(2) is 'ciao' and 'x'.center(4,'_') is '_x__').

count

s.count(sub,start=0,end=sys.maxsize)

Returns the number of nonoverlapping occurrences of substring sub in s[start:end].
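To illustrate what "nonoverlapping" means here, a small sketch with made-up inputs:

```python
# 'aa' could match at indices 0, 1, and 2 of 'aaaa', but only the
# nonoverlapping matches (at 0 and 2) are counted.
overlapping_candidate = 'aaaa'.count('aa')

# start restricts the search to s[start:], here 'nana'.
in_slice = 'banana'.count('an', 2)

print(overlapping_candidate, in_slice)
```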

decode

s.decode(encoding='utf-8',errors='strict')

bytes only. Returns a str object decoded from the bytes s according to the given encoding. errors determines how decoding errors are handled: 'strict' causes errors to raise UnicodeError exceptions, 'ignore' skips the malformed data, and 'replace' substitutes the Unicode replacement character (U+FFFD) for it; see “Unicode” for details. Other values can be registered via codecs.register_error().

encode

s.encode(encoding=None,errors='strict')

str only. Returns a bytes object obtained from s with the given encoding and error handling. In v3, when encoding is omitted, 'utf-8' is used.
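The following sketch, assuming v3, shows encode and decode as a round trip and how the errors argument changes the result when the bytes are not valid for the chosen encoding (the sample data is invented for illustration):

```python
text = 'caf\u00e9'                      # 'café'
data = text.encode('utf-8')             # two bytes for the accented letter
round_trip = data.decode('utf-8')

bad = b'caf\xe9'                        # Latin-1 bytes, not valid UTF-8

# 'strict' (the default) raises UnicodeDecodeError, a subclass of UnicodeError.
try:
    bad.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = bad.decode('utf-8', errors='replace')   # malformed byte -> U+FFFD
ignored = bad.decode('utf-8', errors='ignore')     # malformed byte dropped

# When encoding, 'replace' substitutes a question mark instead.
ascii_replaced = text.encode('ascii', errors='replace')

print(data, strict_failed, ignored, ascii_replaced)
```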

endswith

s.endswith(suffix,start=0,end=sys.maxsize)

Returns True when s[start:end] ends with string suffix; otherwise, False. suffix can be a tuple of strings, in which case endswith returns True when s[start:end] ends with any one of them.
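A brief sketch of the tuple form, using hypothetical file names:

```python
filenames = ['report.csv', 'notes.txt', 'image.png', 'data.tsv']

# A tuple of suffixes matches when the string ends with any one of them.
tabular = [name for name in filenames if name.endswith(('.csv', '.tsv'))]

print(tabular)
```

Note that the argument must be a tuple; passing a list raises TypeError.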

expandtabs

s.expandtabs(tabsize=8)

Returns a copy of s where each tab character is changed into one or more spaces, with tab stops every tabsize characters.
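As a small illustration of how tab stops work (the input string is arbitrary), each tab is padded out to the next multiple of tabsize rather than replaced by a fixed number of spaces:

```python
line = 'a\tbc\td'

# With tabsize=4, 'a' is followed by 3 spaces (to column 4),
# and 'bc' is followed by 2 spaces (to column 8).
four = line.expandtabs(4)

# With the default tabsize of 8 the same tabs expand to 7 and 6 spaces.
eight = line.expandtabs()

print(repr(four), repr(eight))
```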

find

s.find(sub,start=0,end=sys.maxsize)

Returns the lowest index in s where substring sub is found, such that sub is entirely contained in s[start:end]. For example, 'banana'.find('na') is 2, as is 'banana'.find('na',1), while 'banana'.find('na',3) is 4, as is 'banana'.find('na',-2). find returns -1 when sub is not found.

format

s.format(*args,**kwargs)

str only. Formats the positional and named arguments according to formatting instructions contained in the string s. See “String Formatting” for further details.

format_map

s.format_map(mapping)

str only, v3 only. Formats the mapping argument according to formatting instructions contained in the string s. Equivalent to s.format(**mapping) but uses the mapping directly.
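One case where using the mapping directly matters is a dict subclass that defines __missing__. The sketch below assumes v3; KeepMissing is a hypothetical helper written for this example, not part of the standard library:

```python
class KeepMissing(dict):
    """Hypothetical helper: leave unknown fields in the output untouched."""
    def __missing__(self, key):
        return '{' + key + '}'

template = '{name} is {age} years old'
values = KeepMissing(name='Ann')

# format_map consults the mapping itself, so __missing__ is honored.
partial = template.format_map(values)

# format(**values) copies the items into a plain dict, so the lookup fails.
try:
    template.format(**values)
    format_raised = False
except KeyError:
    format_raised = True

print(partial, format_raised)
```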

index

s.index(sub,start=0,end=sys.maxsize)

Like find, but raises ValueError when sub is not found.
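A short sketch contrasting the two methods; which one fits better depends on whether a missing substring is an expected case or an error in your program:

```python
s = 'banana'

# find signals "not found" with -1, which must be compared explicitly:
# a match at index 0 is falsy and -1 is truthy, so "if s.find(sub):" is a bug.
pos = s.find('x')
found_at_start = s.find('ban')

# index signals "not found" by raising ValueError.
try:
    s.index('x')
    index_raised = False
except ValueError:
    index_raised = True

print(pos, found_at_start, index_raised)
```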

isalnum

s.isalnum()

Returns True when len(s) is greater than 0 and all characters in s are letters or digits. When s is empty, or when at least one character of s is neither a letter nor a digit, isalnum returns False.

isalpha

s.isalpha()

Returns True when len(s) is greater than 0 and all characters in s are letters. When s is empty, or when at least one character of s is not a letter, isalpha returns False.

isdecimal

s.isdecimal()

str only, v3 only. Returns True when len(s) is greater than 0 and all characters in s can be used to form decimal-radix numbers. This includes Unicode characters defined as Arabic digits.

isdigit

s.isdigit()

Returns True when len(s) is greater than 0 and all characters in s are digits. When s is empty, or when at least one character of s is not a digit, isdigit returns False.

isidentifier

s.isidentifier()

str only, v3 only. Returns True when s is a valid identifier according to the Python language’s definition; keywords also satisfy the definition, so, for example, 'class'.isidentifier() returns True.

islower

s.islower()

Returns True when all letters in s are lowercase. When s contains no letters, or when at least one letter of s is uppercase, islower returns False.

isnumeric

s.isnumeric()

str only, v3 only. Similar to s.isdigit(), but uses a broader definition of numeric characters that includes all characters defined as numeric in the Unicode standard (such as fractions).
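The three numeric predicates get progressively broader. A minimal sketch, assuming v3 and three sample characters chosen for illustration:

```python
samples = {
    'five': '5',               # ordinary decimal digit
    'superscript': '\u00b2',   # superscript two
    'fraction': '\u00bd',      # vulgar fraction one half
}

# For each sample: (isdecimal, isdigit, isnumeric)
results = {
    name: (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for name, ch in samples.items()
}

for name, flags in results.items():
    print(name, flags)
```

Only characters that pass isdecimal are safe to hand to int(); the other two methods also accept characters that int() rejects.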

isprintable

s.isprintable()

str only, v3 only. Returns True when all characters in s are spaces ('\x20') or are defined in the Unicode standard as printable. Unlike the other methods whose names start with is, ''.isprintable() returns True.

isspace

s.isspace()

Returns True when len(s) is greater than 0 and all characters in s are whitespace. When s is empty, or when at least one character of s is not whitespace, isspace returns False.

istitle

s.istitle()

Returns True when letters in s are titlecase: a capital letter at the start of each contiguous sequence of letters, all other letters lowercase (e.g., 'King Lear'.istitle() is True). When s contains no letters, or when at least one letter of s violates the titlecase condition, istitle returns False (e.g., '1900'.istitle() and 'Troilus and Cressida'.istitle() return False).

isupper

s.isupper()

Returns True when all letters in s are uppercase. When s contains no letters, or when at least one letter of s is lowercase, isupper returns False.

join

s.join(seq)

Returns the string obtained by concatenating the items of seq, which must be an iterable whose items are strings, and interposing a copy of s between each pair of items (e.g., ''.join(str(x) for x in range(7)) is '0123456' and 'x'.join('aeiou') is 'axexixoxu').

ljust

s.ljust(n,fillchar=' ')

Returns a string of length max(len(s),n), with a copy of s at the start, followed by zero or more trailing copies of character fillchar.

lower

s.lower()

Returns a copy of s with all letters, if any, converted to lowercase.

lstrip

s.lstrip(x=string.whitespace)

Returns a copy of s, removing leading characters that are found in string x. For example, 'banana'.lstrip('ab') returns 'nana'.

replace

s.replace(old,new,count=sys.maxsize)

Returns a copy of s with the first count (or fewer, if there are fewer) nonoverlapping occurrences of substring old replaced by string new (e.g., 'banana'.replace('a','e',2) returns 'benena').

rfind

s.rfind(sub,start=0,end=sys.maxsize)

Returns the highest index in s where substring sub is found, such that sub is entirely contained in s[start:end]. rfind returns -1 if sub is not found.

rindex

s.rindex(sub,start=0,end=sys.maxsize)

Like rfind, but raises ValueError if sub is not found.

rjust

s.rjust(n,fillchar=' ')

Returns a string of length max(len(s),n), with a copy of s at the end, preceded by zero or more leading copies of character fillchar.

rstrip

s.rstrip(x=string.whitespace)

Returns a copy of s, removing trailing characters that are found in string x. For example, 'banana'.rstrip('ab') returns 'banan'.

split

s.split(sep=None,maxsplit=sys.maxsize)

Returns a list L of up to maxsplit+1 strings. Each item of L is a “word” from s, where string sep separates words. When s has more than maxsplit words, the last item of L is the substring of s that follows the first maxsplit words. When sep is None, any string of whitespace separates words (e.g., 'four score and seven years'.split(None,3) is ['four','score','and','seven years']).

Note the difference between splitting on None (any string of whitespace is a separator) and splitting on ' ' (each single space character, not other whitespace such as tabs and newlines, and not strings of spaces, is a separator). For example:

>>> x = 'a  b' # two spaces between a and b
>>> x.split() # or, equivalently, x.split(None)
['a', 'b']
>>> x.split(' ')
['a', '', 'b']

In the first case, the two-spaces string in the middle is a single separator; in the second case, each single space is a separator, so that there is an empty string between the two spaces.

splitlines

s.splitlines(keepends=False)

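Like s.split('\n'), except that other line boundaries (such as '\r\n' and '\r') are also recognized, and a line break at the very end of s does not produce a final empty item. When keepends is true, the line break ending each line is included in each item of the resulting list.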
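A small sketch comparing splitlines with split('\n') on text that mixes line endings and ends with a newline (the sample text is made up):

```python
text = 'one\ntwo\r\nthree\n'

# split('\n') leaves the '\r' in place and adds a trailing empty item.
by_split = text.split('\n')

# splitlines recognizes both endings and adds no trailing empty item.
by_lines = text.splitlines()

# keepends=True preserves whichever line break ended each line.
with_ends = text.splitlines(True)

print(by_split, by_lines, with_ends)
```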

startswith

s.startswith(prefix,start=0,end=sys.maxsize)

Returns True when s[start:end] starts with string prefix; otherwise, False. prefix can be a tuple of strings, in which case startswith returns True when s[start:end] starts with any one of them.

strip

s.strip(x=string.whitespace)

Returns a copy of s, removing both leading and trailing characters that are found in string x. For example, 'banana'.strip('ab') is 'nan'.
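The argument x is treated as a set of characters, not as a prefix or suffix. A brief sketch with illustrative inputs:

```python
# Every leading and trailing character found in the set is removed,
# in any order, until a character outside the set is reached.
host = 'www.example.com'.strip('cmowz.')

# With no argument, leading and trailing whitespace is removed.
cleaned = '  hi there \n'.strip()

print(host, repr(cleaned))
```

Interior characters are never touched, which is why the space inside 'hi there' survives.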

swapcase

s.swapcase()

Returns a copy of s with all uppercase letters converted to lowercase and vice versa.

title

s.title()

Returns a copy of s transformed to titlecase: a capital letter at the start of each contiguous sequence of letters, with all other letters (if any) lowercase.
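Because any non-letter ends a "contiguous sequence of letters", apostrophes start a new word as far as title is concerned. The sketch below shows the effect and one possible workaround using string.capwords from the standard library, which splits on whitespace only:

```python
import string

phrase = "they're bill's friends"

# The letter after each apostrophe is capitalized too.
naive = phrase.title()

# capwords capitalizes only the first letter of each whitespace-separated word.
careful = string.capwords(phrase)

print(naive)
print(careful)
```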

translate

s.translate(table)

Returns a copy of s where characters found in table are translated or deleted. In v3 (and in v2, when s is an instance of unicode), table is a dict whose keys are Unicode ordinals; values are Unicode ordinals, Unicode strings, or None (to delete the character)—for example (coded to work both in v2 and v3, with the redundant-in-v3 u prefix on strings):

print(u'banana'.translate({ord('a'):None,ord('n'):u'ze'}))
# prints: bzeze

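In v2, when s is a string of bytes, its translate method is quite different: it takes a 256-character translation table (typically built with string.maketrans) and, optionally, a string of characters to delete. For example (v2 only):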

import string
identity = string.maketrans('','')
print('some string'.translate(identity,'aeiou'))
# prints: sm strng

The Unicode or v3 equivalent of this would be:

no_vowels = dict.fromkeys(ord(x) for x in 'aeiou')
print(u'some string'.translate(no_vowels))
# prints: sm strng

Here are v2 examples of turning all vowels into a’s and also deleting s’s:

intoas = string.maketrans('eiou','aaaa')
print('some string'.translate(intoas))
# prints: sama strang
print('some string'.translate(intoas,'s'))
# prints: ama trang

The Unicode or v3 equivalent of this would be:

intoas = dict.fromkeys((ord(x) for x in 'eiou'), 'a')
print(u'some string'.translate(intoas))
# prints: sama strang
intoas_nos = dict(intoas)
intoas_nos[ord('s')] = None  # key must be the ordinal; None deletes the character
print(u'some string'.translate(intoas_nos))
# prints: ama trang
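In v3 only, the static method str.maketrans can build such a table in one call; its optional third argument lists characters to delete. A minimal sketch of the same vowel example:

```python
# v3 only: str.maketrans(from_chars, to_chars, delete_chars)
table = str.maketrans('eiou', 'aaaa', 's')

# The table is an ordinary dict keyed by Unicode ordinals.
is_plain_dict = isinstance(table, dict)

result = 'some string'.translate(table)
print(result)
```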

upper

s.upper()

Returns a copy of s with all letters, if any, converted to uppercase.


String Formatting

Python 3 introduced a string formatting facility based on the format method, which was also backported to v2 (2.6 and later). You call format on the format string, passing the values to interpolate as arguments; within the format string, the values to be formatted are indicated by replacement fields enclosed in braces.

The formatting process is best understood as a sequence of operations, each of which is guided by its replacement field. First, each value to be formatted is selected; next, it is converted, if required; finally, it is formatted. For example, consider the following code:

print('First: {} second: {}'.format(1, 'two'))
print('a: {a}, 1st: {}, 2nd: {}, a again: {a}'.format(1, 'two', a=3))
print('a: {a} first:{0} second: {1} first: {0}'.format(1, 'two', a=3))

This code generates the following output:

First: 1 second: two
a: 3, 1st: 1, 2nd: two, a again: 3
a: 3 first:1 second: two first: 1
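The select, convert, and format steps each correspond to a part of the replacement field. The sketch below uses made-up values: the part before ! or : selects the value, !r requests conversion with repr, and the text after : is the format specifier.

```python
value = 'hi'

# Select argument 0, convert it with repr, then right-align in width 10.
padded = '{0!r:>10}'.format(value)

# Format only: fixed-point notation with two decimal places.
rounded = '{:.2f}'.format(3.14159)

# Selection can also reach into an argument by key (or attribute).
point = {'x': 1, 'y': 2}
coords = '{p[x]}, {p[y]}'.format(p=point)

print(repr(padded), rounded, coords)
```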


