Strings

Strings are a simple yet expressive data type: a sequence of text characters. They are a very natural data type for humans to work with (arguably more natural than numbers). The example below shows several ways to construct strings.

String literals can be written using:

  • Matching double quotes (") [short string]
  • Matching single quotes (') [short string]
  • Matching triple double quotes (""") [long string, preserves line breaks]
  • Matching triple single quotes (''') [long string, preserves line breaks]

Basic Actions

Having a series of characters stored as data in your program is great, but you will often want to access and modify it. Here are a few important properties of strings:

  1. They are immutable: a string cannot be changed in place. A string can only be "modified" by creating a new string from an old one, with the modifications applied.

  2. They are indexable: do you want the character at a particular position? Use the syntax <string>[pos], where <string> is either a string literal or a variable holding a string, and pos is an integer position between 0 and n-1, where n is the length of <string>.

  3. They are sliceable: do you want a substring of the whole string? Use a syntax very similar to indexing, providing both a start and a stop value for selecting characters from the string. You can also specify an optional step value to select every nth character (every second or third character, for example).
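A quick sketch of the three properties, using a short literal chosen only for illustration. Assigning to an index raises a TypeError because strings are immutable; "modifying" a string really means building a new one from pieces of the old one.

```python
word = "Spam"

# Indexing: one character at a position from 0 to len(word) - 1
first = word[0]            # 'S'

# Slicing: start is included, stop is excluded; step picks every nth character
middle = word[1:3]         # 'pa'
every_other = word[::2]    # 'Sa'

# Immutability: item assignment is not allowed on a str
try:
    word[0] = "s"
except TypeError as err:
    print("TypeError:", err)

# "Modifying" means creating a new string from pieces of the old one
lowered = "s" + word[1:]
print(word, lowered)       # Spam spam
```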

In [2]:
s1 = "Hello"

s2 = 'World'

s3 = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts."""
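The quote style only affects how a literal is written, not the resulting string. A small sketch (the variable names are illustrative): single and double quotes produce equal strings, each style can contain the other without escaping, and triple quotes keep the line breaks typed inside them.

```python
# Single and double quotes build identical strings
a = "Hello"
b = 'Hello'
print(a == b)              # True

# Either quote style can contain the other without escaping
quote = "It's a string"
reply = 'She said "hi"'

# Triple quotes (""" or ''') keep line breaks inside the literal
poem = '''Readability
counts.'''
print(poem == "Readability\ncounts.")   # True
```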

String Indexing

String indexing is the most basic way to access a string's contents (aside from printing it or passing it to eval()). It retrieves a single character at a position from 0 to len(s)-1, where len(s) is the length of the string s and len() is Python's built-in length function.

Python uses "zero-based indexing", so remember that positions are counted starting at zero!

Below, we index the string greet at various positions. Notice that the index positions are integers; indexing a string with any other type, such as a float or a string, raises a TypeError.

In [3]:
greet = "Hello Bob"
In [4]:
greet[0]
Out[4]:
'H'
In [5]:
print(greet[0], greet[2], greet[4])
H l o
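Two ways indexing can fail, shown on the same greet value: a position outside 0 to len(greet)-1 raises an IndexError, and a non-integer index such as a float raises a TypeError, even when the float has no fractional part.

```python
greet = "Hello Bob"

# Valid positions run from 0 to len(greet) - 1, i.e. 0..8 here
last = greet[len(greet) - 1]   # 'b'

# One past the end is out of range
try:
    greet[len(greet)]
except IndexError as err:
    print("IndexError:", err)

# A float is not an integer index, even if it has no fractional part
try:
    greet[2.0]
except TypeError as err:
    print("TypeError:", err)
```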

Expressions as Indexes

Sometimes the position you want is dynamic, perhaps based on user input or some other calculation. You can give an index as an expression! Again, this is only valid if the expression evaluates to an integer.

In [6]:
x = 8
print(greet[x - 2])
B
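Division is a common trap when computing an index. A sketch comparing the two division operators: floor division (//) produces an int and works as an index, while true division (/) produces a float and is rejected.

```python
greet = "Hello Bob"

# Floor division (//) yields an int, so it is a safe index expression
middle = greet[len(greet) // 2]   # 9 // 2 == 4
print(middle)                     # o

# True division (/) yields a float (4.5 here), which is not a valid index
try:
    greet[len(greet) / 2]
except TypeError:
    print("float index rejected")

# Converting explicitly with int() also works, if truncation is intended
print(greet[int(len(greet) / 2)])  # o
```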

Negative Indexes

A common need is to reference a string from its end. Normally, you would take the string's length and subtract 1 (to adjust for zero-based indexing). For example, we can reference the last character of greet using this expression:

greet[len(greet)-1]

The makers of Python said, "This is silly! We do this so often in programming, let's give it a simpler form." So they did. Instead of passing the length of the string to a string that already knows its own length, we simply provide a negative index to count from the end:

greet[-1]
In [7]:
greet[-1]
Out[7]:
'b'
In [8]:
greet[-3]
Out[8]:
'B'
In [9]:
len(greet)
Out[9]:
9
In [10]:
greet[len(greet)-1]
Out[10]:
'b'
In [11]:
greet[len(greet)-3]
Out[11]:
'B'
In [12]:
greet[len(greet)-2]
Out[12]:
'o'
In [13]:
greet[-2]
Out[13]:
'o'
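The pattern in the cells above generalizes: for any k from 1 to len(greet), greet[-k] is the same character as greet[len(greet) - k]. A short loop can check every position.

```python
greet = "Hello Bob"
n = len(greet)

# greet[-k] is shorthand for greet[n - k], for k = 1 .. n
for k in range(1, n + 1):
    assert greet[-k] == greet[n - k]

# -n is the first character; -(n + 1) would be out of range
print(greet[-n])   # H
```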

String Slicing

Slicing is a powerful extension of indexing: it lets you pull out a substring (a selection of the characters) of a string!

Similar rules apply, plus a few more:

  • A slice takes start and stop parameters and an optional step parameter. The character at the stop position is not included.
  • Slice parameters must be integers; expressions are allowed as long as they evaluate to integers.
  • A colon ":" separates the start and stop parameters, and a second colon introduces the optional step parameter.
  • If you leave off the start value (left of the colon), slicing starts at position zero.
  • If you leave off the stop value (right of the colon), slicing continues to the end of the string, including the last character.
  • If you leave off both start and stop, the slice covers the whole string, from its first character to its last. This effectively "copies" the string, although in CPython no copy is made: the same string object is returned, which you can confirm with id().
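The rules above, sketched on greet. The identity check at the end reflects CPython behavior and is an implementation detail, not something the language guarantees.

```python
greet = "Hello Bob"

# start is included, stop is excluded
print(greet[0:5])    # Hello

# Omitted start means 0; omitted stop means "through the end"
print(greet[:5])     # Hello
print(greet[6:])     # Bob

# The optional step selects every nth character (here every 3rd: indices 0, 3, 6)
print(greet[::3])    # HlB

# Omitting both gives a string equal to the original
whole = greet[:]
print(whole == greet)   # True
print(whole is greet)   # True in CPython; an implementation detail
```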

You can also slice using negative indexes, counting from the end of the string. The slice still runs from left to right, so the start parameter is the more negative number (a position further from the string's end) and the stop parameter is the less negative number (a position closer to the end).

OK, perhaps an example!

This slices "Bo" from the greet string:

greet[-3:-1]

Which is the same as:

greet[6:-1]
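Both forms can be checked directly; the stop value -1 excludes the final 'b'. As a related aside, a negative step makes a slice walk from right to left, which is how a slice can actually run from the end of a string towards its beginning.

```python
greet = "Hello Bob"

# Counting from the end: start -3 ('B') up to, but not including, -1 ('b')
assert greet[-3:-1] == greet[6:-1] == "Bo"

# With a negative step the slice runs right to left,
# so start is the rightmost position and stop the leftmost (excluded)
print(greet[-1:-4:-1])   # boB
print(greet[::-1])       # boB olleH
```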

Weird Edge Case

But what if you wanted "Bob" from the greet string of "Hello Bob"?

Well, how about:

greet[6:]

or, how about:

greet[6:None]

Hmm, what the heck is None? It's a special value that means "no value". Trust me, it's necessary. You might be asking yourself, why can't we just use -1 instead of None? Because slicing, like range(), does not include the character at the stop position. Since -1 refers to the last character of the string, greet[6:-1] stops just before it, and there is no negative index that refers to the position beyond -1! And -0 is just the same as 0, which refers to the start of the string. So, in Python, you either leave off the stop value or use None to slice through the end of the string.

Why would this be useful? If you are using variables or expressions to determine a slice, you need some value that means "through the end of the string", and None fills that role:

begin = 6
end = None
greet[begin:end]
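One place where None earns its keep is an optional end position. The helper take below is hypothetical, written only for this illustration: its end parameter defaults to None, so callers who omit it get everything through the end of the string. Python's built-in slice() object accepts None the same way.

```python
def take(s, begin, end=None):
    """Hypothetical helper: return s[begin:end]; end=None means 'through the end'."""
    return s[begin:end]

greet = "Hello Bob"
print(take(greet, 6))       # Bob
print(take(greet, 6, -1))   # Bo

# The built-in slice() stores start/stop/step; None plays the same role here
last_word = slice(6, None)
print(greet[last_word])     # Bob
```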
In [14]:
greet[0:3]
Out[14]:
'Hel'
In [15]:
greet[5:9]
Out[15]:
' Bob'
In [46]:
greet[6:-1]
Out[46]:
'Bo'
In [17]:
greet[:5]
Out[17]:
'Hello'
In [48]:
greet[5:]
Out[48]:
' Bob'
In [50]:
greet[6:None]
Out[50]:
'Bob'
In [19]:
greet[:]
Out[19]:
'Hello Bob'
In [20]:
x = 4
greet[x:]
Out[20]:
'o Bob'
In [21]:
greet[x:x+4]
Out[21]:
'o Bo'
In [22]:
greet[x-2:x+4]
Out[22]:
'llo Bo'
In [23]:
greet[int(x/2.0):x]
Out[23]:
'll'
In [24]:
greet[0:-1:2]
Out[24]:
'HloB'
In [25]:
greet[0:]
Out[25]:
'Hello Bob'
In [26]:
greet[0:-1]
Out[26]:
'Hello Bo'
In [27]:
greet[0:None]
Out[27]:
'Hello Bob'
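One more difference from indexing is worth knowing: slicing never raises an IndexError. Bounds past either end of the string are clipped, and a slice whose start is at or beyond its stop (with a positive step) is simply empty.

```python
greet = "Hello Bob"

# Out-of-range slice bounds are clipped instead of raising IndexError
print(greet[6:100])      # Bob
print(greet[-100:5])     # Hello

# start at or past stop (with a positive step) gives an empty string
print(repr(greet[5:2]))  # ''

# Compare: indexing with the same out-of-range value does fail
try:
    greet[100]
except IndexError:
    print("greet[100] raised IndexError")
```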

String "Arithmetic" Operators

Strings support a form of 'arithmetic': a couple of operators that do something reasonable with strings.

  1. Addition (concatenation). What if you wanted to join strings together into a new string? For example:

    part1 = "Hello"
    part2 = "World"
    greet = part1 + " " + part2
    
  2. Multiplication (repetition). What if you want a string containing 30 letter 'A's? It's as simple as:

    s = "A" * 30
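A sketch of both operators, including the usual fix for combining a string with a number: + only joins a str with another str, so a number must be converted with str() first. The repetition count for * must be an int, and a count of zero or less gives the empty string.

```python
part1 = "Hello"
part2 = "World"

# Concatenation builds a brand-new string; part1 and part2 are unchanged
greet = part1 + " " + part2
print(greet)              # Hello World

# Numbers must be converted explicitly before concatenating
count = 3
label = "Item " + str(count)
print(label)              # Item 3

# Repetition: str * int (or int * str)
line = "A" * 30
print(len(line))          # 30
print(repr("ab" * 0))     # ''
print(3 * "ab")           # ababab
```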
    
In [28]:
s1 = "Hello"
s2 = "Spam"
In [29]:
s1 + s2
Out[29]:
'HelloSpam'
In [30]:
s3 = s1 + " " + s2
s3
Out[30]:
'Hello Spam'
In [31]:
s1
Out[31]:
'Hello'
In [32]:
s2
Out[32]:
'Spam'
In [33]:
s1 + 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-550931b18b2a> in <module>()
----> 1 s1 + 2

TypeError: must be str, not int
In [ ]:
s1 - 2
In [ ]:
s1 * 3
In [ ]:
s2 * 4
In [ ]:
s3
In [ ]:
print(s1, s2)
print("-"*5, "-"*4)
In [ ]:
print(s1, s2)
print("-"*len(s1), "-"*len(s2))
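The last few cells were saved without output. A self-contained version, reusing the s1 and s2 values from this section, shows what to expect: subtraction is not defined for strings at all, * repeats a string, and len() lets each underline match its word's width.

```python
s1 = "Hello"
s2 = "Spam"

# There is no string subtraction: this raises TypeError
try:
    s1 - 2
except TypeError:
    print("str - int is not supported")

print(s1 * 3)   # HelloHelloHello
print(s2 * 4)   # SpamSpamSpamSpam

# Underline each word with exactly as many dashes as it has characters
print(s1, s2)                          # Hello Spam
print("-" * len(s1), "-" * len(s2))    # ----- ----
```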

Strings in for Loops

Strings are sequences (ordered collections of characters), so they are perfectly natural to use in for loops. Python's for loop is a "for-each" loop: it loops over each item in a collection of values.

So, consider the following loops:

In [ ]:
for ch in "Spam!":
    print(ch)
In [ ]:
for ch in s1:
    print(ch)
In [ ]:
for ch in s1 + s2:
    print(ch)
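A common reason to loop over a string is to inspect or count its characters. The count_vowels function below is a hypothetical example, and it treats only a, e, i, o and u (in either case) as vowels. The built-in enumerate() pairs each character with its index.

```python
def count_vowels(text):
    """Hypothetical helper: count a, e, i, o, u (case-insensitive) in text."""
    total = 0
    for ch in text:
        if ch.lower() in "aeiou":
            total += 1
    return total

print(count_vowels("Hello Spam"))   # 3

# enumerate() yields (index, character) pairs
for i, ch in enumerate("Spam!"):
    print(i, ch)
```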