Jupyter Snippet P4M 04
All of these python notebooks are available at https://gitlab.erc.monash.edu.au/andrease/Python4Maths.git
Strings
Strings were already discussed in Chapter 02, but they can also be treated as collections, similar to lists and tuples. For example:
S = 'The Taj Mahal is beautiful'
print([x for x in S if x.islower()]) # list of lower case characters
words=S.split() # list of words
print("Words are:",words)
print("--".join(words)) # hyphenated
" ".join(w.capitalize() for w in words) # capitalise words
['h', 'e', 'a', 'j', 'a', 'h', 'a', 'l', 'i', 's', 'b', 'e', 'a', 'u', 't', 'i', 'f', 'u', 'l']
Words are: ['The', 'Taj', 'Mahal', 'is', 'beautiful']
The--Taj--Mahal--is--beautiful
'The Taj Mahal Is Beautiful'
String indexing and slicing work the same way as for lists, which were explained in detail earlier.
print(S[4])
print(S[4:])
T
Taj Mahal is beautiful
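As a further sketch of the slicing behaviour described above, negative indices and a step also work on strings just as they do on lists. The variable names below (last_char, first_word, reversed_S) are illustrative only and are not used elsewhere in these notes.

```python
S = 'The Taj Mahal is beautiful'

last_char = S[-1]      # negative indices count from the end of the string
first_word = S[:3]     # slice from the start up to (not including) index 3
reversed_S = S[::-1]   # a step of -1 walks through the string backwards

print(last_char)
print(first_word)
print(reversed_S)
```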
Dictionaries
Dictionaries are mappings between keys and the items stored against those keys. Alternatively, one can think of a dictionary as a set in which something is stored against every element of the set. They can be defined as follows:
To define a dictionary, equate a variable to { } or dict()
d = dict() # or equivalently d={}
print(type(d))
d['abc'] = 3
d[4] = "A string"
print(d)
<class 'dict'>
{'abc': 3, 4: 'A string'}
As can be guessed from the output above, dictionaries can also be defined using the { key : value } syntax. The following dictionary has three elements:
d = { 1: 'One', 2 : 'Two', 100 : 'Hundred'}
len(d)
3
Now you can access ‘One’ using its key, 1:
print(d[1])
One
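The lookup above assumes the key exists. As a minimal sketch of what happens otherwise: indexing with a missing key raises KeyError, while the get() method returns None or a default that you supply. The key 3 and the default 'Unknown' are made up for this illustration.

```python
d = {1: 'One', 2: 'Two', 100: 'Hundred'}

# Indexing with a key that is not present raises KeyError
try:
    d[3]
    missing_raised = False
except KeyError:
    missing_raised = True

# get() returns None (or the supplied default) instead of raising
maybe_three = d.get(3)
three_or_default = d.get(3, 'Unknown')

print(missing_raised, maybe_three, three_or_default)
```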
There are a number of alternative ways of specifying a dictionary, including as a list of (key,value) tuples.
Two related lists can be merged to form a dictionary. To illustrate this we will start with two lists and form a list of tuples from them using the zip() function:
names = ['One', 'Two', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]
[ (name,number) for name,number in zip(names,numbers)] # create (name,number) pairs
[('One', 1), ('Two', 2), ('Three', 3), ('Four', 4), ('Five', 5)]
Now we can create a dictionary that maps the name to the number as follows.
a1 = dict((name,number) for name,number in zip(names,numbers))
print(a1)
{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}
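Since dict() accepts any iterable of (key, value) pairs, the generator expression above can usually be shortened by passing the result of zip() directly. The sketch below compares the two forms; the names a1_short and a1_long are chosen here only for the comparison.

```python
names = ['One', 'Two', 'Three', 'Four', 'Five']
numbers = [1, 2, 3, 4, 5]

# zip() already yields (name, number) pairs, so dict() can consume it directly
a1_short = dict(zip(names, numbers))

# the longer form used in the text builds the same pairs explicitly
a1_long = dict((name, number) for name, number in zip(names, numbers))

print(a1_short)
print(a1_short == a1_long)
```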
Note that since Python 3.7 a dictionary preserves the order in which its keys were inserted, as the output above shows. In earlier versions the iteration order depended on the internal hash ordering and could appear arbitrary, so code that must run on older interpreters should never assume an ordering when iterating over the elements of a dictionary.
Note: Any value used as a key must be hashable, which for the built-in types in practice means immutable. That means that tuples can be used as keys (because they can’t be changed) but lists are not allowed. As an aside for more advanced readers, instances of arbitrary user-defined classes can be used as keys, but by default the object's identity (its reference) is used as the key, not the “value” of the object.
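A small sketch of the rule in the note: a tuple is accepted as a key, whereas trying to use a list raises TypeError. The flag list_key_ok is just a helper name for this example.

```python
d = {}

d[(1, 2)] = 'tuple key works'      # tuples are immutable, so they can be keys

try:
    d[[1, 2]] = 'list key'         # lists are mutable and cannot be keys
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(d)
print("List accepted as key:", list_key_ok)
```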
The use of tuples as keys is very common and allows for a (sparse) matrix type data structure:
matrix={ (0,1): 3.5, (2,17): 0.1}
matrix[2,2] = matrix[0,1] + matrix[2,17]
# matrix[2,2] is equivalent to matrix[ (2,2) ]
print(matrix)
{(0, 1): 3.5, (2, 17): 0.1, (2, 2): 3.6}
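For a sparse matrix, entries that were never stored are normally treated as zero. One possible way to read such entries without raising KeyError is get() with a default value. The helper entry() below is a hypothetical name introduced for this sketch, not part of the original notebook.

```python
matrix = {(0, 1): 3.5, (2, 17): 0.1}

def entry(m, i, j):
    """Return the value stored at (i, j), or 0.0 if nothing is stored there."""
    return m.get((i, j), 0.0)

print(entry(matrix, 0, 1))   # a stored entry
print(entry(matrix, 5, 5))   # a missing entry is read as zero
print((5, 5) in matrix)      # reading with get() does not add the key
```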
Dictionaries can also be built using a loop-style definition (a dictionary comprehension).
a2 = { name : len(name) for name in names}
print(a2)
{'One': 3, 'Two': 3, 'Three': 5, 'Four': 4, 'Five': 4}
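As with list comprehensions, a condition can be added to the loop-style definition to keep only some of the entries. A brief sketch, where the threshold of 3 characters and the name long_names are arbitrary choices for illustration:

```python
names = ['One', 'Two', 'Three', 'Four', 'Five']

# keep only the names with more than 3 characters
long_names = {name: len(name) for name in names if len(name) > 3}

print(long_names)
```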
Built-in Functions
The len() function and the in operator have the obvious meaning:
print("a1 has",len(a1),"elements")
print("One is in a1",'One' in a1,"but not 2:", 2 in a1) # 'in' checks keys only
a1 has 5 elements
One is in a1 True but not 2: False
The clear() function is used to erase all elements.
a2.clear()
print(a2)
{}
The values() function returns all the values stored in the dictionary. (Actually not quite a list, but a view that we can iterate over just like a list to construct a list, tuple or any other collection):
[ v for v in a1.values() ]
[1, 2, 3, 4, 5]
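The remark that values() is "not quite a list" can be made concrete: the returned object is a view that tracks later changes to the dictionary, whereas a list built from it is a fixed copy. A minimal sketch using a small dictionary defined only for this example:

```python
a1 = {'One': 1, 'Two': 2}

vals = a1.values()       # a view onto the dictionary, not a copy
snapshot = list(vals)    # an independent list built from the view

a1['Three'] = 3          # modify the dictionary afterwards

print(snapshot)          # the copy is unchanged
print(list(vals))        # the view reflects the new entry
```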
The keys() function returns all the keys (indices) against which values are stored in the dictionary.
{ k for k in a1.keys() }
{'Five', 'Four', 'One', 'Three', 'Two'}
The items() function returns the elements of the dictionary as (key, value) tuples that can be iterated over. This is the same as the result that was obtained when the zip function was used above (although in Python versions before 3.7 the ordering may be ‘shuffled’ by the dictionary).
", ".join( "%s = %d" % (name,val) for name,val in a1.items())
'One = 1, Two = 2, Three = 3, Four = 4, Five = 5'
The pop() function removes the element with the given key and returns its value, which can be assigned to a new variable. Remember that only the value is returned and not the key, because the key is just the index used to look it up.
val = a1.pop('Four')
print(a1)
print("Removed",val)
{'One': 1, 'Two': 2, 'Three': 3, 'Five': 5}
Removed 4
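Popping a key that is no longer present raises KeyError unless a default is passed as the second argument. The sketch below continues from the state shown above, in which 'Four' has already been removed; the flag raised is only a helper name.

```python
a1 = {'One': 1, 'Two': 2, 'Three': 3, 'Five': 5}

# 'Four' is not in the dictionary, so a plain pop raises KeyError
try:
    a1.pop('Four')
    raised = False
except KeyError:
    raised = True

# supplying a default avoids the exception
val = a1.pop('Four', None)

print(raised, val)
print(a1)
```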
When to use Dictionaries vs Lists
The choice of whether to store data in a list or dictionary (or set) may seem a bit arbitrary at times. Here is a brief summary of some of the pros and cons of these:
- Finding elements in a set vs a list: x in C is valid whether the collection C is a list, set or dictionary. However, for large collections this is computationally much slower with lists than with sets or dictionaries. On the other hand, if all items are indexed by an integer, then x[45672] is much faster to look up if x is a list than if it is a dictionary.
- If all your items are indexed by integers but some indices are unused, you could use a list and assign some dummy value (e.g. "") wherever there is no corresponding item. For very sparse collections this could consume significant additional memory compared to a dictionary. On the other hand, if most values are present, then storing the indices explicitly (as is done in a dictionary) could consume significant additional memory compared to the list representation.
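The memory trade-off for sparse data can be sketched with sys.getsizeof. Note that getsizeof measures only the container itself, not the objects stored in it, so this is a rough indication rather than a full accounting. The size of 100,000 slots and the two stored items are arbitrary choices for the illustration.

```python
import sys

n = 100000

# list representation: one slot per index, filled with a dummy value
sparse_list = [None] * n
sparse_list[3] = 'a'
sparse_list[n - 1] = 'b'

# dictionary representation: only the indices actually used are stored
sparse_dict = {3: 'a', n - 1: 'b'}

list_bytes = sys.getsizeof(sparse_list)
dict_bytes = sys.getsizeof(sparse_dict)
print("list container: %d bytes, dict container: %d bytes" % (list_bytes, dict_bytes))
```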
import time
bigList = [i for i in range(0,100000)]
bigSet = set(bigList)
# time.clock() was removed in Python 3.8; time.perf_counter() is the replacement
start = time.perf_counter() # how long to find the last number out of 100,000 items?
99999 in bigList
print("List lookup time: %.6f ms" % (1000*(time.perf_counter()-start)))
start = time.perf_counter()
99999 in bigSet
print("Set lookup time: %.6f ms" % (1000*(time.perf_counter()-start)))
List lookup time: 1.031000 ms
Set lookup time: 0.043000 ms
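A single timed lookup is easily disturbed by other activity on the machine. As an alternative sketch, the standard timeit module can repeat each lookup many times and report the total. The repetition count of 100 is an arbitrary choice, and the actual times will vary from machine to machine.

```python
import timeit

bigList = list(range(100000))
bigSet = set(bigList)

# total time for 100 membership tests on each collection
list_time = timeit.timeit(lambda: 99999 in bigList, number=100)
set_time = timeit.timeit(lambda: 99999 in bigSet, number=100)

print("List: %.6f s for 100 lookups" % list_time)
print("Set:  %.6f s for 100 lookups" % set_time)
```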