Loops and Escapes


Topics:

  • For loops
  • Using range() in a loop
  • Nested for loops for complex data structures
  • Indexing loops with enumerate()
  • Loops with zip()
  • While loops
  • Loop control and escapes with continue and break
  • List comprehension

Introduction:


In this lesson, we will write our first scripts. First we must learn how to interact with the Python language in a fundamental way, and learn about different classes of data and how to manipulate them.

For Loops

Up to this point, we have learned how to organize data. In this class, we'll learn how to DO stuff with it. In all programming languages, there are statements that allow you to do the same thing over and over, in a loop, with a small amount of code. Python offers two major flavors of loop, the for loop and the while loop. A for loop runs once for each item in a sequence, while a while loop continues until the condition at the start of the loop is no longer true.

We have already met the for loop in disguise. In a previous lesson, we called it the iterator operator for lists.
#!/usr/bin/python
 
li = ['Why', 'do', 'superheroes', 'wear', 'tights?']
 
#iterate over list
for x in li: print x

$ ./loops.py
Why
do
superheroes
wear
tights?


Let's look at this a bit more closely and figure out what's happening.

Used in this manner, for goes through each item in the list that follows "in" and successively assigns the value it finds to the variable before "in."
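To make that assignment visible, here is a minimal sketch that collects each value of x as the loop runs. The list name seen is made up for this illustration, and print is written with parentheses only so that the snippet runs under both Python 2 and Python 3.

```python
li = ['Why', 'do', 'superheroes', 'wear', 'tights?']

seen = []
for x in li:
    # on each pass, x has been assigned the next item of li
    seen.append(x)

# once the loop is done, x still holds the last item it was given
print(x)
print(seen == li)
```

The second print shows True because the loop visited every item, in order, exactly once.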

...But, what if we want to do something even more interesting in the loop? How do we add multiple commands?

Formatting loops

#iterate over list
for x in li:
    #check whether item equals 'do'
    if x == 'do':
        print 'Found:', x
    else:
        print x

$ ./loops.py
Why
Found: do
superheroes
wear
tights?


Just like in conditional statements from this morning, the body of the loop is indented. Reversion to the original indentation indicates the end of the loop. NOTE: Even though we have expanded the loop to multiple lines, the colon is still after the first line!

Looping over lists a given number of times

for x in range(4):
    print 'hello'
y = range(4)
print y

$ ./loops.py
hello
hello
hello
hello
[0, 1, 2, 3]


We can also loop over fancy, nested data structures using fancy, nested loops:

Looping over fancy data structures

liLi = [[1,2,3],[7,8,9]]
for x in liLi:
    print x
    for y in x:
        print y
    print
 

$ ./loops.py
[1, 2, 3]
1
2
3

[7, 8, 9]
7
8
9


What is happening here?
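One way to read it: the outer loop hands out one inner list at a time, and the inner loop runs all the way through that inner list before the outer loop moves on. The sketch below counts this out; the names total and passes are made up for the illustration, and print uses parentheses so it runs under both Python 2 and Python 3.

```python
liLi = [[1, 2, 3], [7, 8, 9]]

total = 0
passes = 0
for x in liLi:          # x is one inner list, e.g. [1, 2, 3]
    for y in x:         # y is one number from that inner list
        total = total + y
        passes = passes + 1

print(passes)   # 6: the inner body ran once per number
print(total)    # 30: 1+2+3 plus 7+8+9
```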

Mutating lists with loops

When we discussed lists, we said that they were mutable. We demonstrated that by using the sort() method. Now, we would like to change each item in the list.
li2 = [1,22,48,36,101]
print li2
for x in li2:
    x = x + 23
print li2

$ ./loops.py
[1, 22, 48, 36, 101]
[1, 22, 48, 36, 101]


What the heck??? Why didn't it change?

The list items did not change because we were adding 23 to x, a variable holding each item's value, and not to the position in the list itself. In order to change the value at a position in the list, we have to access the list, not just the item, in the loop.

li2 = [1,22,48,36,101]
print li2
for x in range( len(li2) ):
    li2[x] = li2[x] + 23
print li2

Here we've used the range() function to make a list of indexes as long as the list, li2. This is admittedly clunky, and Python has a built-in alternative, enumerate().

For each item, enumerate() returns a tuple with the index of the item as well as the item itself. You can use the index to access the actual item in the list, not just the copy. However, you can access the copy, too, if you want.
print li2
for xInd, x in enumerate(li2):
    print x
    li2[xInd] = li2[xInd] + 23
print li2

$ ./loops.py
[1, 22, 48, 36, 101]
1
22
48
36
101
[24, 45, 71, 59, 124]
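
To see the tuples themselves, here is a small sketch that turns the result of enumerate() into a list, and then repeats the point about copies versus positions. The lists letters and nums are made up for this illustration; list() and print() are used so the snippet behaves the same under Python 2 and Python 3.

```python
letters = ['a', 'b', 'c']

# enumerate() hands back (index, item) pairs
pairs = list(enumerate(letters))
print(pairs)   # [(0, 'a'), (1, 'b'), (2, 'c')]

nums = [1, 22, 48]
for ind, n in enumerate(nums):
    n = n + 23                   # only changes the name n, not the list
    nums[ind] = nums[ind] * 2    # changes the list itself
print(nums)    # [2, 44, 96]
```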


Looping over multiple lists at the same time

You may also need to loop over multiple lists at the same time. You can do this using the zip() function, which returns tuples containing one item from each list.
for x,y,z in zip(li,li2,li):
    print x,y,z

$ ./loops.py
Why 24 Why
do 45 do
superheroes 71 superheroes
wear 59 wear
tights? 124 tights?


What happens when one list is of a different length?
li3 = [3]
for x,y in zip(li,li3):
    print x,y

$ ./loops.py
Why 3
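
The output above suggests the answer: zip() stops as soon as the shortest list runs out, and the leftover items in the longer list are silently ignored. A minimal sketch, with made-up list names (names, numbers) and list() wrapped around zip() so it prints the same under Python 2 and Python 3:

```python
names = ['Why', 'do', 'superheroes']
numbers = [3]

# only one pair can be formed, because numbers has only one item
pairs = list(zip(names, numbers))
print(pairs)        # [('Why', 3)]
print(len(pairs))   # 1
```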

While Loops

The while loop is a more generic form of the for loop. It continues until the statement in the first line is no longer true.
#!/usr/bin/python
 
x = 'Ni!'
while x:
    print x
    x = x[1:]
print
 
x = 5
while x > 0:
    print x
    x = x - 1
print

$ ./loops.py
Ni!
i!
!

5
4
3
2
1


Notice that the format of the while loop is the same as the for loop (i.e. truth statement ending in a : and indented body). Also, notice that you have to EXPLICITLY modify the variable within the loop that is being checked in the truth statement.
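
As one more sketch of the same pattern, the loop below halves a number until it reaches 1 and counts how many passes that takes. The names n and count are made up for this illustration, and print uses parentheses so it runs under both Python 2 and Python 3.

```python
n = 100
count = 0
while n > 1:
    n = n // 2          # explicitly change the variable being tested
    count = count + 1   # keep track of how many passes we made

print(count)   # 6 (100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1)
```

If the line n = n // 2 were left out, n > 1 would stay true forever.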

What happens if you don't change that variable?
#!/usr/bin/python
 
x = 5
while x > 0:
    print 'Hit Ctrl+C to quit'

$ ./loops.py
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
Hit Ctrl+C to quit
...


Escaping Loops

Occasionally, you might want to get out of a loop before its condition becomes false. In fact, some loops are designed such that the control condition at the top of the loop never becomes false! You can control this using break and continue; pass is a related placeholder statement.

break: jumps out of the closest loop
continue: jumps to the top of the closest loop
pass: empty placeholder
#!/usr/bin/python
 
x = 10
while x:
    x = x-1
    #use mod to check if number
    #is even?  go to next number
    if x % 2 == 0:
        continue
    #use comma to print multiple
    #things on same line
    else:
        print x,

$ ./loops.py
9 7 5 3 1

In this example, each number is checked by the if statement to see if it is even. If the number is even, continue sends the loop back to the while logical expression (i.e. that x != 0) without printing anything; otherwise, the number is printed.
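
The same idea works in a for loop. The sketch below is an illustration only (the list name odds is made up): it skips the even numbers with continue and collects the rest. It uses print() with parentheses so it runs under both Python 2 and Python 3.

```python
odds = []
for x in range(10):
    if x % 2 == 0:
        continue        # even: skip the rest of the body, go to the next x
    odds.append(x)      # only reached for odd numbers

print(odds)   # [1, 3, 5, 7, 9]
```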

#!/usr/bin/python
 
y = 10
x = y-1
while x > 1:
    if y % x == 0:
        print x, 'is a factor of', y
        break
    x = x-1
else:
    print y, 'is prime'

$ ./loops.py
5 is a factor of 10

Here, each number x below y is checked to see whether y is evenly divisible by it (in which case x is called a factor of y). If it is, x is printed and break ends the loop. If the loop finishes without finding a factor, the else clause runs. What happens when you change the y to 100? 15? 3?


pass is a statement that simply has no effect. Some things (like if and for) require a statement to follow them. Occasionally, when you're writing your code, you may know that you're going to want to do something if something is True or False, but you don't know what exactly you want to do. pass is a reminder that you need to put something there, while leaving the code still valid and executable.

if national_debt > 20000000000:
    pass
    # Let someone else figure out what to do here
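
The snippet above assumes national_debt has already been defined somewhere. Here is a runnable sketch with a made-up value, which also shows pass as the body of a loop. The value and the names are assumptions for illustration only; print uses parentheses so it runs under both Python 2 and Python 3.

```python
national_debt = 25000000000   # hypothetical value, just so the code runs

handled = False
if national_debt > 20000000000:
    pass                      # placeholder: nothing happens here yet
else:
    handled = True

for i in range(3):
    pass                      # a loop body that deliberately does nothing

print(handled)   # False, because the if branch was taken and did nothing
print(i)         # 2, the loop still ran all three passes
```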


Note that else, break, continue, and pass can be used in the context of ANY loop, not just these examples.
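
For instance, here is a sketch of break and else attached to a for loop. It looks for the first odd number in two made-up lists; the else clause only runs for the list where break never fires. The names messages, values and v are assumptions for this illustration, and print uses parentheses so it runs under both Python 2 and Python 3.

```python
messages = []
for values in ([4, 8, 15, 16], [2, 4, 6]):
    for v in values:
        if v % 2 == 1:
            messages.append('first odd: %s' % v)
            break                       # stop looking in this list
    else:
        # only reached when the inner loop finished without a break
        messages.append('no odd numbers')

print(messages)   # ['first odd: 15', 'no odd numbers']
```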

Any questions?

List Comprehension

Sometimes we'll want to change every item in a list in a systematic way. For example, we may want to add one to each of a list of integers:

#!/usr/bin/python
 
a = range(10)
b = []
for i in a:
    b.append(i + 1)
a = b
print a



This is a totally acceptable way to solve this problem. However, because it is very common to perform the same operation on every member of a list, Python provides a shortcut that is syntactically more concise and usually faster as well. It's called list comprehension, and it works like this:

#!/usr/bin/python
 
# let's recreate our list
a = range(10)
# note that the entire statement below is contained in **[ ]**
# This is the defining syntax of a **list comprehension**.
print [ i + 1 for i in a ]
 
# note also that we have to save the resulting new list back
# to **a** if we want to replace the old list
a = [ i + 1 for i in a ]
print a
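
If you would like to check the speed claim on your own machine, the sketch below times both approaches with the standard timeit module. It is only a rough illustration: the helper functions with_loop and with_comprehension are made up for it, the list size and repeat count are arbitrary, and the actual numbers will vary from computer to computer. It is written to run under both Python 2 and Python 3.

```python
import timeit

def with_loop():
    # build the list with an explicit for loop
    b = []
    for i in range(1000):
        b.append(i + 1)
    return b

def with_comprehension():
    # build the same list with a list comprehension
    return [i + 1 for i in range(1000)]

# run each version 200 times and record the total time in seconds
loop_seconds = timeit.timeit(with_loop, number=200)
comp_seconds = timeit.timeit(with_comprehension, number=200)

print(with_loop() == with_comprehension())   # both give the same list
print('loop: %.4f s, comprehension: %.4f s' % (loop_seconds, comp_seconds))
```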

Okay, we'll only get a little bit fancier here, exploring the idea of a nested list comprehension instead of a nested for loop. If you wanted to make a list of lists, with 10 lists of integers ranging from 0 to 99, you might have to type it out like this:

# terrible way to do this
b = [ range(100), range(100), range(100), etc, etc, etc...]
 
Or you could use a for loop:
b = []
 
for i in range(10):
    b.append(range(100))

Not too bad, but we can use a list comprehension here as well, and make this even cleaner and often faster:
b = [ range(100) for i in range(10) ]
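
As a quick check that the two approaches agree, here is a smaller sketch (3 lists of 4 integers instead of 10 lists of 100). The names b_loop and b_comp are made up for the illustration. list(range(4)) is used so that the inner items are real lists under Python 3 as well; in Python 2, range(4) already returns a list.

```python
# for-loop version
b_loop = []
for i in range(3):
    b_loop.append(list(range(4)))

# list-comprehension version
b_comp = [list(range(4)) for i in range(3)]

print(b_loop == b_comp)   # True
print(b_comp)             # [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]

# each inner list is a separate list, so changing one leaves the others alone
b_comp[0][0] = 99
print(b_comp[1][0])       # 0
```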

Exercises


1) Getting comfortable with loops (adapted from Learning Python)

a) Computers only really understand numbers, not letters. Therefore, letters and symbols are stored in the computer according to numbers. The American Standard Code for Information Interchange (ASCII) is a formalized code that links characters and numbers. You can determine the ASCII number code of a letter using the built-in function ord(character).
For example,
>>> print ord('T')
84
Write a for loop that prints the ASCII code of each character in the string 'Rock and Roll'. HINTS: You can loop over a string the same way you loop over a list.
b) Next, change your loop to compute the sum of the ASCII codes of all characters in the string.
c) Modify your code to print a new list that contains the ASCII codes of the characters in the string.

2) All Roads Lead to Rome (adapted from Learning Python)

A coworker (who obviously is not a native Python speaker) hands you the following code:
#!/usr/bin/python
 
L = [1,2,4,8,16,32,64]
x = 5
 
found = i = 0
while not found and i < len(L):
    #check if 2 to the power
    #of x is in the list
    if 2 ** x == L[i]:
        found = 1
    else:
        i = i+1
if found:
    print 'at index', i
else:
    print x, 'not found'

$ ./powers.py
at index 5

As is, the script does not follow normal Python coding techniques. Follow the steps below to improve it.
a) Rewrite this code with a while/else loop to eliminate the found flag and the final if statement.
b) Rewrite the example to use a for/else loop to eliminate the explicit list indexing logic.
c) Remove the loop completely by rewriting the examples using an expression with in (HINT: try the line 'print 2 in [1,2,3]')
d) Use a for loop and the list append method to generate the list L instead of typing it by hand.

3) List comprehension Practice

a) Using a for loop, convert a list of 10,000,000 integers to floating point values. Pay attention to how long it takes by counting the seconds in your head.

b) Now, code the same conversion with a list comprehension. Again, count the time in your head.

c) Next, create a smaller list of 100 integers, and use list comprehension to double the value at each position in the list.



4) Zippity Dictionaries

This morning, some of you approached Exercise #6 (our FASTA parser) with a strategy that first constructed two lists with matching indices, and then wanted to combine them into a dictionary. You likely found that this (like the rest of the exercise) involved the tedious task of copying each list item into its position as a key or value in the dictionary. You can use zip() to construct a dictionary from two lists instead.


# list of 0 to 9
a = range(10)
 
# list of 10 to 19
b = range(20)[10:]

Now use these lists, with a for loop and zip(), to create a dictionary keyed by items from a, with values from b.


5) Doing something interesting
A friend of yours in another lab is starting a new project on neuraminidase from the H1N1 flu virus (swine flu). She explains to you that, once a new virus is formed, neuraminidase clips off polysaccharide chains on the surface of the infected cell, ensuring that the virus doesn't get stuck as it is leaving. At the moment, she is interested in comparing her new structure of neuraminidase to the existing structure.

Write a script that will tell her what percent of the structure is helical, beta sheet, or some other structure.

Here are some things to help you out:
A) Download the structure of H1N1 neuraminidase (PDB code 3b7e) as an example structure
B) I am supplying you with a script that will open the pdb file, parse out the information about the sequence and secondary structure, and save three lists called full_seq, helix_aa and sheet_aa. full_seq is a list containing the full sequence of the protein (NOTE: The protein crystallized as a homodimer). helix_aa and sheet_aa are lists of secondary structure descriptions, which have the following formats when converted into lists (paraphrased from PDB file format documentation)

helix_aa
0. Record name 'HELIX'
1. Serial number of helix
2. Helix identifier
3. Name of initial residue
4. Chain identifier
5. Sequence number of initial residue
6. Name of terminal residue
7. Chain identifier
8. Sequence number of terminal residue

sheet_aa
0. Record name 'SHEET'
1. Strand number
2. Sheet identifier
3. Number of strands in sheet
4. Residue name of initial residue
5. Chain identifier of initial residue
6. Sequence number of initial residue
7. Residue name of terminal residue
8. Chain identifier of terminal residue
9. Sequence number of terminal residue
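
To make the field positions concrete, here is a small sketch that reads the start and end residue numbers out of one HELIX entry and counts the residues it covers. The record shown is made up for illustration (it is NOT taken from 3B7E), and print uses parentheses so the snippet runs under both Python 2 and Python 3.

```python
# Hypothetical HELIX entry, already split into a list (values are made up)
helix = ['HELIX', '1', '1', 'ASN', 'A', '104', 'SER', 'A', '110']

chain = helix[4]        # chain identifier of the initial residue
start = int(helix[5])   # sequence number of initial residue
end = int(helix[8])     # sequence number of terminal residue

# residues start..end inclusive, so add 1 to the difference
n_residues = end - start + 1
print(chain)        # A
print(n_residues)   # 7
```

The fields come out of the file as strings, which is why int() is needed before doing arithmetic on them.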
#!/usr/bin/python
##SCRIPT TO PARSE OUT SECONDARY STRUCTURE INFORMATION
 
import sys, os
 
full_seq = []
helix_aa = []
sheet_aa = []
 
f1 = open('3B7E.pdb' ,'r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
 
ANSWER:
Total number of residues = 385
Percent helical = 3.8961038961
Percent B sheet = 45.1948051948
Percent other = 50.9090909091

BONUS: What is the average b-factor (measure of the amount of vibrational motion each atom is undergoing) for each region?

To help you, here is a modification of the script above that collects the information about each atom and stores it in the list atoms. Each atom has the following format when converted into a list (paraphrased from PDB file format documentation)

atoms
0. Record name ATOM
1. Atom sequence number
2. Atom name
3. Residue name
4. Chain identifier
5. Residue sequence number
6-8. X, Y, Z coordinates
9. Occupancy in structure (1.0 = 100% occupied, 0.5 = 50% occupied)
10. B-factor
11. Element
#!/usr/bin/python
##SCRIPT TO PARSE OUT SECONDARY STRUCTURE AND ATOM INFORMATION
 
full_seq = []
helix_aa = []
sheet_aa = []
atoms = []
f1 = open('3B7E.pdb' ,'r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
    elif tmp[0] == 'ATOM':
        if len(tmp) < 12:
            begin = tmp[0:2]
            end = tmp[3:]
            middle = [tmp[2][:3], tmp[2][4:]]
            tmp = begin + middle + end
        try:
            int(tmp[5])
        except:
            continue
        atoms.append(tmp)
 
ANSWER:
Structure   Chain A          Chain B
Helix       10.7259090909    10.4519090909
B Sheet     9.80990572879    9.76974619289
Other       11.9676417704    11.8491953232

Solutions


1) Getting comfortable with loops (adapted from Learning Python)

#!/usr/bin/env python
 
rockString = 'Rock and Roll'
print rockString
print
 
# a)
print 'a)'
for x in rockString: print 'The ASCII value of %s is %s' % (x, ord(x))
print
 
# b)
print 'b)'
# named total rather than sum, to avoid hiding the built-in sum() function
total = 0
for x in rockString: total = ord(x) + total
print total, '''is the sum of the ASCII values corresponding to each of the letters of the string 'Rock and Roll'.'''
print
 
# c)
print 'c)'
rockASCIIlist = [ ]
for x in rockString:
    # add the ASCII code of each character to the end of the list
    rockASCIIlist.append(ord(x))
    # print rockASCIIlist # uncomment this to see the list grow with each iteration
print rockASCIIlist
print

2) All Roads Lead to Rome (adapted from Learning Python)

#!/usr/bin/env python
 
# a)
print 'a)'
L = [1,2,4,8,16,32,64]
x = 5
 
# Less convoluted, but still indexing
i = 0
while i < len(L):
    if 2 ** x == L[i]:
        print 'at index', i
        break
    else:
        i = i+1
else:
    print x, 'not found'
print
 
# b)
print 'b)'
L = [1,2,4,8,16,32,64]
x = 5
for yInd, y in enumerate(L):
    if 2 ** x == y:
        print 'at index %s' % (yInd)
        break
else:
    # only reached if the loop finished without hitting break
    print x, 'not found'
print
 
# c)
print 'c)'
# Quickest way to see if something is in a list
print 2 in [1,2,3]
L = [1,2,4,8,16,32,64]
x = 5
print 2**x in L
print
 
# d)
 
L = []
for x in range(7):
    L.append(2 ** x)
print L
print
 

3) List comprehension Practice

#!/usr/bin/env python
 
# a)
print 'a)'
print
L = []
for x in range(10000):
    L.append(x+1)
    # print type(x) # uncomment to see that all the numbers are ints
for xInd, x in enumerate(L): # note that we have to use the enumerate() function (or indexing) if we want to actually change the list!
    L[xInd] = float(x)
    # print type(x) # uncomment to see that all the numbers are now floats
print L
print
 
# b)
print 'b)'
print
L = range(10000)
L = [float(i) for i in L]
print L
print
print 'The list is made up of %s.' % (type(L[1])) # shows that the numbers are floats
print
 
# c)
print 'c)'
print
L = range(100)
print L
L = [i*2 for i in L]
print L
 

4) Zippity Dictionaries

#!/usr/bin/env python
 
# list of 0 to 9
a = range(10)
print a
print
 
# list of 10 to 19
b = range(20)[10:]
print b
print
 
# named d rather than dict, to avoid hiding the built-in dict type
d = {}
for x,y in zip(a,b):
    d[x] = y
print d
print
 

5) Doing something interesting

#!/usr/bin/env python
 
#### The code below was provided for you ####
import sys, os
 
full_seq = []
helix_aa = []
sheet_aa = []
 
f1 = open('3B7E.pdb' ,'r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
#### The code above was provided for you ####
 
HelicalResidues = 0
for helix_inst in helix_aa:
    # verify that the helix is part of chain A, since this
    # particular protein crystallized as a dimer and we
    # only care about the number of helical residues per
    # polypeptide
    if helix_inst[4] == 'A' and helix_inst[7] == 'A':
        # Add to a running total of the number of helical
        # residues by adding to the total the difference
        # between the residue number of the last amino acid and the
        # residue number of the first amino acid
        HelicalResidues += (float(helix_inst[8]) - float(helix_inst[5]) + 1)
 
# This following code will do the same for beta sheet residues
# as we did above for helical residues
BetaSheetResidues = 0
for sheet_inst in sheet_aa:
    if sheet_inst[5] == 'A' and sheet_inst[8] == 'A':
        BetaSheetResidues += (float(sheet_inst[9]) - float(sheet_inst[6]) + 1)
 
seq_len = len(full_seq)
 
print "Total number of residues=%d" % seq_len
print "Percent helical=%f" % ((HelicalResidues/seq_len) * 100)
print "Percent B sheet=%f" % ((BetaSheetResidues/seq_len) * 100)
print "Percent other=%f" % (((seq_len - HelicalResidues - BetaSheetResidues)/seq_len) * 100)
 


BONUS:

#!/usr/bin/env python
 
import sys, os
 
full_seq = []
helix_aa = []
sheet_aa = []
atoms = []
f1 = open('3B7E.pdb' ,'r')
for next in f1:
    tmp = next.strip().split()
    if tmp[0] == 'SEQRES':
        if tmp[2] == 'A':
            full_seq.extend(tmp[4:])
    elif tmp[0] == 'HELIX':
        try:
            int(tmp[5])
        except:
            tmp[5] = tmp[5][:-1]
        helix_aa.append(tmp[:9])
    elif tmp[0] == 'SHEET':
        sheet_aa.append(tmp[:10])
    elif tmp[0] == 'ATOM':
        if len(tmp) < 12:
            begin = tmp[0:2]
            end = tmp[3:]
            middle = [tmp[2][:3], tmp[2][4:]]
            tmp = begin + middle + end
        try:
            int(tmp[5])
        except:
            continue
        atoms.append(tmp)
######################
# My code starts here
######################
num_helix_res = 0.0
print "There are %s residues in the sequence" % len(full_seq)
 
# Set up a listing of features by residue, then fill it in as we go along
feature = ['Other']*(10000)  
 
for aa in helix_aa:
    # We add 1 because there are b-a+1 residues between a and b, inclusive
    num_helix_res += float(aa[8]) - float(aa[5]) + 1
    for i in range(int(aa[5]), int(aa[8])+1):
        feature[i] = 'Helix'
 
num_sheet_res = 0.0
for sheet in sheet_aa:
    num_sheet_res += float(sheet[9]) - float(sheet[6]) + 1
    for i in range(int(sheet[6]), int(sheet[9])+1):
        feature[i] = 'Sheet'
 
 
# atom[4] == chain id
# atom[5] == residue #
# atom[10] == b-factor
 
 
# Use list comprehensions to parse out the B-factors
helix_bfactors_a = [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Helix' and atom[4] == "A"]
sheet_bfactors_a = [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Sheet' and atom[4] == "A"]
other_bfactors_a =     [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Other' and atom[4] == "A"]
 
helix_bfactors_b = [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Helix' and atom[4] == "B"]
sheet_bfactors_b = [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Sheet' and atom[4] == "B"]
other_bfactors_b =     [float(atom[10]) for atom in atoms 
                    if feature[int(atom[5])] == 'Other' and atom[4] == "B"]
 
 
# Divide num_helix_res and num_sheet_res by 2 because of homodimers
print "\tPercentage\tBFactorA\tBFactorB"
print "Helix\t", num_helix_res/2 / len(full_seq) * 100, "\t",\
    sum(helix_bfactors_a) / len(helix_bfactors_a),"\t",\
    sum(helix_bfactors_b) / len(helix_bfactors_b)
print "Sheet\t", num_sheet_res/2 / len(full_seq) * 100, "\t",\
    sum(sheet_bfactors_a) / len(sheet_bfactors_a),"\t",\
    sum(sheet_bfactors_b) / len(sheet_bfactors_b)
 
print "Other\t", (len(full_seq) - num_sheet_res / 2 - num_helix_res/2) \
                    / len(full_seq) * 100, "\t",\
        sum(other_bfactors_a) / len(other_bfactors_a),"\t",\
        sum(other_bfactors_b) / len(other_bfactors_b)