How to Not Do What You Should Not Do: Errors, Exceptions, and Simple Debugging

Topics:

  • Common Errors
  • Error Messages from Python
  • Exception Handling (try, except)
  • Print Debugging

Introduction


Let's be honest, most of us aren't perfect. And in our zen self-awareness, we are well-served to be sensitive to our predispositions to err in particular ways. We are probably even better served to listen to the computer when it tells us we've made a big mistake. Python can be quite good at communicating our flaws to us, if only we're receptive to the constructive criticism in its spartanly high-contrast black and white print.

This morning we're going to look at common sources of error, see what to look for as feedback from Python, and learn a couple of tricks to either prevent bugs or, if necessary, track them down when they occur.

A few commonly encountered sources of errors include:

  • Mis-typed variable names

  • Mis-typed variables (values of the wrong type)

  • Non-existent variables (or indices in a variable)

  • Unexpected input


Some of these tend to cause Python to give up right away, while others will cause insidious bugs that sneak through unnoticed until you present your work in lab meeting and someone calls you on your exciting, but seemingly impossible (and ultimately bogus) result.

Strategic Initiative 1: Be Modest; Be Prepared.


Every time you introduce a new piece of logic to your code, test it. There are two main ways to accomplish this:

1) Each time you finish a block of code, whether it's a short function, or a few lines modifying a control loop, make sure any variables you changed or created in that time possess the values you expect them to possess.

2) Take a firm hand, and start explicitly setting variables to the value you want them to have (often [], '' or 0) just before you use them for the first time.

You will save yourself a ton of time and effort if you ensure your code is doing exactly what you think it is all the way through the program. It is more difficult to debug 100 lines of code than it is to debug 10. Writing a ton of code that generates output without checking if each component works individually does NOT make you a coding rock star; it makes you sloppy.
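
As a minimal sketch of both habits, the snippet below sets its counter explicitly before the loop and then checks the result against a value worked out by hand. The function name gc_count and the test sequence are made up for illustration; the print() calls take a single argument so the snippet should run under either Python 2 or Python 3.

```python
def gc_count(sequence):
    # Explicitly set the counter just before its first use
    count = 0
    for base in sequence:
        if base in 'GC':
            count += 1
    return count

# Check the new piece of logic right away, on input small enough to verify by hand
test_sequence = 'ATGCGC'
observed = gc_count(test_sequence)
print('gc_count(%s) = %d' % (test_sequence, observed))
assert observed == 4, 'expected 4 G/C bases, got %d' % observed
```

If the assert fails, you know the bug is in these few lines and not somewhere in the hundred lines you wrote afterwards.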

Strategic Initiative 2: Be Verbose.


The key is to get used to reporting the values that your variables have at the point things start going wrong. Often, you won't know where this is without doing a little debugging first, so you should approach debugging with a divide-and-conquer attitude. In other words, add lots of print statements in lots of places to narrow down where things are going wrong and then add additional print statements to figure out what exactly the problem is. It's not a bad practice to include print statements before you need them. If you expect that there will be a subset of data that will fail a logical test, set up your if and else statements to report the incidence of failure.

# students = {name : status}
# pretend this dictionary is already filled with data
 
for name in students:
     if students[name] == 'Post-doc':
          get_student_a_job(name)
          print "Time for this post-doc to get a job..."
     elif students[name] == 'Grad Student':
          print "Be patient, this one's just a grad student."
     else:
          print "Warning:", name, "is neither a post-doc nor a grad student!"
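
A runnable variation on the loop above, assuming a small made-up students dictionary (the names and statuses are hypothetical) and leaving out get_student_a_job(). It also tallies the unexpected entries, so the number of failures is reported once at the end.

```python
# Hypothetical data for illustration only
students = {'Alex': 'Post-doc', 'Sam': 'Grad Student', 'Kim': 'Undergrad'}

unexpected = []   # names that fail both tests
for name in sorted(students):
    status = students[name]
    if status == 'Post-doc':
        print('%s: time for this post-doc to get a job...' % name)
    elif status == 'Grad Student':
        print("%s: be patient, this one's just a grad student." % name)
    else:
        print('Warning: %s has unexpected status %r' % (name, status))
        unexpected.append(name)

print('%d of %d entries had an unexpected status' % (len(unexpected), len(students)))
```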
 

Strategic Initiative 3: Be Boring.


As you get better at coding, you will start to take shortcuts and combine lines. As soon as something doesn't behave as you expect, you should decompose your compound statements, as this is a common source of error.

Compound:
helix_aa.extend(range(int(line[21:25].strip()),int(line[33:37].strip())+1))

Expanded:
start_str = line[21:25]
end_str = line[33:37]
start_stripped = start_str.strip()
end_stripped = end_str.strip()
start = int(start_stripped)
end = int(end_stripped)+1
helix_res = range(start,end)
helix_aa.extend(helix_res)

These two code snippets do the same thing, and Python doesn't care which you use, but if you start with the former and something doesn't work, you should immediately switch to the latter in order to print the value of each intermediate step. This is the easiest way to figure out which step is misbehaving.
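
To make that concrete, here is a sketch of the expanded version with a print after each step. The input line is fabricated so that line[21:25] and line[33:37] hold the two numbers; it is not a real PDB record.

```python
# Made-up input, padded so that line[21:25] and line[33:37] contain numbers
line = 'HELIX' + ' ' * 16 + '  12' + ' ' * 8 + '  20'

helix_aa = []
start_str = line[21:25]
end_str = line[33:37]
print('raw slices: %r %r' % (start_str, end_str))

start = int(start_str.strip())
end = int(end_str.strip()) + 1
print('start = %d, end = %d' % (start, end))

helix_res = list(range(start, end))   # list() so it prints the same in Python 2 and 3
print('residues: %s' % helix_res)
helix_aa.extend(helix_res)
```

If one of the slices were off by a column, the first print would show it immediately, long before int() complained about a stray character.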

Error Messages from Python: Syntactical Mistakes


Python is an honest, loving communicator. If you're into that, then you'll be happy. If not, you're gonna need a little couples' counseling if you want to make it. But be forewarned, while Python will catch your syntactical errors, it is not sentient. Errors in logic and failures in implementation will go unnoticed as long as they follow the rules of the language.

#!/usr/bin/python
 
while True print 'Bioinformatics is AWESOME!'

$ ./errors.py
  File "./errors.py", line 3
    while True print 'Bioinformatics is AWESOME!'
                   ^
SyntaxError: invalid syntax

This error message, for its brevity, is quite helpful. We quickly learn the following:

1) The problem is on line 3 of the file errors.py. This may seem trivial in this case, but when you have nested functions conjured from various modules in various directories, knowing which file to look in is a big, big help.

2) There is a problem between "True" and "print".

What is the problem and how is it fixable?

Other varieties of errors will tell us if we've made a mistake using our variables. You have already been exposed to a few of these error messages in our lecture on data structures.
TAs = ['Rose', 'Phil', 'Peter', 'Andrew']
print TAs[10]

$ ./errors.py
Traceback (most recent call last):
  File "./errors.py", line 5, in <module>
    print TAs[10]
IndexError: list index out of range

So, let's parse this honest communique from our dear friend Python:

1) The problem is on line 5 of the file errors.py.
2) The problem is with the expression print TAs[10]
3) The variety of error is an IndexError.

We may be able to intuit the nature of the error now: index 10 does not exist in TAs (the list isn't that long).
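
One way to avoid this particular exception is to compare the index against the length of the list before using it. A minimal sketch (the helper name get_ta is made up for illustration):

```python
TAs = ['Rose', 'Phil', 'Peter', 'Andrew']

def get_ta(index):
    # Only accept non-negative indices that fall inside the list
    if 0 <= index < len(TAs):
        return TAs[index]
    print('Index %d is out of range; the list only has %d items' % (index, len(TAs)))
    return None

print(get_ta(3))
print(get_ta(10))
```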

Python treats the ability to identify errors as a useful tool, which is why these are called exceptions rather than errors. There are almost 50 built-in exceptions in the current release of Python. The good news is that the exception message includes a short description of the error, so you don't need to remember them all! And if we cannot deduce the nature of the error from this short description, it is a good time to check the list of Built-in Exceptions in the Python documentation.

Alright. It's cool that Python can identify when and (in most cases) where something goes wrong, and can then give us some useful information on how to fix things. However, it's kind of lame that the program always quits after an exception is registered. What if I want it to keep going?

Planning for the Worst: Try and Except

Try and except statements ask Python to first give something a go. If there is no error, all is well and the code continues. However, if an error occurs, Python does something reasonable instead of quitting.

For example, when we open files, we might want to check if they are actually there first.
fh = open('nonexistentFile.txt', 'r')

$ ./errors.py
Traceback (most recent call last):
  File "./errors.py", line 9, in <module>
    fh = open('nonexistentFile.txt', 'r')
IOError: [Errno 2] No such file or directory: 'nonexistentFile.txt'

We can use try and except here to do something more informative.
try:
    fh = open('nonexistentFile.txt', 'r')
except IOError:
    print 'nonexistentFile.txt is not in existence.'



$ ./errors.py
nonexistentFile.txt is not in existence.
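
The exception object itself carries the details Python would have printed, and you can capture it with "as" to include them in your own message. The sketch below wraps this in a hypothetical helper, count_lines, that returns None instead of crashing when a file cannot be opened. IOError is used to match the example above; in Python 3 it is an alias of OSError, so the same code should work there.

```python
def count_lines(filename):
    try:
        fh = open(filename, 'r')
    except IOError as err:
        # err holds the same message Python would have shown in the traceback
        print('Could not open %s: %s' % (filename, err))
        return None
    line_count = len(fh.readlines())
    fh.close()
    return line_count

# The program keeps going after the failed open
for name in ['nonexistentFile.txt']:
    print('%s -> %s' % (name, count_lines(name)))
print('Still running.')
```

Only the open() call sits inside the try block. Keeping the block that small makes it clear which operation the except clause is meant to guard.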

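You can account for multiple exceptions in the same try statement, either with one except clause per exception or by grouping several exceptions in a single clause as a tuple: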
except (RuntimeError, TypeError, NameError):
    print 'Any one of three things could have gone wrong here.'
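
Exceptions can also each get their own except clause, which lets you respond differently to each one. A sketch using a made-up function, safe_ratio:

```python
def safe_ratio(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        print('Cannot divide by zero.')
    except TypeError:
        print('Both arguments must be numbers.')
    # Only reached if one of the except clauses ran
    return None

print(safe_ratio(6, 3))
print(safe_ratio(6, 0))
print(safe_ratio('A', 'B'))
```

Python checks the except clauses from top to bottom and runs the first one that matches, so more specific exceptions should be listed before more general ones.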
 

Finally

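The finally clause runs 'on the way out'. If no error occurs, Python runs the try block, then the else block (if there is one), then the finally block, and then moves on. If an exception occurs and an except clause handles it, the finally block runs after that handler. If an exception occurs that no except clause handles, Python runs the finally block and then re-raises the exception.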
def divide(x,y):
    try:
        result = x/y
    except ZeroDivisionError:
        print "divide by zero!"
    else:
        print "result is", result
    finally:
        print "executing final clause"
 
divide(2,1)
print
divide(2,0)
print
divide('A','B')

$ ./errors.py
result is 2
executing final clause

divide by zero!
executing final clause

executing final clause
Traceback (most recent call last):
  File "./errors.py", line 31, in <module>
    divide('A','B')
  File "./errors.py", line 19, in divide
    result = x/y
TypeError: unsupported operand type(s) for /: 'str' and 'str'

In the first case, we make it through the statement just fine.

In the second case, we throw an exception, deal with it gracefully, and move along our merry way.

In the third case, we have found an exception that we are not handling correctly. Here, we get to the finally clause and then print the exception message.
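
A common use of finally is cleanup that must happen whether or not something went wrong, such as closing a file handle. The sketch below is only an illustration: first_number and write_temp are made-up helpers, and the script writes its own small temporary files so that it can be run as is.

```python
import os
import tempfile

def first_number(path):
    fh = open(path, 'r')
    try:
        # int() raises ValueError if the first line is not a whole number
        return int(fh.readline().strip())
    finally:
        # Runs on the way out, after a normal return or an exception
        fh.close()
        print('closed %s' % os.path.basename(path))

def write_temp(text):
    # Helper for this demo only: create a temporary file holding text
    handle, path = tempfile.mkstemp(suffix='.txt')
    os.write(handle, text.encode('ascii'))
    os.close(handle)
    return path

good = write_temp('42\n')
bad = write_temp('forty-two\n')

print(first_number(good))
try:
    first_number(bad)
except ValueError:
    print('The file was still closed before the exception reached us.')

os.remove(good)
os.remove(bad)
```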
Any questions?

Exercises


1) Exception handling we have known.

In lesson 2.1, we learned about two functions that can be applied to remove an item from a set: remove() and discard(). We looked at the functions using the following example:
#!/usr/bin/env python
 
list_of_letters = ['a', 'a', 'b', 'c','c','c','d','e']
print 'ORIGINAL'
set_of_letters = set(list_of_letters)
print set_of_letters
 
print 'DISCARD'
set_of_letters.discard('q')
print set_of_letters
 
print 'REMOVE'
set_of_letters.remove('q')
print set_of_letters
 
a) Now that we've learned more about exception handling, explain what is happening here.

b) Create a script that contains a list of 5 of your favorite beers or wines or soft drinks (depending on your preference) for a tasting party stored as a set. Each time someone drinks a beverage, it is removed from your fridge and cannot be drunk again. Matt drinks a beverage. Adjust the set as appropriate. Rich sees Matt drinking his beverage and wants the same one. Tell him you're out of that beverage.

2) Modify the FASTA parser from Session 4.2 to handle:

1) non-existent filenames
2) files that do not conform to the FASTA format (i.e. >gene for IDs, and strings of A,T,G, or C for sequence).
3) sequences in mixed case (both upper and lower case)

3) Modify the FASTA parser to identify file compression format and read compressed files.

We need to add functionality to our FASTA parser for use next week. We will be using FASTA files that are very large (> 3 GB) when uncompressed. In order to save disk space, we'll be leaving them compressed (~1 GB) while we use them. However, this will require us to use the gzip module to read a compressed file directly. We will also need the mimetypes module, which can identify the type of file we are using.

First, write a new function called open_file_by_mimetype that determines whether a file is gzipped or not (using the mimetypes module) and returns an open file handle. Then make a function called read_fasta() using your FASTA parser script, but modify it to call this new function such that your parser can read both types of FASTA files automatically.

When you're done, store your new file-opening function in a module called file_tools.py and your improved read_fasta() parser in a module called sequence_tools.py.

4) Improving the teacher's code.

Go back and look at the code you used to do Exercise 3.2.3. Now that you have learned about file and string processing, you should be able to understand the wrapper script that was supplied to help you parse the pdb files.

a) The code is very poorly commented. Figure out what is going on at each step and add comments.

b) There are two locations where exception handling is applied. Why is the exception handling necessary? The implementation is very sloppy. Can you rewrite the exception handling? Can you rewrite the code again to avoid exception handling altogether? (HINT: PDB files are formatted according to characters, not whitespace. Go back to the documentation cited in exercise 3.2-2 for additional information on PDB formatting.)

5) All code is bug-free until your first user.

You have another coworker who heard about your AMAZING secondary structure analysis code. She asks if you will analyze her protein, interleukin-19, as well (HINT: use PDB code 1N1F). Crud! This protein breaks your code. Why? Rewrite your code to work on both interleukin-19 and on the original H1N1 neuraminidase example.

Solutions


1) Exception handling we have known.

#!/usr/bin/env python
 
# a) We converted the original list_of_letters to a set
# called set_of_letters, which got rid of all duplicates
# from the list.  Then we tried to discard 'q'.  discard()
# can handle values that don't exist in the set, so it does
# not raise an error when there's no 'q'.  remove() can't
# handle values that don't exist in the set, so it raises a
# KeyError.

#!/usr/bin/env python
 
print
listOfBeers = ['Bell\'s Oberon Summer Wheat', 'Dogfishhead 90 Minute IPA', 'Franziskaner', 'Lindeman\'s Raspberry Lambic', 'Mad Hatter IPA']
print 'The party is all stocked with my favorite beers:'
setOfBeers = set(listOfBeers)
for beer in setOfBeers:
    print '\t', beer
print
 
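def BxxrHour(x):
    print 'Craving some %s.  Goes to get one.  ' % (x)
    try:
        setOfBeers.remove(x)
    except KeyError:    # remove() raises KeyError if x is not in the set
        print "\"Hey! Someone totally pwned my %s!!\"" % (x)
        print 'Beverage FAIL!!!'
    else:
        print 'Epic beverage WINS!'
    finally:
        print "The remaining beverages are:"
        for beer in setOfBeers:
            print '\t', beer
        print
    # Return after the finally block; a return inside finally would
    # silently swallow any exception we did not plan for
    return setOfBeers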
 
print 'Matt wants a beverage:'
setOfBeers = BxxrHour('Bell\'s Oberon Summer Wheat')
 
print 'Rich wants a beverage:'
BxxrHour('Bell\'s Oberon Summer Wheat')

2) Modify the FASTA parser from Session 4.2 to handle:

callparser.py
#!/usr/bin/env python
from fastaParser import fastaParser
 
parsed = fastaParser('seq.FASTA')
print parsed

fastaParser.py
#!/usr/bin/env python
 
def fastaParser(fastafilename):
    current_gene = ""   # Start with an empty string, just in case
    genes = { }         # Make an empty dictionary of genes
    try:
        fh = open(fastafilename, 'r')
    except IOError:
        print 'Could not find file with filename %s' % (fastafilename)
        result = 'Please verify that your filename is correct and try again.'
        return result
    for lineInd, line in enumerate(fh.readlines()):
        # The very first line of a FASTA file must be a '>' header
        if lineInd == 0 and not line[0] == '>':
            print 'File does not conform to FASTA format.'
            result = 'Please try again with a FASTA formatted file.'
            fh.close( )
            return result
        line = line.strip()  # Clear out leading/trailing whitespace
        line = line.upper()  # Deals with whatever case the
                             # sequence is by making it all upper case
        if len(line) > 0 and line[0] == ">":   # This one is a new gene
            current_gene = line[1:]
            genes[current_gene] = ""
        else:                # Add onto the current gene
            genes[current_gene] += line
    fh.close()
    return genes
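
The parser above only checks the first line of the file. Item 2 of the exercise also asks about sequence content. One possible way to check it, sketched here under the assumption that only the four unambiguous bases are allowed (no N or other ambiguity codes), is to compare each sequence against a set of permitted characters. The function name invalid_bases and the example sequences are made up.

```python
ALLOWED_BASES = set('ATGC')   # assumption: no ambiguity codes such as N

def invalid_bases(sequence):
    # Return the sorted list of characters that are not A, T, G or C
    return sorted(set(sequence.upper()) - ALLOWED_BASES)

genes = {'geneA': 'ATGCCGTA', 'geneB': 'ATGXNCGT'}   # made-up sequences
for gene_id in sorted(genes):
    problems = invalid_bases(genes[gene_id])
    if problems:
        print('Warning: %s contains unexpected characters: %s' % (gene_id, ', '.join(problems)))
    else:
        print('%s looks like a DNA sequence.' % gene_id)
```

A check like this could be run on the dictionary returned by the parser, or on each line as it is read, depending on how early you want to hear about the problem.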

3) Modify the FASTA parser to identify file compression format and read compressed files.

callparser2.py
#!/usr/bin/env python
 
from sequencetools import read_fasta
import sys
 
fastafilename = sys.argv[1]
 
x = read_fasta(fastafilename)
print x

sequencetools.py
#!/usr/bin/env python
 
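from filetools import open_file_by_mimetype as ofbm

def read_fasta(fastaFilename):
    current_gene = ""   # Start with an empty string, just in case
    geneDict = { }      # Make an empty dictionary of genes
    fh = ofbm(fastaFilename)
    if isinstance(fh, str):
        # ofbm returns an error message (a string) if the file could not
        # be opened; pass it along instead of trying to read from it
        return fh
    # Loop over the handle directly; readlines() would load the
    # whole (very large) file into memory at once
    for lineInd, line in enumerate(fh):
        # The very first line of a FASTA file must be a '>' header
        if lineInd == 0 and not line[0] == '>':
            print 'File does not conform to FASTA format.'
            result = 'Please try again with a FASTA formatted file.'
            fh.close( )
            return result
        line = line.strip()  # Clear out leading/trailing whitespace
        if lineInd % 10000 == 0:
            print "At line %s." % (lineInd)   # Progress report for large files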
        line = line.upper() # Deals with whatever case the
                            # sequence is by making it all upper case
        if len(line) > 0 and line[0] == ">":   # This one is a new gene
            current_gene = line[1:]
            geneDict[current_gene] = ""
        else:                # Add onto the current gene
            geneDict[current_gene] += line
    fh.close()
    return geneDict

filetools.py
#!/usr/bin/env python
 
def open_file_by_mimetype(filename):
    import mimetypes
    if 'gzip' in mimetypes.guess_type(filename):
        try:
            import gzip
            fh = gzip.open(filename, 'r')
            print 'File is a gzipped file.'
            print
            return fh
        except IOError:
            print 'Could not find file with filename %s' % (filename)
            result = 'Please verify that your filename is correct and try again.'
            return result
    else:
        try:
            fh = open(filename, 'r')
            print 'File is not a gzipped file.'
            return fh
        except IOError:
            print 'Could not find file with filename %s' % (filename)
            result = 'Please verify that your filename is correct and try again.'
            return result
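
The test on guess_type works because that function returns a (type, encoding) pair and reports compression in the second slot. A quick check, with made-up file names (the files do not need to exist, since the guess is based on the name alone):

```python
import mimetypes

for filename in ['seq.fasta', 'seq.fasta.gz']:
    file_type, encoding = mimetypes.guess_type(filename)
    if encoding == 'gzip':
        print('%s: gzip-compressed' % filename)
    else:
        print('%s: not compressed (encoding is %r)' % (filename, encoding))
```

Because the guess relies on the file extension, a gzipped file that has been renamed without its .gz suffix would not be recognized this way.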

4) Improving the teacher's code.

#!/usr/bin/env python
 

5) All code is bug-free until your first user.

#!/usr/bin/env python