Olla Progradrida: A Few Other Things Before You Go


  • Using SQL databases from Python
  • BioPython
  • Interacting with the interwebs and online databases with BioPython
  • Using R from Python


In order to feel good about oneself, one should possess understanding. That is, we wanted you to be happy, and so we taught you how to think about things like reading files and translating sequences and so on. It's our fishing pole, from us to you. If you can write code, you can write whatever function you may need. It's self-reliance. Your fish are in Walden Pond. It's like Mr. Miyagi said, "Lesson not just karate only. Lesson for whole life. Whole life have a balance. Everything be better. Understand?" Of course you do. That made perfect sense.

But if you wanna be lazy, and have someone else catch your fish for you: there's BioPython.

Downloading and Installing BioPython

It comes from the BioPython website as a source tarball: biopython-1.54.tar.gz

Check out the README file to learn how to install it (you're big kids now, and it's only three steps).

Sequence Examples (stolen from the BioPython wiki page)

from Bio.Seq import Seq
# create a sequence object
my_seq = Seq('CATGTAGACTAG')
# print out some details about it
print('seq %s is %i bases long' % (my_seq, len(my_seq)))
print('reverse complement is %s' % my_seq.reverse_complement())
print('protein translation is %s' % my_seq.translate())
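If you'd rather catch these fish yourself, the three operations above are short to write by hand. Here is a minimal pure-Python sketch, assuming the standard genetic code (NCBI table 1) and uppercase DNA with no ambiguity codes. `reverse_complement` and `translate_dna` are our own functions, not BioPython's, and unlike BioPython this version silently ignores a trailing partial codon instead of warning.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Translation map for str.translate: swap each base for its partner
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_dna(seq):
    """Translate codon by codon from position 0; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


my_seq = "CATGTAGACTAG"
print("seq %s is %i bases long" % (my_seq, len(my_seq)))
print("reverse complement is %s" % reverse_complement(my_seq))
print("protein translation is %s" % translate_dna(my_seq))
```

The lookup table is built once from the 64-letter string, so each codon costs one dictionary lookup.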

SeqIO Example

from Bio import SeqIO
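# read a single-record FASTA file into a SeqRecord
mtbgenome = SeqIO.read('../data/mtbReferenceGenome', format='fasta')
# position of the first ATG in the genome (-1 if there is none)
idx = mtbgenome.seq.find('ATG')
# translate 99 bases (33 codons) starting at that ATG
print(mtbgenome.seq[idx:idx + 99].translate())
# reverse_complement is a method, so it needs parentheses
print(mtbgenome.seq[idx:idx + 99].reverse_complement())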
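`SeqIO.read` does a lot for you: it handles many formats besides FASTA and raises an error unless the file holds exactly one record. For a single-record FASTA file, though, the core job is small. This is a simplified sketch, not BioPython's actual implementation. `read_single_fasta` is a hypothetical helper of our own that accepts any file-like object, and the tiny record below is made up so the example runs without the genome file.

```python
import io


def read_single_fasta(handle):
    """Return (header, sequence) from a FASTA stream holding exactly one record."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("more than one record in FASTA input")
            header = line[1:]
        else:
            # normalize case so find('ATG') also matches lowercase input
            # (SeqIO itself keeps the case as-is)
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


# A made-up record standing in for ../data/mtbReferenceGenome
fake_file = io.StringIO(">demo made-up record\nccgATGAAA\nTTTGGGTAA\n")
header, seq = read_single_fasta(fake_file)
idx = seq.find("ATG")    # first start codon, -1 if none
orf = seq[idx:idx + 99]  # up to 99 bases (33 codons) from there
print(header, idx, orf)
```

With a real file you would pass `open('../data/mtbReferenceGenome')` instead of the `StringIO` object.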

Entrez Module

from Bio import Entrez
Entrez.email = "matthewdavis@berkeley.edu"    # you have to tell NCBI who you are
# Search PubMed
handle = Entrez.esearch(db='pubmed', term='marcotte em[au] davis[au]')
result = Entrez.read(handle)
handle.close()
# Fetch the first hit as a plain-text abstract; Entrez.read() only parses XML,
# so read the text handle directly
handle = Entrez.efetch(db='pubmed', id=result['IdList'][0], rettype='abstract', retmode='text')
abstract = handle.read()
handle.close()
# Search the gene database
handle = Entrez.esearch(db='gene', term='katG')
result = Entrez.read(handle)
handle.close()
# Fetch the second hit with efetch (not esearch), as XML so Entrez.read() can parse it
handle = Entrez.efetch(db='gene', id=result['IdList'][1], retmode='xml')
b = Entrez.read(handle)
handle.close()
# complicated, b/c there's lots of info in an Entrezgene listing
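Under the hood, `Entrez.esearch` and `Entrez.efetch` wrap NCBI's E-utilities web service: each call is an HTTP request to a URL under `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`, and `Entrez.read` parses the XML that comes back. Here is a rough sketch of the search step using only the standard library. `esearch_url` and `parse_id_list` are hypothetical helpers, and the XML at the bottom is a made-up stand-in for a real response, so no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (the service Bio.Entrez talks to)
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, email):
    """Build an ESearch request URL; NCBI asks callers to identify themselves by email."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "email": email})


def parse_id_list(xml_text):
    """Pull the <Id> values out of an ESearch XML response, like result['IdList']."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id")]


url = esearch_url("pubmed", "marcotte em[au] davis[au]", "you@example.org")
print(url)

# Made-up response with the same shape as ESearch output (not real search results)
sample = "<eSearchResult><Count>2</Count><IdList><Id>11111</Id><Id>22222</Id></IdList></eSearchResult>"
ids = parse_id_list(sample)
print(ids)
```

To really query NCBI you could hand the URL to `urllib.request.urlopen`, but then you would also have to respect NCBI's request-rate limits yourself, which `Bio.Entrez` takes care of for you.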

R Bindings: RPy

from rpy import *
# assign values to variables in the embedded R session
r.assign('x', 5)
r.assign('y', 10)
# run a raw R command (evaluates to 50 in R)
r('x * y')
###  Excerpt from my old code
    def makeRpartTree (self, proteins = 1, xyz = 0, density = 0, ids = 0, verbose = 0):
        """Accept target and time labels for the point-cloud target and build an rpart tree."""
        _label = self.factor + '__' + self.time
        self.preppc_for_rpart(proteins, xyz, density, ids, verbose)
        r.assign('ppc', self.ppc)
        r('ppc = (as.data.frame(ppc))')
        r('ppc = (as.matrix(ppc))')
        r.assign('target', self.data[_label])
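        # maketreespy is not part of base R or rpart; presumably it is defined elsewhere in the original code (not shown)
        rpfit = r("maketreespy(ppc, target, '" + _label + ".png')")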
        self.rpfit = rpfit
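The `maketreespy` call above builds R code by pasting `_label` between single quotes, which breaks if a label ever contains a quote. Here is a small sketch of a sturdier way to assemble such command strings before handing them to `r(...)`. `r_string` and `r_call` are hypothetical helpers of our own, and the label value is made up. The code only builds the string, so it runs without R installed.

```python
def r_string(text):
    """Quote a Python str as an R double-quoted string literal.

    Escapes backslashes and double quotes, which is enough for ordinary labels.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def r_call(function, *args):
    """Assemble 'function(arg1, arg2, ...)' from already-formatted R arguments."""
    return "%s(%s)" % (function, ", ".join(args))


label = "katG__t1"  # hypothetical factor__time label
command = r_call("maketreespy", "ppc", "target", r_string(label + ".png"))
print(command)
```

Inside the method, the equivalent would be `r(r_call("maketreespy", "ppc", "target", r_string(_label + ".png")))`.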