Structure Gazing with Chimera


Introduction to Chimera documentation

Overview of pulldown menus
  • File: Opens, saves, and closes files
  • Select: Various methods for selecting things
  • Actions: Modulates display options
  • Presets: Makes nice displays for publication
  • Tools: Modules for doing cool stuff
  • Favorites: Your favorite stuff
  • Help: Various ways to get help

Working with a New Structure: Wild Type KatG

Mutations in the catalase-peroxidase (KatG) enzyme have been shown to result in resistance to the drug isoniazid (INH) in tuberculosis. Additional biochemical studies have shown that KatG is responsible for linking INH with NADH. This INH-NADH adduct has been shown to inhibit the enoyl acyl-carrier protein reductase (InhA) critical for cell wall synthesis. Let's start by taking a look at the KatG structure:

1) PDB Code 1SJ2
2) Move things around with the mouse (eg rotate, translate, zoom)
3) Change the display style using pulldown menus (eg ribbons, atom representations,
4) Use presets to change background
5) Add 2-dimensional labels
6) Save image
wtDimer.jpg
We can now see that KatG is a dimer (obligate) with an active site in the N-terminal of each monomer. We can also see that the active sites have a heme group necessary for catalysis.

Comparing Two Structures: Wild Type KatG vs. S315T
One of the first INH resistant mutants that was isolated was serine 315 mutated to threonine. Note that a traditional catalase reaction is described as:
2 H2O2 → 2 H2O + O2
and typically procedes in two stages:
H2O2 + Fe(III)-E → H2O + O=Fe(IV)-E(.+)
H2O2 + O=Fe(IV)-E(.+) → H2O + Fe(III)-E + O2

A traditional peroxidase reaction is described as:
ROOR' + electron donor (2 e-) + 2H+ → ROH + R'OH

The proposed formation of the INH-NADH adduct is:

adduct.jpg
Let's take a look at the active site and see if we can come up with a structural explaination for resistance:

1) PDB Code 1SJ2
2) Clean up structure (eg delete non-relevant ligands and delete chain B)
3) Focus on active site (eg find heme, focus in on heme, display atoms within 5A of heme)
4) Identify residue that confers resistance and color (eg S315)
5) Display molecular surface (eg use mesh capping and move capping to show residue, active site and entry)
6) Computationally mutate 315 to threonine using swapaa
wtMutated.jpg
Why do you think the mutant confers resistance?

The structure for S315T has actually been solved. Let's take a look at that structure as well.
7) Close and reopen 1SJ2; Open 2CCD
8) Overlay structures using Matchmaker and look at conservation
9) Open Reply Log to look at C-alpha RMSD
10) Look at active site and see what's going on

wtMutantOverlay.jpg

We can see two main things from this second analysis. First, the very naive rotamer change got the threonine orientation wrong. The threonine is in a different orientation in the actual crystal structure. Second, we can see that the addition of the methyl group closes off the access channel such that a large substrate cannot access the heme but a smaller ligand, like hydrogen peroxide, would still be able to get to the heme, explaining the biochemical data.

Quantitating Mutant Location
In the paper Discussion, the authors spend some time exploring how mutant location may be correlated with resistance mechanism. One of the flaws of the paper, though, is the generalizations of correlation without any actual statistical analysis of the properties. We will look at the statistical analysis more this afternoon when we explore fitting. Since we are looking at structures now, though, let's develop some quantitative frames of reference for where the mutants are located in the protein. To generate and keep this data organized, though, we are going to need to use Chimera in a new way:

Using Chimera as a library AND as a graphical interface


chimera --help
 
chimera --nogui myscript.py
 
chimera --nogui --script myscript.py arg1 arg2 ...

Places to go for help

Back to the problem at hand...

There are two four main categories of side chains listed in the paper: Active Site, Dimer Interface, Substrate Access Channel, and Far Far Away (N- and C-terminal domains). The authors do not describe how they create these categories so we can make up our own definitions. We will start with the active site, using this definition:

Active Site Distance = minimum distance between any side chain atom and any heme atom

The first thing we need is a structure to work with :O) Let's open one using the Chimera libraries. First, download the pdb file 1SJ2 into a directory. Then, start chimera from that directory. In the IDLE window, type the following:
wildType = chimera.openModels.open('1SJ2.pdb')
The KatG structure is now open and saved as the variable wildType.

The next thing we want to do is find all atoms associated with the Heme group. To do this, we loop through the residues until we find the one named HEM:
for r in wildType[0].residues:
    if r.type == 'HEM':
              print r.type, r.id

Now, let's create a list of the Heme group atoms. Also, let's restrict things to chain A because the information from chain B should be basically redundant.
hemeCoords = []
for r in wildType[0].residues:
    if r.type == 'HEM' and r.id.chainId == 'A':
        for a in r.atoms:
            hemeCoords.append(a.coord())
 

Next, let's find the protein residues in chain A.
for r in wildType[0].residues:
    if not r.isHet and r.id.chainId == 'A':
        print r.type

and calculate the distance between each protein atom and each Heme group atom.
for r in wildType[0].residues:
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in hemeCoords:
                tmpDist = a.coord().distance(a1)
                print tmpDist

Next, let's figure out the minimum distance between the Heme group and each side chain and save that information in a data structure:
distances = {}
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in hemeCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
 
        distances['%s_%s' %(r.type,r.id)] = [dist]
print distances
Finally, let's write everything to a nicely formatted file for future analysis:
out = open('structure_distances.txt', 'w')
out.write('Residue\tDist from HEM\n')
for r in distances.keys():
    out.write('%s\t%s\n' %(r, distances[r][0]))
out.close()
 

We can also put all of the above into a script and run it command line without the IDLE window.
import chimera
import sys, os
 
# syntax to run in Terminal: 'chimera --nogui ./chimerastuff.py'
wildType = chimera.openModels.open('1SJ2.pdb')
 
hemeCoords = []
for r in wildType[0].residues:
    if r.type == 'HEM' and r.id.chainId == 'A':
        for a in r.atoms:
            hemeCoords.append(a.coord())
 
distances = {}
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in hemeCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
 
        distances['%s_%s' %(r.type,r.id)] = [dist]
 
out = open('structure_distances.txt', 'w')
out.write('Residue\tDist from HEM\n')
for r in distances.keys():
    out.write('%s\t%s\n' %(r, distances[r][0]))
out.close()
$chimera --nogui calcDistances.py


As it turns out, a new structure study of a KatG homolog from B. pseudomallei was just published in June. The paper is still in press, but the structures have been deposited in the PDB. (If you're interested, the paper can be downloaded from here. The new series of structures have both INH and NAD bound to them (cool!) though not simultaneously (boo!). Let's take a look at the INH-bound structure in comparison to the wild type apo (unbound) structure:

1) PDB Code 1SJ2; PDB Code 3N3N
2) Open Command Line interface
3) Run command
mmaker #0 #1 showAlignment true
4) Look at sequence conservation and structure alignment
5) select and focus on HEM group
sel :HEM.A
focus sel
6) Find the drug!

OMG! The active site isn't the active site!


Anywho, what used to be the active site is now the Heme group site, which is different from the INH binding site (assuming that INH would bind in the same location in both systems and that the binding is not a crystal artifact). Let's repeat the study above and add a new column with distances from the bound INH.

First, we need to read in ligand bound structure:
ligBound = chimera.openModels.open('3N3N.pdb')
Next, we need to overlay the structures. As it turns out, Chimera nicely allows us to import the Command Line into a script and run things inside the script the same way we would in the GUI:
from chimera import runCommand
runCommand('mmaker #0 #1')
 
Remember, if you get stuck, you can always double check your code either running things through the IDLE window or running things without the nogui flag!

We can repeat the same thing we did above with the Heme group. But we need to know the name of the ligand in the pdb file. Let's use Chimera to figure out the name.

...and now we can add the code to add the new column and save it to a file.
inhCoords = []
for r in ligBound[0].residues:
    if r.type == 'NIZ' and r.id.chainId == 'A':
        for a in r.atoms:
            inhCoords.append(a.coord())
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in inhCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
 
        distances['%s_%s' %(r.type,r.id)].append(dist)
out = open('structure_distances.txt', 'w')
keys = distances.keys()
keys.sort()
out.write('Residue\tDist from HEM\tDist from INH\n')
for r in keys:
 out.write('%s\t%s\t%s\n' %(r, distances[r][0],
             distances[r][1]))
out.close()
 

The End! Questions?

Exercises
1) Adding more pieces to the puzzle
There are data for several 315 mutants in the paper, summarized below:
Mutant
Catalase
Peroxidase
Adduct Formation
WT
2.4x10^6
7.3
100%
S315T
5.40x10^5
5.7
7.4%
S315I
3.36x10^5
48.1
44.7%
S315G
4.18x10^5
30.5
35.4%
S315N
1.55x10^5
20.0
5.4%
S315R
1.28x10^5
51.9
3.0%
*Adduct formation is 400 microM H2O2 over 1h

Using Chimera to do more structure gazing of the unbound M. tuberculosis KatG, how would you explain these results? How do the drug bound B. pseudomallei KatG structures change your opinion (if at all)?

2) Further characterize the KatG side chains by structure
a) Extend the code we wrote above to include distance from the Dimer Interface (as defined as the minimum distance between any side chain from chain A and any side chain atom from chain B). We will be using this file this afternoon so make sure to maintain the tab-delimited format we used above!
b) Another structure was solved with NAD bound (well sort of). The PDB code is 3N3O. Note that the crystallographer only built the adenine di-phosphate portion of the NAD into the structure as the remaining portion was not well structured enough to be seen in the density. Extend the code we wrote to include distance from the NAD as well.

3) Chimera can, almost literally, make ANY pretty structure picture you can imagine. Go to the image gallery and look through their example images. Pick one that intrigues you and, using what you have learned plus the information provided at the gallery, recreate it. You might also want to check out some of the movies you can make with Chimera at the animation gallery (although the data for these are not provided so you probably won't be able to recreate them).

Solutions


2) The modified code now looks like:

import chimera
import sys, os
 
# syntax to run in Terminal: 'chimera --nogui ./chimerastuff.py'
wildType = chimera.openModels.open('1SJ2.pdb')
ligBound = chimera.openModels.open('3N3N.pdb') 
nadBound = chimera.openModels.open('3N3O.pdb')
 
 
chimera.runCommand('mmaker #0 #1')
chimera.runCommand('mmaker #0 #1')
hemeCoords = []
for r in wildType[0].residues:
    if r.type == 'HEM' and r.id.chainId == 'A':
        for a in r.atoms:
            hemeCoords.append(a.coord())
 
distances = {}
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in hemeCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
 
        distances['%s_%s' %(r.type,r.id)] = [dist]
 
inhCoords = []
print "Computing INH distance"
for r in ligBound[0].residues:
    if r.type == 'NIZ' and r.id.chainId == 'A':
        for a in r.atoms:
            inhCoords.append(a.coord())
 
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in inhCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
 
        distances['%s_%s' %(r.type,r.id)].append(dist)
 
print "Computing interface distance"
for r1 in wildType[0].residues:
    dist = 999
    if not r1.isHet and r1.id.chainId == 'A':
        for a1 in r1.atoms:
            for r2 in wildType[0].residues:
                if not r2.isHet and r.id.chainID == 'B':
                    for a2 in r2.atoms:
                        tmpDist = a1.coord().distance(a2)
                        if tmpDist < dist:
                            dist = tmpDist
        distances['%s_%s' %(r.type,r.id)].append(dist)
 
print "Computing NAD distance"
adpCoords = []
for r in nadBound[0].residues:
    if r.type == 'ADP' and r.id.chainId == 'A':
        for a in r.atoms:
            adpCoords.append(a.coord())
 
 
for r in wildType[0].residues:
    dist = 999
    if not r.isHet and r.id.chainId == 'A':
        for a in r.atoms:
            for a1 in adpCoords:
                tmpDist = a.coord().distance(a1)
                if tmpDist < dist:
                    dist = tmpDist
        distances['%s_%s' %(r.type,r.id)].append(dist)
 
 
out = open('structure_distances_mine.txt', 'w')
keys = distances.keys()
keys.sort()
out.write('Residue\tDist from HEM\tDist from INH\n')
for r in keys:
    out.write('%s \t%f \t%f \t%f \t%f \n' %\
           (r, distances[r][0], distances[r][1], 
            distances[r][2], distances[r][3]))
out.close()