C687 Tutorial: Homology Module
The Homology Module allows you to build a 3D model of a protein
based on the 3D structure or structures of one or more homologous proteins.
The protein with the undetermined structure that you want to model is called the "model",
"unknown", or "sequence" protein.
The protein(s) with known 3D structures are referred to as
the "reference" or "real" protein(s).
In this tutorial, we will build a model of the structure of an "unknown" zinc finger
domain (with an undetermined structure). Our model will be based on
the similar sequence and known structure of another "reference" zinc finger protein.
Unknown protein:
>976347 Human zinc finger homeodomain protein (Res 724-750)
KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK
Reference protein:
>3znf.pdb Human enhancer binding protein zinc finger
RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK
(minimized average NMR structure)
This is a relatively simple case because:
- Zinc fingers are small and very highly conserved domains:
there is just one gap in the alignment, and the
unknown protein has one fewer C-terminal residue than the reference protein when the sequences are aligned.
- The alignment is obvious: 17 residues are identical (out of 30 in the reference and 34 in the unknown sequence).
- We will only use a single reference protein.
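The identity count quoted above can be checked with a few lines of Python. This is a minimal sketch: the gap placement shown is one plausible alignment assumed for illustration (it is not output from the Homology module), and the function name `count_identities` is hypothetical.

```python
# One possible alignment of the two tutorial sequences ('-' marks a gap).
# ASSUMPTION: the gap placement is chosen by hand for illustration only.
unknown   = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK-"
reference = "RPYHCSYCNFSFKTKGNLTKHMKSK-----AHSKK"


def count_identities(aln_a, aln_b):
    """Count alignment columns where both sequences carry the same residue."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")


identical = count_identities(unknown, reference)
len_unknown = len(unknown.replace("-", ""))      # residues in the unknown
len_reference = len(reference.replace("-", ""))  # residues in the reference

print(f"{identical} identical residues; "
      f"{len_reference} in the reference, {len_unknown} in the unknown")
```

With this particular alignment the unknown sequence has a single internal gap partner (five residues with no counterpart in the reference) and the reference extends one residue further at the C-terminus.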
Although it is simpler to use a single reference protein, IT IS MUCH
MORE RELIABLE TO USE SEVERAL REFERENCE PROTEINS. This is because comparison
of the several known structures allows you to identify regions of STRUCTURAL
conservation in addition to regions of sequence conservation. In your research
you should choose to use as many reference proteins as possible.
The steps involved in this process are:
- Obtain the sequences of both the unknown and the reference proteins.
- Align these sequences using your favorite alignment tools.
- Obtain a structure file for each reference protein
- Read the sequences and structures into InsightII
- Identify which sequence corresponds to which structure
- Find structurally conserved regions (SCRs) in the reference proteins (only
possible if there is more than one reference protein)
- Copy the coordinates of the conserved regions in one of the reference
proteins to the model protein.
- Propose structures for the loops or variable regions (VRs) between the SCRs.
- Make corrections to sidechain conformations as appropriate.
- Refine the structure by energy minimization and/or molecular dynamics.
Step 1: Getting Started
Duration: ~5 minutes
Purpose: This section should teach you how to start up Insight/Homology
remotely and give you a brief overview of the program layout.
- The Homology module is currently licensed only on splatter.
Therefore, unless you are working on splatter, you must run the program
remotely on splatter and display it on your local monitor.
To run remotely on splatter, type:
xhost splatter
telnet splatter (then login)
insightII (to start the program)
This tutorial is MUCH FASTER if you use splatter. If splatter is not in use, choose this
workstation!
- Choose the Homology module and take a few minutes to browse through
the menu options
Step 2: Reading and Aligning Sequences
Duration: ~20 minutes
Purpose: this section should teach you how to read in, manipulate, and
align sequences and boxes.
Download the files used in this section of the tutorial (newzf.align and 3znf.pdb).
- Click on Sequences/get, choose single, type in newzf.align for the alignment file name
and type in a new name for this sequence (e.g., newzf). A Sequence Window will appear with one sequence.
The sequence will be shown with lowercase letters because there is no 3D model associated with the sequence.
- Click on molecule/get and get 3znf.pdb. Name this molecule ZNF.
- Click on Sequences/extract and extract the sequence from ZNF.
The sequence of your 3D model will appear in the sequence window. This new sequence will be shown with
UPPERCASE letters because there is a 3D model associated with this sequence.
- Automatically align the sequences.
Click on Alignment/Pairwise_sequence/Automatic, specify the 2 sequences, and execute.
PERFORM A MOLECULAR MODELING EXPERIMENT: Repeat this step and choose different
scoring matrices and gap & gap extension penalty values. Which alignment is best? Where should the
loop(s) in the new sequence begin and end?
Not all residues of 3ZNF must match residues of NEWZF:
there could be a "mini-loop" of one, two, or three residues of 3ZNF that are replaced by a larger loop in NEWZF.
It might be better to "cut out" the "mini-loop" in 3ZNF rather than try to match these few residues with residues in the new sequence.
- Examine and test the options of the Sequence Window:
NOTE: This section is particularly important if you will
perform homology modeling in your research or in your Independent Modeling Project.
Therefore, take the time NOW to learn how to use this program. If an operation
is not clear, ask Marty or Brandt for advice.
- Seq Mode
- When Mode is set to Seq, the mouse moves sequences and adds, deletes, & moves gaps.
- Moving sequences:
Click & drag the MIDDLE mouse button on the sequence; the entire sequence moves.
If you have boxes, you can't move a gap, N-terminus, or C-terminus into the box.
- Adding gaps:
Click & hold the RIGHT mouse button on the sequence and drag RIGHT; the gap will appear on the right of the picked residue.
Or click & hold the LEFT mouse button on the sequence and drag LEFT; the gap will appear on the left of the picked residue.
- Deleting gaps:
Do the reverse of the previous step.
- Moving gaps:
Click & drag the MIDDLE mouse button on a gap character ("-"); the entire gap moves within the sequence.
Click & drag the LEFT or RIGHT mouse button on a gap character ("-"); the gap is split and one side is moved.
Instead of clicking & dragging, you can click & release on the "start" residue, then hold down the <Ctrl>
key and click on the "destination" residue.
- Box Mode
When Mode is set to Box, the mouse creates, moves, resizes, and deletes sequence boxes.
- Creating a box:
Click & drag the LEFT mouse button to draw a box around the two aligned sequences.
If you include a gap in your box, the box will be reduced to eliminate the gap.
The vertical height of the box can't be changed after it's created (if you have more than two sequences).
You can also use the Boxes/Initialize menu (Use the menu if you are working from a remote workstation).
- Moving a box:
Click & drag the MIDDLE mouse button on a residue in the box. You can't move boxes through gaps.
- Resizing a box:
Click & drag the RIGHT mouse button on a residue in the box. Move the mouse to a box edge to "expand"
or "shrink" that edge. You can't expand boxes into gaps.
- Freezing or Unfreezing a box:
Click & release the MIDDLE or RIGHT mouse button on a box, then immediately click the LEFT mouse button.
You can also use the Boxes/Freeze or Boxes/UnFreeze menus (Use the menu if you are working from a remote workstation).
- Deleting a box:
Click & release the MIDDLE or RIGHT mouse button on a box,
then immediately hold down the <Ctrl> key and click the LEFT mouse button.
You must unfreeze frozen boxes before deleting them.
You can also use the Boxes/Delete menu (Use the menu if you are working from a remote workstation).
To select a box, click on a residue in the box with the LEFT mouse button.
Boxes may overlap partially or completely.
To select one of several overlapping boxes, click the LEFT mouse button repeatedly on a residue in the box.
The most relevant box is always visible.
| SEQUENCE BOXES |
| Box Type | Color | Order of Relevance for visibility |
| selected box (the box that you pick during execution of a command) | yellow | 1 |
| prompt box and suggestion boxes (offered as a visual aid during the execution of a command) | yellow | 2 |
| suggestion box (offered as a visual aid during the execution of a command) | yellow | 3 |
| active sequence box (the box that you are currently scrolling) | green | 4 |
| summary box | white | 5 |
| Frozen m-box (used in multiple alignments) | red | 6 |
| m-box (used in multiple alignments) | blue | 7 |
| Frozen sequence box | red | 8 |
| Regular sequence box | cyan | 9 |
- Font size
- changes the font size. When remotely logged onto another workstation, type in a font size
value in this menu rather than using the slide bar (interactive graphics changes via slide bars are too slow
on remote workstations).
- Color
- C-alpha: each letter is the same color as the residue in the 3D model.
- p-value: residues in m-blocks are colored shades of magenta to show statistical significance of sequence similarity.
- Contents: all residues in m-blocks are colored cyan
- Decide on a "best alignment" that has just ONE gap.
If you have made extensive modifications to your sequence box,
you can do this by deleting boxes & realigning the sequences, or just by deleting everything
and starting over.
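To get a feel for the modeling experiment in this step (how the gap penalty changes the alignment), the sketch below implements a textbook Needleman-Wunsch global alignment. It is a simplification under stated assumptions: it scores residues by plain identity (match/mismatch) rather than a substitution matrix, uses a single linear gap penalty rather than separate gap and gap extension penalties, and is not the algorithm used by the Homology module. The function name and default scores are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scoring is simple identity
    scoring, an assumption made to keep the example short.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


unknown = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK"
reference = "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"

# Repeat the alignment with different gap penalties and compare by eye.
for gap_penalty in (-1, -2, -5):
    aln_u, aln_r, total = global_align(unknown, reference, gap=gap_penalty)
    print(f"gap penalty {gap_penalty}: score {total}")
    print(aln_u)
    print(aln_r)
```

Comparing the printed alignments shows where the gaps land for each penalty, which is the same judgment the tutorial asks you to make inside the Sequence Window.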
Step 3: Assigning Coordinates to the Unknown Protein
Duration: ~5 minutes
Purpose: To learn how to assign coordinates to the unknown protein.
- Draw a box around the regions of the 2 sequences that are aligned. Boxes must NOT contain gaps.
Freeze these boxes.
- Click on Sequences/AssignCoords
Enter the first box number. Set Segment Definition to SCR. Choose bump check to search for steric violations
as the new structure is generated.
Examine the new structure and compare it to the old
structure. Repeat for the other box.
- Review the Textport window to determine if "bumps" are present. The presence
of "bumps" may indicate a problem with the sequence alignment or a need for model-building to fix the "bumps".
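Because boxes must not contain gaps, it can help to list the gap-free stretches of an alignment before drawing boxes. The sketch below does this for one hand-made alignment of the tutorial sequences; the alignment itself and the function name `ungapped_blocks` are assumptions for illustration, and your own "best alignment" may place the gap differently.

```python
def ungapped_blocks(aln_a, aln_b):
    """Return (start, end) column ranges, 0-based with end exclusive,
    in which neither aligned sequence contains a gap character."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    for col, (x, y) in enumerate(zip(aln_a, aln_b)):
        gapped = x == "-" or y == "-"
        if not gapped and start is None:
            start = col                    # a new gap-free block begins
        elif gapped and start is not None:
            blocks.append((start, col))    # the current block ends here
            start = None
    if start is not None:
        blocks.append((start, len(aln_a)))
    return blocks


# ASSUMPTION: illustrative alignment, not output from the Homology module.
unknown   = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK-"
reference = "RPYHCSYCNFSFKTKGNLTKHMKSK-----AHSKK"

blocks = ungapped_blocks(unknown, reference)
for start, end in blocks:
    print(f"box candidate, columns {start + 1}-{end}: {unknown[start:end]}")

# Residues of the unknown between the first two blocks form the loop to build.
loop = unknown[blocks[0][1]:blocks[1][0]]
print("loop residues in the unknown:", loop)
```

For this alignment the function reports two gap-free blocks, matching the two boxes that are assigned coordinates in this step.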
Step 4: Building a loop.
Duration: ~15 minutes
Purpose: To learn how to build a loop into the model of the unknown protein.
- Click on Loops/Generate. Click on the residue N-terminal to the loop in a frozen box in your unknown sequence to specify the
Start Residue.
Click on the residue C-terminal to the loop in a frozen box in your unknown sequence to specify the Stop Residue.
The number of Flex Residues will be automatically
entered, but verify that the number of Flex Residues corresponds to the loop length in the unknown sequence.
- Click Execute. The program will randomly select main-chain dihedral angles and then minimize these angles.
Side chains will be set to extended rotamer angles. A total of 10 loops will be randomly generated.
- A menu will appear that allows you to display each of these loops. Display ALL loops. Then display the loop with the
lowest Root-Mean-Square (RMS) geometrical alignment. Display all other loops one-by-one.
- Choose the best loop. The best choice is the loop that has a low Root-Mean-Square (RMS) geometrical alignment of
the loop-to-Start Residue and loop-to-Stop Residue interface.
The loop should also extend away from the body of the protein.
- Assign the coordinates for this loop to your model using the Loops/AssignCoords menu.
- If NO loops are acceptable, repeat this procedure and add one Flex Residue to either side of the loop (i.e., make the Start Residue
and Stop Residue part of the Flex Residue loop).
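The RMS criterion used to rank loops can be illustrated with a short calculation. This is a minimal sketch, assuming the RMS value is the root-mean-square distance between matching atoms at the loop junctions; the coordinates, loop names, and function name below are made up for illustration and do not come from the program.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# HYPOTHETICAL junction atoms in the model (Start and Stop residue positions).
anchor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# HYPOTHETICAL candidate loops: where each loop places the same junction atoms.
candidates = {
    "loop_a": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    "loop_b": [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)],
}

rms_by_loop = {name: rms_deviation(anchor, coords)
               for name, coords in candidates.items()}
best = min(rms_by_loop, key=rms_by_loop.get)

for name, value in sorted(rms_by_loop.items()):
    print(f"{name}: RMS = {value:.3f}")
print("lowest RMS:", best)
```

A low RMS is only one of the two criteria given above; the loop should also extend away from the body of the protein, which has to be judged by inspecting the structure.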
"SearchLoops" Loop Generation Procedure
Some researchers prefer to model the new loop based on existing loops in the Protein Data Bank.
However, there is no specific advantage to modeling the unknown loop based on an existing loop model vs de novo loop generation.
To search for existing loops, the entire Protein Data Bank must be on a hard disk accessible by your workstation.
Workstations in the MolViz Facility are not configured to contain the entire Protein Data Bank, so this method
is not typically used in the MolViz Facility. If you want to try this method, contact the
MolViz Facility Staff.
- Click on Loops/Search. Click on the residue N-terminal to the loop in a frozen box in your unknown sequence to specify the
Start Residue.
Click on the residue C-terminal to the loop in a frozen box in your unknown sequence to specify the Stop Residue.
The number of Flex Residues will be automatically
entered, but verify that the number of Flex Residues corresponds to the loop length in the unknown sequence.
The number of Preflex and Postflex residues will be 5, unless the sequence is shorter than 5 residues before or after the loop.
There must be a MINIMUM OF TWO Preflex and Postflex residues.
See the Loops Diagram from Lecture 2 for a diagram of the Preflex, Postflex, and Flex residues.
- Click Execute. The program will search the alpha-Carbon coordinates of the PDB and find the 10 best loops that
can be spliced into your new model.
- A menu will appear that allows you to display each of these loops. Toggle Tails to be
On so that the Preflex and Postflex residues are displayed.
- Choose the best loop. The best choice is the one that has a low Root-Mean-Square (RMS) superposition of Preflex and Postflex regions,
and extends away from the body of the protein.
- Assign the coordinates for this loop to your model using the Loops/AssignCoords menu.
- If NO loops are acceptable, repeat this procedure and add one Flex Residue to either side of the loop (i.e., make the Start Residue
and Stop Residue part of the Flex Residue loop).
"Designated_Loop" Loop Generation Procedure
If you have more than one reference sequence, and one of the sequences contains a loop that can be used to
model the loop of your unknown sequence (but the other sequences have a gap), use the Designated_loop option
in the Sequences/AssignCoords menu.
If you plan to do Homology Modeling with multiple sequence alignments for your Independent Modeling
Project, more information is available in the Homology Manual in room A701 (please
DO NOT REMOVE manuals from room A701 without contacting the Molecular Visualization Facility staff!).
Step 5: Fixing the Geometry
Duration: ~20 minutes
Purpose: To learn how to change the initial 3D structural model of
the unknown protein to a structure that is more stable and structurally
& chemically reasonable.
- Compare the unknown and reference protein structures:
- Examine the conformations of the backbones
- Examine the sidechains of residues that are identical in both proteins
- Examine the sidechains of residues that are similar in the two proteins
- Examine the sidechains of residues that are very different in the two proteins.
Identify several residues that have side chains with potentially non-optimal rotamer angles.
During this analysis, remember how the program
has assigned conformations to each of these parts of the unknown protein.
Use the Residue/Manual_Rotamer
menu to manually adjust the rotamer angles for a residue.
Review rotamers for a specific Rotamer Residue with Evaluate Energy and Bump Check turned on.
Select the Current rotamer and view the energy (reported at the bottom of the screen). Select
the next rotamer and view the energy. Repeat for all other rotamers. Select the best rotamer,
change Review to Done, and execute to set this rotamer selection in the model.
Use the Residue/Auto_Rotamer
menu to automatically adjust the rotamer angles for several residues.
Add the four loop residues that have side chain rotamer angles (note: Glycine and Alanine do NOT have rotamer angles!).
Search for the best combination of rotamers.
When the calculation is finished, click on Done and Keep the new rotamers.
Caution! Do not select more than 5 residues for auto-rotamer optimization for this tutorial. The
length of this calculation dramatically increases as you add 6, 7, or more side chains.
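The caution above follows from simple combinatorics: if every combination of rotamers is considered, the number of combinations is the product of the rotamer counts of the selected residues. The sketch below assumes an exhaustive search and a hypothetical 5 rotamers per residue; neither the real rotamer counts nor the program's actual search strategy is stated in this tutorial.

```python
def rotamer_combinations(rotamers_per_residue):
    """Number of side-chain combinations when every rotamer of every
    selected residue is paired with every rotamer of the others."""
    total = 1
    for count in rotamers_per_residue:
        total *= count
    return total


# HYPOTHETICAL value: assume each selected residue has 5 rotamers.
ROTAMERS_EACH = 5

for n_residues in range(4, 8):
    combos = rotamer_combinations([ROTAMERS_EACH] * n_residues)
    print(f"{n_residues} residues: {combos} combinations")
```

Under these assumptions each added residue multiplies the work by the number of rotamers it has, so going from 4 to 7 residues increases the number of combinations by more than a hundredfold.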
Step 6: Verify that you have completed this portion of the assignment.
- SAVE YOUR FINAL HOMOLOGY MODEL AS zf_homology.psv.
- See the WWW/Alignment/Homology Assignment page for details.
Send comments to chemvis@indiana.edu
Last updated: 01/23/2001