C687 Tutorial: Homology Module


The Homology Module allows you to build a 3D model of a protein based on the 3D structure(s) of one or more homologous proteins. The protein with the undetermined structure that you want to model is called the "model", "unknown", or "sequence" protein. The proteins with known 3D structures are referred to as the "reference" or "real" proteins.

In this tutorial, we will build a model of the structure of an "unknown" zinc finger domain (with an undetermined structure). Our model will be based on the similar sequence and known structure of another "reference" zinc finger protein.

Unknown protein:
>976347 Human zinc finger homeodomain protein (Res 724-750)
KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK

Reference protein:
>3znf.pdb Human enhancer binding protein zinc finger
RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK

(minimized average NMR structure)

This is a relatively simple case because:

  1. Zinc fingers are small and very highly conserved domains:
    there is just one gap in the alignment, and the unknown protein has one fewer C-terminal residue than the reference protein when the sequences are aligned.
  2. The alignment is obvious: 17 residues are identical (out of 30 in the reference protein and 34 in the unknown protein).
  3. We will only use a single reference protein.
    Although it is simpler to use a single reference protein, IT IS MUCH MORE RELIABLE TO USE SEVERAL REFERENCE PROTEINS. This is because comparison of the several known structures allows you to identify regions of STRUCTURAL conservation in addition to regions of sequence conservation. In your research you should choose to use as many reference proteins as possible.
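
The identity count quoted above can be checked with a short script. The following is a minimal Python sketch, not part of InsightII. The gapped alignment strings are an assumption: they show one possible placement of the single gap that is consistent with the description above, and the function name count_identities is hypothetical.

```python
# One possible gapped alignment of the two tutorial sequences ("-" marks a gap).
# The gap placement is an assumption made for illustration only.
UNKNOWN_ALIGNED = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK-"
REFERENCE_ALIGNED = "RPYHCSYCNFSFKTKGNLTKHMKSK-----AHSKK"


def count_identities(aligned_a, aligned_b, gap="-"):
    """Count alignment columns where both sequences hold the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)


identical = count_identities(UNKNOWN_ALIGNED, REFERENCE_ALIGNED)
unknown_len = len(UNKNOWN_ALIGNED.replace("-", ""))
reference_len = len(REFERENCE_ALIGNED.replace("-", ""))

print(f"{identical} identical residues")
print(f"{100 * identical / reference_len:.0f}% of the {reference_len}-residue reference")
print(f"{100 * identical / unknown_len:.0f}% of the {unknown_len}-residue unknown")
```

With this gap placement the script counts 17 identities, which is about 57% of the reference sequence and 50% of the unknown sequence.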

The steps involved in this process are:

  1. Obtain the sequences of both the unknown and the reference proteins.
  2. Align these sequences using your favorite alignment tools.
  3. Obtain a structure file for each reference protein.
  4. Read the sequences and structures into InsightII.
  5. Identify which sequence corresponds to which structure.
  6. Find structurally conserved regions (SCRs) in the reference proteins (only possible if there is more than one reference protein)
  7. Copy the coordinates of the conserved regions in one of the reference proteins to the model protein.
  8. Propose structures for the loops or variable regions (VRs) between the SCRs.
  9. Make corrections to sidechain conformations as appropriate.
  10. Refine the structure by energy minimization and/or molecular dynamics.


Step 1: Getting Started

Duration: ~5 minutes
Purpose: This section should teach you how to start up Insight/Homology remotely and give you a brief overview of the program layout.

  1. The Homology module is currently licensed only on splatter. Therefore, unless you are working on splatter, it is necessary to run the program remotely on splatter, then display on your local monitor.

    To run remotely on splatter,
    xhost splatter
    telnet splatter (then login)
    insightII (to start the program)

    This tutorial is MUCH FASTER if you use splatter. If splatter is not in use, choose this workstation!

  2. Choose the Homology module and take a few minutes to browse through the menu options

Step 2: Reading and Aligning Sequences

Duration: ~20 minutes
Purpose: This section should teach you how to read in, manipulate, and align sequences and boxes.

Download the files used in this section of the tutorial: newzf.align and 3znf.pdb.

  1. Click on Sequences/get, choose single, type in newzf.align for the alignment file name, and type in a new name for this sequence (e.g., newzf). A Sequence Window will appear with one sequence. The sequence will be shown with lowercase letters because there is no 3D model associated with the sequence.

  2. Click on molecule/get and get 3znf.pdb. Name this molecule ZNF.

  3. Click on Sequences/extract and extract the sequence from ZNF.
    The sequence of your 3D model will appear in the sequence window. This new sequence will be shown with UPPERCASE letters because there is a 3D model associated with this sequence.

  4. Automatically align the sequences.
    Click on Alignment/Pairwise_sequence/Automatic, specify the 2 sequences, and execute.

    PERFORM A MOLECULAR MODELING EXPERIMENT: Repeat this step and choose different scoring matrices and gap & gap extension penalty values. Which alignment is best? Where should the loop(s) in the new sequence begin and end?

    Not all residues of 3ZNF must match residues of NEWZF: there could be a "mini-loop" of one, two, or three residues of 3ZNF that are replaced by a larger loop in NEWZF. It might be better to "cut out" the "mini-loop" in 3ZNF rather than try to match these few residues with residues in the new sequence.

  5. Examine and test the options of the Sequence Window:
    This section is particularly important if you will perform homology modeling in your research or in your Independent Modeling Project. Therefore, take the time NOW to learn how to use this program. If an operation is not clear, ask Marty or Brandt for advice.
    Seq Mode
    When Mode is set to Seq, the mouse moves sequences and adds, deletes, & moves gaps.
    • Moving sequences:
      Click & drag the MIDDLE mouse button on the sequence; the entire sequence moves. If you have boxes, you can't move a gap, N-terminus, or C-terminus into the box.
    • Adding gaps:
      Click & hold the RIGHT mouse button on the sequence and drag RIGHT; the gap will appear on the right of the picked residue. Or click & hold the LEFT mouse button on the sequence and drag LEFT; the gap will appear on the left of the picked residue.
    • Deleting gaps:
      Do the reverse of the previous step.
    • Moving gaps:
      Click & drag the MIDDLE mouse button on a gap character ("-"); the entire gap moves within the sequence. Click & drag the LEFT or RIGHT mouse button on a gap character ("-"); the gap is split and one side is moved.
    Instead of clicking & dragging, you can click & release on the "start" residue, then hold down the <Ctrl> key and click on the "destination" residue.

    Box Mode
    When Mode is set to Box, the mouse creates, moves, resizes, and deletes sequence boxes.
    • Creating a box:
      Click & drag the LEFT mouse button to draw a box around the two aligned sequences. If you include a gap in your box, the box will be reduced to eliminate the gap. The vertical height of the box can't be changed after it's created (if you have more than two sequences). You can also use the Boxes/Initialize menu (use the menu if you are working from a remote workstation).
    • Moving a box:
      Click & drag the MIDDLE mouse button on a residue in the box. You can't move boxes through gaps.
    • Resizing a box:
      Click & drag the RIGHT mouse button on a residue in the box. Move the mouse to a box edge to "expand" or "shrink" that edge. You can't expand boxes into gaps.
    • Freezing or Unfreezing a box:
      Click & release the MIDDLE or RIGHT mouse button on a box, then immediately click the LEFT mouse button. You can also use the Boxes/Freeze or Boxes/UnFreeze menus (Use the menu if you are working from a remote workstation).
    • Deleting a box:
      Click & release the MIDDLE or RIGHT mouse button on a box, then immediately hold down the <Ctrl> key and click the LEFT mouse button. You must unfreeze frozen boxes before deleting them. You can also use the Boxes/Delete menu (use the menu if you are working from a remote workstation).

    To select a box, click on a residue in the box with the LEFT mouse button. Boxes may overlap partially or completely. To select one of several overlapping boxes, click the LEFT mouse button repeatedly on a residue in the box. The most relevant box is always visible.

    SEQUENCE BOXES

    Box Type                               Color     Order of Relevance
                                                     (for visibility)
    -------------------------------------  --------  ------------------
    selected box                           yellow    1
      (the box that you pick during execution of a command)
    prompt box and suggestion boxes        yellow    2
      (offered as a visual aid during the execution of a command)
    suggestion box                         yellow    3
      (offered as a visual aid during the execution of a command)
    active sequence box                    green     4
      (the box that you are currently scrolling)
    summary box                            white     5
    Frozen m-box                           red       6
      (used in multiple alignments)
    m-box                                  blue      7
      (used in multiple alignments)
    Frozen sequence box                    red       8
    Regular sequence box                   cyan      9

    Font size
    changes the font size. When remotely logged onto another workstation, type in a font size value in this menu rather than using the slide bar (interactive graphics changes via slide bars are too slow on remote workstations).

    Color
    • C-alpha: each letter is the same color as the residue in the 3D model.
    • p-value: residues in m-blocks are colored shades of magenta to show statistical significance of sequence similarity.
    • Contents: all residues in m-blocks are colored cyan

  6. Decide on a "best alignment" that has just ONE gap. If you have made extensive modifications to your sequence box, you can do this by deleting boxes & realigning the sequences, or just by deleting everything and starting over.
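
To get a feel for how gap penalties change an alignment (the experiment suggested in item 4), it can help to see a pairwise alignment written out in code. The following Python sketch is NOT the algorithm used by InsightII. It is a textbook Needleman-Wunsch global alignment that assumes a simple match/mismatch score in place of a scoring matrix and a single linear gap penalty in place of separate gap and gap extension penalties. The function name and all parameter values are hypothetical.

```python
UNKNOWN = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK"
REFERENCE = "RPYHCSYCNFSFKTKGNLTKHMKSKAHSKK"


def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); "-" marks a gap.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] is the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Try two hypothetical gap penalties and compare the resulting alignments.
for gap_penalty in (-1, -4):
    aligned_unknown, aligned_reference, total = needleman_wunsch(
        UNKNOWN, REFERENCE, gap=gap_penalty)
    print(f"gap penalty {gap_penalty}: score {total}")
    print("  " + aligned_unknown)
    print("  " + aligned_reference)
```

Because a linear penalty charges every gap position equally, this sketch may split the gap differently than a method with a separate gap extension penalty would. Treat its output as a way to explore the idea, not as a substitute for the alignments produced in the Sequence Window.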

Step 3: Assigning Coordinates to the Unknown Protein

Duration: ~5 minutes
Purpose: To learn how to assign coordinates to the unknown protein.

  1. Draw a box around the regions of the 2 sequences that are aligned. Boxes must NOT contain gaps. Freeze these boxes.

  2. Click on Sequences/AssignCoords
    Enter the first box number. Set Segment Definition to SCR. Choose bump check to search for steric violations as the new structure is generated. Examine the new structure and compare it to the old structure. Repeat for the other box.

  3. Review the Textport window to determine if "bumps" are present. The presence of "bumps" may indicate a problem with the sequence alignment or a need for model-building to fix the "bumps".
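
Because boxes must not contain gaps (item 1), the regions that can be boxed are the ungapped stretches of the alignment. The following Python sketch shows one way to list those stretches. It is an illustration only and does not reproduce how InsightII handles boxes. The alignment strings assume one possible placement of the single gap, and the function name ungapped_blocks is hypothetical.

```python
# Assumed gapped alignment of the tutorial sequences ("-" marks a gap).
UNKNOWN_ALIGNED = "KPFRCEVCNYSTTTKGNLSIHMKGASSTMGAHSK-"
REFERENCE_ALIGNED = "RPYHCSYCNFSFKTKGNLTKHMKSK-----AHSKK"


def ungapped_blocks(aligned_a, aligned_b, gap="-"):
    """Return (start, end) column ranges, end exclusive, with no gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    blocks = []
    start = None
    for col, (x, y) in enumerate(zip(aligned_a, aligned_b)):
        if x != gap and y != gap:
            if start is None:
                start = col          # a new ungapped block begins here
        elif start is not None:
            blocks.append((start, col))  # a gap closes the current block
            start = None
    if start is not None:
        blocks.append((start, len(aligned_a)))
    return blocks


for start, end in ungapped_blocks(UNKNOWN_ALIGNED, REFERENCE_ALIGNED):
    print(f"columns {start + 1}-{end}:")
    print("  unknown  ", UNKNOWN_ALIGNED[start:end])
    print("  reference", REFERENCE_ALIGNED[start:end])
```

For this assumed alignment the sketch reports two ungapped blocks, one on each side of the gap, which corresponds to the two boxes assigned in item 2.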


Step 4: Building a Loop

Duration: ~15 minutes
Purpose: To learn how to build a loop into the model of the unknown protein.

  1. Click on Loops/Generate. Click on the residue N-terminal to the loop in a frozen box in your unknown sequence to specify the Start Residue. Click on the residue C-terminal to the loop in a frozen box in your unknown sequence to specify the Stop Residue. The number of Flex Residues will be automatically entered, but verify that the number of Flex Residues corresponds to the loop length in the unknown sequence.

  2. Click Execute. The program will randomly select main-chain dihedral angles and then minimize these angles. Side chains will be set to extended rotamer angles. A total of 10 loops will be randomly generated.

  3. A menu will appear that allows you to display each of these loops. Display ALL loops. Then display the loop with the lowest Root-Mean-Square (RMS) geometrical alignment. Display all other loops one-by-one.

  4. Choose the best loop. The best choice is the loop that has a low Root-Mean-Square (RMS) geometrical alignment of the loop-to-Start Residue and loop-to-Stop Residue interface. The loop should also extend away from the body of the protein.

  5. Assign the coordinates for this loop to your model using the Loops/AssignCoords menu.

  6. If NO loops are acceptable, repeat this procedure and add one Flex Residue to either side of the loop (i.e., make the Start Residue and Stop Residue part of the Flex Residue loop).
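
The RMS criterion in item 4 can be illustrated with a small calculation. The following Python sketch computes a plain root-mean-square deviation between matched atom positions and ranks candidate loops by it. How InsightII computes its reported RMS value is not described here, so this is only a generic illustration. All coordinates and loop names are made up, and the function name rms_deviation is hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square distance between matched (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Made-up positions of one atom at the Start Residue and one at the Stop Residue.
anchor_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

# Made-up positions of the matching loop-end atoms for three candidate loops.
candidate_loops = {
    "loop_1": [(0.5, 0.0, 0.0), (10.0, 0.5, 0.0)],
    "loop_2": [(0.1, 0.0, 0.0), (10.0, 0.1, 0.0)],
    "loop_3": [(1.0, 1.0, 0.0), (9.0, 0.0, 1.0)],
}

# Sort the candidates from lowest to highest RMS deviation.
ranked = sorted(candidate_loops,
                key=lambda name: rms_deviation(anchor_atoms, candidate_loops[name]))

for name in ranked:
    print(name, round(rms_deviation(anchor_atoms, candidate_loops[name]), 3))
```

A low RMS value is only one of the two criteria in item 4. The loop with the lowest value should still be inspected to confirm that it extends away from the body of the protein.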

"SearchLoops" Loop Generation Procedure

Some researchers prefer to model the new loop based on existing loops in the Protein Data Bank. However, there is no specific advantage to modeling the unknown loop based on an existing loop model vs. de novo loop generation. To search for existing loops, the entire Protein Data Bank must be on a hard disk accessible by your workstation. Workstations in the MolViz Facility are not configured to contain the entire Protein Data Bank, so this method is not typically used in the MolViz Facility. If you want to try this method, contact the MolViz Facility Staff.

  1. Click on Loops/Search. Click on the residue N-terminal to the loop in a frozen box in your unknown sequence to specify the Start Residue. Click on the residue C-terminal to the loop in a frozen box in your unknown sequence to specify the Stop Residue. The number of Flex Residues will be automatically entered, but verify that the number of Flex Residues corresponds to the loop length in the unknown sequence. The number of Preflex and Postflex residues will be 5, unless the sequence is shorter than 5 residues before or after the loop. There must be a MINIMUM OF TWO Preflex and Postflex residues. See the Loops Diagram from Lecture 2 for a diagram of the Preflex, Postflex, and Flex residues.

  2. Click Execute. The program will search the alpha-Carbon coordinates of the PDB and find the 10 best loops that can be spliced into your new model.

  3. A menu will appear that allows you to display each of these loops. Toggle Tails to be On so that the Preflex and Postflex residues are displayed.

  4. Choose the best loop. The best choice is the one that has a low Root-Mean-Square (RMS) superposition of Preflex and Postflex regions, and extends away from the body of the protein.

  5. Assign the coordinates for this loop to your model using the Loops/AssignCoords menu.

  6. If NO loops are acceptable, repeat this procedure and add one Flex Residue to either side of the loop (i.e., make the Start Residue and Stop Residue part of the Flex Residue loop).
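
The Preflex, Flex, and Postflex counts described in item 1 follow a simple rule that can be written out as a calculation. The following Python sketch encodes one reading of that rule: up to 5 residues on each side, fewer if the sequence is too short, and an error below 2. It assumes 1-based residue numbers and assumes that the Start Residue and Stop Residue count toward the Preflex and Postflex segments. The function name loop_segments is hypothetical, and the program's own bookkeeping may differ.

```python
def loop_segments(seq_len, start_res, stop_res, max_flank=5, min_flank=2):
    """Return the Preflex, Flex, and Postflex residue counts for a loop.

    start_res is the last residue before the loop and stop_res is the first
    residue after it (both 1-based, both assumed to belong to the flanks).
    """
    if not 1 <= start_res < stop_res <= seq_len:
        raise ValueError("expected 1 <= start_res < stop_res <= seq_len")
    flex = stop_res - start_res - 1
    # Use up to max_flank residues, limited by what the sequence provides.
    preflex = min(max_flank, start_res)
    postflex = min(max_flank, seq_len - stop_res + 1)
    if preflex < min_flank or postflex < min_flank:
        raise ValueError("need at least two Preflex and two Postflex residues")
    return {"preflex": preflex, "flex": flex, "postflex": postflex}


# Example: a 34-residue sequence with a loop between residues 25 and 31.
# These residue numbers are an assumption based on one possible alignment.
print(loop_segments(seq_len=34, start_res=25, stop_res=31))
```

In this example only four residues follow the loop, so the Postflex count drops below the default of 5 while still satisfying the minimum of two.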

"Designated_Loop" Loop Generation Procedure

If you have more than one reference sequence, and one of the sequences contains a loop that can be used to model the loop of your unknown sequence (but the other sequences have a gap), use the Designated_loop option in the Sequences/AssignCoords menu.

If you plan to do Homology Modeling with multiple sequence alignments for your Independent Modeling Project, more information is available in the Homology Manual in room A701 (please DO NOT REMOVE manuals from room A701 without contacting the Molecular Visualization Facility staff!).


Step 5: Fixing the Geometry

Duration: ~20 minutes
Purpose: To learn how to change the initial 3D structural model of the unknown protein to a structure that is more stable and structurally & chemically reasonable.

  1. Compare the unknown and reference protein structures. Identify several residues that have side chains with potentially non-optimal rotamer angles. During this analysis, remember how the program assigned conformations to each part of the unknown protein (the SCRs, the loop, and the side chains).

Use the Residue/Manual_Rotamer menu to manually adjust the rotamer angles for a residue. Review rotamers for a specific Rotamer Residue with Evaluate Energy and Bump Check turned on. Select the Current rotamer and view the energy (reported at the bottom of the screen). Select the next rotamer and view the energy. Repeat for all other rotamers. Select the best rotamer, change Review to Done, and execute to set this rotamer selection in the model.
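
The idea behind a bump check can be illustrated with a simple distance test. The following Python sketch flags pairs of atoms that lie closer together than a cutoff. It is not the test performed by InsightII: the cutoff value, atom labels, and coordinates are all made up, the function name find_bumps is hypothetical, and a realistic check would also skip atoms that are bonded to each other.

```python
import math
from itertools import combinations


def find_bumps(atoms, cutoff=2.0):
    """Return (label_a, label_b, distance) for atom pairs closer than cutoff.

    atoms is a list of (label, (x, y, z)) tuples; distances are in the same
    units as the coordinates.
    """
    bumps = []
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(atoms, 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            bumps.append((label_a, label_b, round(distance, 2)))
    return bumps


# Made-up atoms: the first two are too close, the third is far from both.
example_atoms = [
    ("side_chain_atom_1", (0.0, 0.0, 0.0)),
    ("side_chain_atom_2", (1.0, 0.0, 0.0)),
    ("backbone_atom", (5.0, 0.0, 0.0)),
]

for bump in find_bumps(example_atoms):
    print("bump:", bump)
```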

Use the Residue/Auto_Rotamer menu to automatically adjust the rotamer angles for several residues. Add the four loop residues that have side chain rotamer angles (Glycine and Alanine do NOT have rotamer angles!). Search for the best combination of rotamers. When the calculation is finished, click on Done and Keep the new rotamers.

Caution! Do not select more than 5 residues for auto-rotamer optimization for this tutorial. The length of this calculation dramatically increases as you add 6, 7, or more side chains.
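
The reason for this limit is combinatorial growth. The following Python sketch counts how many rotamer combinations there would be if every combination were examined. Both the exhaustive search and the figure of 9 rotamers per residue are assumptions made for illustration: the real number of rotamers depends on the residue type, and the program's search strategy is not described here.

```python
def rotamer_combinations(n_residues, rotamers_per_residue=9):
    """Number of side-chain rotamer combinations for n_residues residues.

    Assumes every residue has the same (hypothetical) number of rotamers.
    """
    return rotamers_per_residue ** n_residues


for n in range(4, 8):
    print(f"{n} residues: {rotamer_combinations(n):,} combinations")
```

Under these assumptions each added residue multiplies the number of combinations by 9, so going from 5 to 7 residues increases the count from 59,049 to 4,782,969.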


Step 6: Verify that you have completed this portion of the assignment.

  1. SAVE YOUR FINAL HOMOLOGY MODEL AS zf_homology.psv.
  2. See the WWW/Alignment/Homology Assignment page for details.


Send comments to chemvis@indiana.edu
Last updated: 01/23/2001