January 11, 1999
Instructor: Marty Pagel
Questions to guide you in searching biosequence databases
Consider these questions before you begin a search of biochemical sequence databases.
Careful consideration will help you to more effectively select the database(s) to explore, the tools to use in
searching the database(s), and the parameters to use in employing those tools.
Consider redundancy within databases and between databases.
Reptiles, mammals, and birds share the underlying pattern of homologous forelimb bones of their tetrapod common ancestor, but their wings have evolved independently. Note the different arrangements of skeletal elements supporting the wing. (Figure 1 of Bolker and Raff (1996), "Developmental genetics and traditional homology," BioEssays 18(6): 489-494.)
If a sequence > 100 residues,
Alignments based on Amino Acid Sequences
| Number of Mutations | Score |
|---|---|
Limitations:
Very Sensitive to Penalty Weights
An Example:
FOUR POSSIBLE ALIGNMENTS OF TWO SEQUENCES
matches are shown in UPPERCASE letters
unmatched residues are shown in lowercase letters
gaps are shown with "-" characters
-----------------------------------------
SEQ1 ATGCGggACaTG
SEQ2 AgGCG--cC-TG (7 matches, 1 gap of 1 bp, 1 gap of 2 bp)
or
SEQ1 ATGCGGgaCaTG
SEQ2 AgGCGc--C-TG (7 matches, 1 gap of 1 bp, 1 gap of 2 bp)
or
SEQ1 ATGCGggaCATG
SEQ2 AgGCG---CcTG (7 matches, 1 gap of 3 bp)
or
SEQ1 ATGCGgGaCaTG
SEQ2 AgGCG-c-C-TG (7 matches, 3 gaps of 1 bp)
SCORES FOR THESE FOUR POSSIBLE ALIGNMENTS UNDER DIFFERENT GAP PENALTIES
| Gap Opening Penalty | 0 | 1 | 1 | 1 |
|---|---|---|---|---|
| Gap Extension Penalty | 0 | 0 | 0.1 | 1 |
| Alignment 1 | 7 | 5 | 4.9 | 4 |
| Alignment 2 | 7 | 5 | 4.9 | 4 |
| Alignment 3 | 7 | 6 | 5.8 | 4 |
| Alignment 4 | 7 | 4 | 4 | 4 |
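The scores above appear consistent with a simple rule: one point per match, nothing for a mismatch, the opening penalty charged once per gap, and the extension penalty charged for each gap position beyond the first. The notes do not state this rule explicitly, so the sketch below is an assumption that happens to reproduce the listed values; the function and variable names are illustrative only.

```python
import re

def score_alignment(seq1, seq2, gap_open, gap_ext):
    """Score two aligned strings of equal length ('-' marks a gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    # A match is a column where neither side is a gap and the letters agree
    # (case is ignored, since the notes use case only as a visual cue).
    matches = sum(
        1 for a, b in zip(seq1, seq2)
        if a != "-" and b != "-" and a.upper() == b.upper()
    )
    # Each maximal run of '-' in either sequence counts as one gap.
    gap_lengths = [len(run) for s in (seq1, seq2) for run in re.findall(r"-+", s)]
    n_gaps = len(gap_lengths)
    extensions = sum(gap_lengths) - n_gaps  # gap positions beyond the first
    return matches - gap_open * n_gaps - gap_ext * extensions

# The four alignments from the example, copied as written.
ALIGNMENTS = [
    ("ATGCGggACaTG", "AgGCG--cC-TG"),
    ("ATGCGGgaCaTG", "AgGCGc--C-TG"),
    ("ATGCGggaCATG", "AgGCG---CcTG"),
    ("ATGCGgGaCaTG", "AgGCG-c-C-TG"),
]
# (gap opening penalty, gap extension penalty) for each column of the table.
PENALTIES = [(0, 0), (1, 0), (1, 0.1), (1, 1)]

def score_table():
    """Return one row of scores per alignment, rounded to one decimal."""
    return [
        [round(score_alignment(a, b, o, e), 1) for o, e in PENALTIES]
        for a, b in ALIGNMENTS
    ]

if __name__ == "__main__":
    for number, row in enumerate(score_table(), start=1):
        print("Alignment", number, *row)
```

Changing the penalties changes which alignment ranks first: with no extension penalty the single 3 bp gap wins, while with equal opening and extension penalties all four alignments tie.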
The two entire amino acid sequences are directly compared.
Fully automatic and guaranteed to converge on one solution.
CPU & memory requirements limit this technique to the alignment of < 5 sequences.
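The notes do not name the algorithm behind this whole-sequence comparison. Assuming it is dynamic programming in the style of Needleman and Wunsch, the two-sequence case can be sketched as below. The scoring values (match 1, mismatch 0, a flat penalty of 1 per gap position) are illustrative assumptions, not values from the course. The table has one cell per pair of positions, and each added sequence adds a dimension, which is one way to see why CPU and memory costs grow so quickly.

```python
def global_alignment(a, b, match=1, mismatch=0, gap=-1):
    """Align all of `a` against all of `b`; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # residue vs residue
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_alignment("ACGT", "AGT")` must place one gap opposite the extra residue, giving three matches minus one gap position.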
Two sequences are aligned, then a third sequence is aligned to their consensus sequence, and so on.
A fast method, but the result depends on the order in which the sequences are compared, and it can miss large
regions of similarity.
The alignment scores of "blocks" of residues in the sequences are compared with
the alignment scores obtained for (population-weighted) random sequences.
If an alignment score is more than 6 standard deviations above
the score of the random sequences, then the alignment is taken to contain structurally conserved regions.
A very fast, very popular sequence alignment method.
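The comparison against random sequences can be sketched as follows. This is an illustration under stated assumptions rather than the method used by any particular program: the block score here is a plain count of identical positions, and "population-weighted" is interpreted as shuffling one block so that its amino acid composition is preserved. The function names, the number of shuffles, and the fixed seed are all hypothetical choices.

```python
import random
import statistics

def identity_score(block1, block2):
    """Count positions where two equal-length blocks carry the same residue."""
    return sum(1 for a, b in zip(block1, block2) if a == b)

def significance(block1, block2, n_shuffles=200, seed=0):
    """Return how many standard deviations the observed score lies above
    the mean score of composition-preserving shuffles of block2."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = identity_score(block1, block2)
    residues = list(block2)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        random_scores.append(identity_score(block1, residues))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        # Degenerate case: every shuffle scored the same.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd

def is_conserved(block1, block2, threshold=6.0):
    """Apply the 6-standard-deviation cutoff quoted in the notes."""
    return significance(block1, block2) > threshold
```

Under this sketch, two identical 20-residue blocks score far above the cutoff, while a block compared with its own reversal does not.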
Comparing Segments of 2 alpha-Carbon matrices
Validation of Sequence Alignments by Off-Diagonal alpha-Carbon Matrix Elements
Align sequences weighted by "distances" between host species in an evolutionary or phylogenetic tree.
Phylogeny schemes:
Pearson/FASTA sequence format
Alignment engines
Manually Editing Alignments
Other uses of sequence alignments besides Homology Modeling
Building the Homology Model
General Methodology for Homology Protein Modeling:
Scan the database of alpha-carbon positions in the PDB for loops that
have the same "loop base".
Definition of the PREFLEX and POSTFLEX "Stem" Regions

Definition of the Geometry of the "Base"

The propensity of a residue for a secondary structure type is evaluated from the
frequency with which that amino acid type is found in that particular secondary structure.
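The notes do not give a formula for propensity. One common definition, in the style of Chou and Fasman, divides the fraction of an amino acid type found in a given secondary structure by the fraction of all residues found in that structure, so values above 1 indicate a preference. The sketch below assumes that definition; the function name and the one-letter structure codes (for example "H" for helix) are illustrative.

```python
from collections import Counter

def propensities(sequence, structure, ss_type):
    """Return {amino acid: propensity for ss_type}.

    `sequence` and `structure` are equal-length strings: one residue letter
    and one secondary structure code per position.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    n_ss = structure.count(ss_type)
    if n_ss == 0:
        raise ValueError("no residues with the requested structure type")
    overall_fraction = n_ss / len(sequence)  # fraction of all residues in ss_type
    aa_total = Counter(sequence)
    aa_in_ss = Counter(aa for aa, ss in zip(sequence, structure) if ss == ss_type)
    # Fraction of each amino acid type in ss_type, relative to the overall fraction.
    return {
        aa: (aa_in_ss[aa] / aa_total[aa]) / overall_fraction
        for aa in aa_total
    }
```

In practice the counts would come from a large set of known structures; a single short sequence is only useful for checking the arithmetic.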
80% of identical residues and 70-75% of mutated residues have the same rotameric state in
homologous proteins. Therefore, homology models adopt the same rotameric state for all side
chains where possible. For mutations to longer side chains, statistically preferred angles are chosen
(e.g., for most side chains with 2 chi angles, there are 4 commonly seen conformations of 9 possible gauche-anti conformers).
Specific types of rotamers are also found with certain structural motifs.
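The count of 9 possible conformers for a side chain with 2 chi angles follows from each chi angle having three staggered states. A minimal sketch of that enumeration is shown below; the state labels and the idealized angles (+60, 180, -60 degrees) are the usual textbook values rather than anything taken from these notes, and which 4 of the 9 are commonly seen depends on the residue type and is not encoded here.

```python
from itertools import product

# Idealized staggered states for one chi angle, in degrees.
STATES = {"gauche+": 60, "anti": 180, "gauche-": -60}

def conformers(n_chi):
    """List every combination of staggered states for n_chi torsion angles."""
    return list(product(STATES, repeat=n_chi))

def chi_angles(conformer):
    """Translate a tuple of state names into idealized angles."""
    return tuple(STATES[name] for name in conformer)

if __name__ == "__main__":
    for conf in conformers(2):
        print(conf, chi_angles(conf))
```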
User-defined "moving" side chains can be automatically changed iteratively until there is no change in energy.
Evaluating the Quality of Protein Models
Options available in the ProStat menu of the Homology Module of InsightII
are shown in bold. Options available in the InsightII Viewer Module
are shown in italics.
Mainchain torsion angle distribution (Ramachandran plots)
Sidechain torsion angle distributions
"Bump" check
"Atomic contact" quality
location and geometry
distribution of polar and nonpolar amino acids on surface & interior
RMS deviations between backbone atoms of superimposed structures
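As an illustration of the last check in the list, the RMS deviation between two sets of backbone atoms can be computed as below. This sketch assumes the two structures have already been superimposed and that the atoms are listed in the same order in both; it does not perform the superposition itself. The function name and the coordinate format (a list of (x, y, z) tuples) are assumptions.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

For instance, two atoms that are each displaced by 1 unit give an RMS deviation of 1, in the same units as the coordinates.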
Energy comparisons of homology models are NOT accurate enough to determine which homology model is correct.
Incorrectly folded protein models almost always have larger surface areas than correctly folded proteins.