
Why have my alignments changed in Version 9?

The following information is taken from the GCG Newsletter, July 1997.

Protein sequence alignment programs in the Wisconsin Package use scoring matrices to make comparisons between pairs of amino acids. In Version 9.0, released in December 1996, changes were made to the scoring matrices which may affect the alignments you create. If the sequences you aligned in previous versions of the Package were very similar, you may not notice much difference in the same alignments produced using Version 9.0. With more distantly related sequences, however, the alignments produced may be substantially different.

Version 9 Changes to Scoring Matrices

There are two main changes to protein scoring matrices in Version 9.0.

The modified PAM250 matrix which was used in previous software versions is still available in Version 9.0. The matrix has been renamed oldpep.cmp, and the original floating point values were changed to integer values. However, we do not recommend using this matrix. If you are trying to align distantly related sequences and are not achieving expected results with BLOSUM62, use the BLOSUM30 matrix. You can specify this alternative matrix with the -MATRix=blosum30.cmp (UNIX) or /MATRix=blosum30.cmp (OpenVMS) command-line parameter when you run a sequence comparison program.
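The platform difference in the matrix parameter can be captured in a small helper. This is only a sketch: the function name matrix_option and the platform labels "unix" and "openvms" are hypothetical, and only the two parameter spellings quoted above come from this article.

```python
def matrix_option(matrix_file, platform="unix"):
    """Build the command-line parameter that selects a scoring matrix.

    Only the two forms quoted in the article are produced:
      -MATRix=<file>  on UNIX
      /MATRix=<file>  on OpenVMS
    The platform labels are hypothetical names chosen for this sketch.
    """
    prefixes = {"unix": "-", "openvms": "/"}
    try:
        prefix = prefixes[platform.lower()]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}")
    return f"{prefix}MATRix={matrix_file}"


print(matrix_option("blosum30.cmp"))             # UNIX form
print(matrix_option("blosum30.cmp", "openvms"))  # OpenVMS form
```

The resulting string is appended to the command line of whichever sequence comparison program you run.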

How Does This Affect Me?

Two common questions you may have about the alignments created in Version 9.0 and their answers are detailed below.

Q: When run with BLOSUM62, the global alignment displayed by PileUp or Gap shows one sequence shifted completely to the right of the others, with only a few residues at the ends shown as matched (see examples below). With the old matrix the sequences appeared to align together. Why?

A: The old matrix (PAM250) was overly permissive of mismatches and allowed you to align unrelated sequences. The BLOSUM62 matrix is more restrictive and will not routinely align unrelated sequences along their entire lengths. A displaced sequence may therefore simply be unrelated to the rest of your alignment. If you have reason to believe that the sequences are globally but very distantly related, you might want to align them with a matrix based on an evolutionary model that assumes greater divergence time, like BLOSUM30. (A complete series of BLOSUM matrices is provided in the GenMoreData directory.)

(UNIX) prompt> namels genmoredata:blosum*

(OpenVMS) prompt> dir genmoredata:blosum*

             51                                                 100   

  Calm_Human  RHVMTNLGEK LTDEEVDEMI READIDGDGQ VNYEEFVQMM TAK~~~~~~~

  Calm_Drome  RHVMTNLGEK LTDEEVDEMI READIDGDGQ VNYEEFVTMM TSK~~~~~~~

  Calm_Wheat  RHVMTNLGEK LTDEEVDEMI READVDGDGQ INYEEFVKVM MAK~~~~~~~

  Hsp2_Mouse  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~MV RYRMRSPSEG

              101                                             147

  Calm_Human  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~

  Calm_Drome  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~

  Calm_Wheat  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~

  Hsp2_Mouse  PHQGPGQDHE REEQGQGQGL SPERVEDYGR THRGHHHHRH RRCSRKR

Global alignments created by PileUp in Version 9.


  51 RHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVTMMTSK....... 93    

                                           |.  :

   1 ......................................MVRYRMRSPSEG 12

Global alignment created by Gap in Version 9.
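
To illustrate how a scoring matrix turns residue pairs into an alignment score, the sketch below scores the five overlapping columns of the Gap alignment above (MMTSK from the calmodulin sequence against MVRYR from Hsp2_Mouse). The values are taken from the standard published BLOSUM62 matrix; it is an assumption that the blosum62.cmp file shipped with the Package holds the same numbers, and the function names are made up for this example. Gap penalties are ignored.

```python
# Subset of the standard published BLOSUM62 matrix, covering only the residue
# pairs needed here. Keys are alphabetically sorted pairs because the matrix is
# symmetric. Assumption: GCG's blosum62.cmp uses the same values.
BLOSUM62_SUBSET = {
    ("M", "M"): 5,
    ("M", "V"): 1,
    ("R", "T"): -1,
    ("S", "Y"): -2,
    ("K", "R"): 2,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up the score for one aligned residue pair."""
    return matrix[tuple(sorted((a, b)))]


def score_ungapped(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Return the per-column scores of two equal-length, gap-free segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have the same length")
    return [pair_score(a, b, matrix) for a, b in zip(seq1, seq2)]


calm_tail = "MMTSK"   # last five residues shown for the calmodulin sequence
hsp2_head = "MVRYR"   # first five residues of Hsp2_Mouse

column_scores = score_ungapped(calm_tail, hsp2_head)
total = sum(column_scores)
print(column_scores, total)   # [5, 1, -1, -2, 2] 5
```

Only the identical M/M pair and the conservative K/R pair contribute much, so the overlap scores very little in total. That is in keeping with the answer above: under BLOSUM62 the two sequences share little beyond this short end overlap.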