Should I use Blast or Fasta?
Blast is a program developed at NCBI that searches a database with a
query sequence. It looks for regions of local similarity, i.e. segments
of equal length in the query sequence and the database sequence that
match well. It does not insert gaps to improve the quality of the match.
Blast calculates a 'quality-of-match' score using a scoring matrix, and
the output lists all matches that score above a cutoff value. Because
Blast looks for local regions of similarity, it can report that two
different regions of the same database sequence are similar to the query
sequence.
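The idea of scoring ungapped local matches against a cutoff can be
sketched in a few lines of Python. This is only an illustration, not
Blast's actual heuristic: it exhaustively scans every diagonal instead
of seeding from short words, and the function name, the simple
match/mismatch scores (standing in for a real scoring matrix) and the
cutoff are all assumptions made for the example.

```python
def ungapped_hits(query, subject, match=5, mismatch=-4, cutoff=20):
    """Return (score, query_start, subject_start, length) for the best
    ungapped segment on each diagonal that scores at least `cutoff`.

    The match/mismatch values are a hypothetical stand-in for a real
    scoring matrix.
    """
    hits = []
    # A diagonal is fixed by the offset between subject and query positions.
    for offset in range(-(len(query) - 1), len(subject)):
        q = max(0, -offset)
        s = max(0, offset)
        best = (0, 0, 0, 0)
        run_score, run_q, run_s = 0, q, s
        while q < len(query) and s < len(subject):
            run_score += match if query[q] == subject[s] else mismatch
            if run_score <= 0:
                # The running segment is no longer worth extending;
                # start a fresh one at the next position.
                run_score, run_q, run_s = 0, q + 1, s + 1
            elif run_score > best[0]:
                best = (run_score, run_q, run_s, q - run_q + 1)
            q += 1
            s += 1
        if best[0] >= cutoff:
            hits.append(best)
    return sorted(hits, reverse=True)
```

Calling ungapped_hits("GATTACA", "CCGATTACACCCCGATTACACC", cutoff=30)
reports two separate hits in the same subject sequence, one for each
copy of the query, which is the behaviour described above.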
Fasta uses the Pearson-Lipman algorithm, and looks for "word" matches
(an optimal word size is 2 for proteins and 4-6 for nucleic acids)
between the query and the database sequence. It finds the region in the
database sequence that has the most word matches, and then looks for
other nearby regions that match (i.e. it inserts gaps into the sequences
if that will improve the match). It keeps track of the highest-scoring
sequences, and reports only the top matching sequences.
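The first stage described above, finding the region with the most word
matches, can be sketched as follows. This is a simplified illustration
rather than Fasta's implementation: it only counts identical words per
diagonal and skips the later rescoring and gap-joining steps. The
function name and return format are assumptions made for the example.

```python
from collections import defaultdict


def best_word_diagonal(query, subject, word_size=2):
    """Count identical words shared on each diagonal and return
    (diagonal, count) for the diagonal with the most word matches,
    or None if no word is shared.

    diagonal = subject position - query position.
    """
    # Index every word of the query by its start position.
    positions = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        positions[query[i:i + word_size]].append(i)

    # Look up each subject word and credit the diagonal it falls on.
    counts = defaultdict(int)
    for j in range(len(subject) - word_size + 1):
        for i in positions.get(subject[j:j + word_size], ()):
            counts[j - i] += 1

    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])
```

A larger word_size makes the lookup faster but misses regions that
contain no run of identical residues that long, which is why the
suggested word sizes differ for proteins and nucleic acids.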
Which is more appropriate for your purpose? Note the following:
- Blast is faster. A typical Blast search takes a few minutes, while a
typical Fasta search takes 10-20 minutes. (However, with the
parallelized version of Fasta on helix, this may no longer be true.)
- Blast is usually more sensitive than Fasta for detecting protein
sequence similarity, since it doesn't require a perfect match at
the first stage of the search. Blast can also filter out low-complexity
protein sequence regions, which might otherwise produce non-specific
matches.
- Blast has more search modes. For example, you can use blastx
to search with a query nucleotide sequence against a protein database.
The query sequence will be translated in all 6 reading frames
before the search. To do the same thing with Fasta, you'd have to
translate the nucleotide sequence separately and run Fasta 6 times.
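For readers unfamiliar with the six reading frames mentioned in the last
point, here is a minimal Python sketch of the translation step. It
assumes the standard genetic code and plain A/C/G/T input; the function
names and the frame numbering (1 to 3 forward, -1 to -3 on the reverse
complement) are choices made for this example, not part of either
program.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame number to its protein translation."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for shift in range(3):
        frames[shift + 1] = translate(dna[shift:])
        frames[-(shift + 1)] = translate(reverse[shift:])
    return frames
```

Each of the six resulting protein strings would then be compared
against the protein database, which is the work a translated search
mode does for you in a single run.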
On the other hand:
- Fasta will let you tailor your search more precisely. For
example, you could search just the plant sequences (using pl:*),
whereas Blast requires you to search the whole GenEMBL database.
Thus you may get a more meaningful result from Fasta.
- For nucleic acid searches, Blast uses a long word size, which
reduces its sensitivity.
- Fasta is good at detecting genomic DNA regions using a
cDNA query sequence because it allows a gap extension penalty of
zero. Blast will find only the longest exon or fail, since it
only measures ungapped alignments.
- Blast cannot search with very short sequences. Fasta is not the
best approach for short queries either, but it does better.
Because of its speed and simplicity, a Blast search is probably
your best first bet. If one of the above points is important to
you, or you want more details, try Fasta.
Additional links:
- More on Blast and Fasta at U. Oxford.
- Database Similarity Searching using Blast and Fasta, an article by
Bruno Gaeta with details about the algorithms and choices.
- Interpreting FastA output from the University of Minnesota.
- The Blast Help Manual from NCBI.
- Searching databases for sequences similar to a sequence of interest
from NYU.