Should I use Blast or Fasta?

Blast is a program developed at NCBI that searches a database with a query sequence. It looks for regions of local similarity, i.e. stretches of equal length in the query sequence and the database sequence that match without gaps. It does not insert gaps to improve the quality of the match. Blast calculates a 'quality-of-match' score using a scoring matrix, and the output lists all matches that score above a cutoff value. Because Blast looks for local regions of similarity, it can report that two different regions of the same database sequence are similar to the query sequence.
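To make the idea of ungapped local matches scored against a cutoff concrete, here is a minimal sketch in Python. It is not Blast's actual algorithm: it scans every diagonal by brute force and uses a toy match/mismatch score in place of a real scoring matrix. The function names and the score values are assumptions chosen for illustration.

```python
def pair_score(a, b, match=2, mismatch=-1):
    # Toy stand-in for a scoring-matrix lookup (assumed values, not a real matrix).
    return match if a == b else mismatch


def best_segment(query, subject, offset):
    """Best-scoring ungapped segment on one diagonal.

    On a diagonal, subject index = query index + offset, so no gaps are possible.
    Returns (score, query_start, length).
    """
    best = (0, 0, 0)
    running, start = 0, 0
    for qi in range(len(query)):
        si = qi + offset
        if not 0 <= si < len(subject):
            continue
        if running <= 0:
            # A non-positive running score cannot help; restart the segment here.
            running, start = 0, qi
        running += pair_score(query[qi], subject[si])
        if running > best[0]:
            best = (running, start, qi - start + 1)
    return best


def ungapped_hits(query, subject, cutoff):
    """All diagonals whose best segment scores at or above the cutoff.

    Each hit is (score, query_start, subject_start, length), best first.
    """
    hits = []
    for offset in range(-(len(query) - 1), len(subject)):
        score, q_start, length = best_segment(query, subject, offset)
        if score >= cutoff:
            hits.append((score, q_start, q_start + offset, length))
    return sorted(hits, reverse=True)


# The query occurs twice in this subject, so two separate regions are reported.
subject = "GATTACA" + "CCCCC" + "GATTACA"
for hit in ungapped_hits("GATTACA", subject, cutoff=10):
    print(hit)
```

Because each diagonal is scored independently, two regions of the same database sequence show up as two separate hits, which is the behaviour described above.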

Fasta uses the Pearson-Lipman algorithm, and looks for "word" matches between the query and the database sequence (an optimal word size is 2 for proteins and 4-6 for nucleic acids). It finds the region in the database sequence that has the most word matches, and then looks for other nearby regions that match (i.e. it inserts gaps into the sequences if that will improve the match). It keeps track of the highest-scoring sequences, and reports only the top matches.
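The word-matching stage can be sketched roughly as follows. This is only the first step (finding the diagonal with the most shared words), not the full Fasta program: the later rescoring and the joining of nearby regions with gaps are left out. The names `word_positions`, `best_diagonal` and `ktup` are chosen for this illustration.

```python
from collections import Counter, defaultdict


def word_positions(seq, ktup):
    """Map every word of length ktup to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def best_diagonal(query, subject, ktup=2):
    """Return (offset, word_matches) for the diagonal with the most shared words.

    offset = subject position - query position. Returns None if no word matches.
    """
    index = word_positions(subject, ktup)
    diagonals = Counter()
    for qi in range(len(query) - ktup + 1):
        for si in index.get(query[qi:qi + ktup], ()):
            diagonals[si - qi] += 1
    if not diagonals:
        return None
    return max(diagonals.items(), key=lambda item: item[1])


# With a word size of 4, all four query words land on the same diagonal.
print(best_diagonal("GATTACA", "CCGATTACACC", ktup=4))
```

A larger word size makes the lookup faster but misses matches that are interrupted by mismatches, which is why the word size affects sensitivity.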

Which is more appropriate for your purpose? Note the following:

Blast is faster. A typical Blast search takes a few minutes; a typical Fasta search takes 10-20 minutes. (However, with the parallelized version of Fasta on helix, this may no longer be true.)
Blast is usually more sensitive than Fasta for detecting protein sequence similarity, since it doesn't require a perfect match at the first stage of the search. Blast can also filter out low-complexity regions of protein sequence, which may otherwise produce non-specific matches.
Blast has more search modes. For example, you can use blastx to search with a query nucleotide sequence against a protein database. The query sequence will be translated in all 6 reading frames before the search. To do the same thing with Fasta, you'd have to translate the nucleotide sequence separately and run Fasta 6 times.
However:
Fasta will let you tailor your search more precisely. For example, you could search just the plant sequences (using pl:*) while Blast will require you to search the whole GenEMBL database. Thus you may get a more meaningful result from Fasta.
Blast has a long word size, which reduces its sensitivity.
Fasta is good at detecting genomic DNA regions using a cDNA query sequence because it allows a gap extension penalty of zero. Blast will find only the longest exon or fail, since it only measures ungapped alignments.
Blast cannot search with very short sequences. Fasta is not the best approach for these either, but it does better.
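For reference, the six-frame translation mentioned above for translated searches can be sketched like this. It is a minimal illustration using the standard genetic code, not the code either program uses; the function names are assumptions, and any codon containing a character other than A, C, G or T is written as X.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse-complement a DNA string (A<->T, C<->G)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate a nucleotide sequence in all six reading frames.

    Frames +1..+3 read the given strand; frames -1..-3 read its reverse complement.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for shift in range(3):
        frames["+%d" % (shift + 1)] = translate(seq[shift:])
        frames["-%d" % (shift + 1)] = translate(rc[shift:])
    return frames


for name, protein in sorted(six_frames("ATGGCCTAA").items()):
    print(name, protein)
```

Doing this by hand and then running one protein search per frame is the "6 times" workaround described above for programs that lack a translated search mode.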

Because of its speed and simplicity, a Blast search is probably your best first bet. If one of the above points is important to you, or you want more details, try Fasta.

Additional links: