Applications on Biowulf
Sequence Analysis
BLAST
BLAST is a set of programs that search a sequence database for sequences similar to a query protein or DNA sequence. A scheme for efficiently running a large number of sequence files against a variety of BLAST databases has been implemented on Biowulf.
BLAT
BLAT is a DNA/protein sequence analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 40 bases or more with 95% or greater similarity; it may miss more divergent or shorter alignments. It will find perfect sequence matches of 33 bases, and sometimes finds them down to 22 bases. On proteins, BLAT finds sequences of 20 amino acids or more with 80% or greater similarity. In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. See the documentation for details on how to run BLAT on Biowulf.
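The 33-base and 22-base figures can be reasoned about by counting how many complete index tiles a perfect match contains. The sketch below assumes, purely for illustration, that the database is indexed by non-overlapping 11-base tiles and that a match is reported once two tiles hit; these parameter values are assumptions here, not settings read from the Biowulf installation. It computes the worst-case and best-case tile counts over every possible alignment offset.

```python
def tiles_covered(match_len: int, tile: int, offset: int) -> int:
    """Count complete, non-overlapping database tiles inside a perfect match.

    Tiles start at multiples of `tile`; the match starts `offset` bases
    past a tile boundary (0 <= offset < tile).
    """
    first = (tile - offset) % tile   # bases until the first tile boundary
    usable = match_len - first       # bases remaining after that boundary
    return max(usable // tile, 0)


def tile_range(match_len: int, tile: int = 11) -> tuple[int, int]:
    """Return (worst-case, best-case) complete tiles over all offsets."""
    counts = [tiles_covered(match_len, tile, off) for off in range(tile)]
    return min(counts), max(counts)


if __name__ == "__main__":
    MIN_HITS = 2  # assumed number of tile hits needed to report a match
    for length in (22, 33):
        worst, best = tile_range(length)
        if worst >= MIN_HITS:
            status = "always"
        elif best >= MIN_HITS:
            status = "sometimes"
        else:
            status = "never"
        print(f"{length} bases: {worst}-{best} tiles -> found {status}")
```

Under these assumptions a 33-base perfect match always contains at least two whole tiles, while a 22-base match contains two only when it happens to line up with tile boundaries, which is consistent with the "sometimes" wording above.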
FASTA
The FASTA program package contains many programs for searching DNA and protein databases, and one program (prss) for evaluating the statistical significance of an alignment by comparing it with alignments against randomly shuffled sequences.
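The shuffle idea behind prss can be illustrated with a small test: score the real alignment, then count how often alignments against shuffled copies of the library sequence score at least as well. This sketch uses a plain Smith-Waterman local alignment score with hypothetical scoring values (match +2, mismatch -1, linear gap -2) and reports an empirical p-value. prss uses its own scoring matrices and statistics, so treat this as an illustration of the concept, not a reimplementation.

```python
import random


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_pvalue(query: str, target: str, n_shuffles: int = 200,
                   seed: int = 0) -> tuple[int, float]:
    """Compare the real score with scores against shuffled targets.

    Shuffling keeps the target's composition but destroys its order, so the
    shuffled scores approximate what chance alone produces.
    """
    rng = random.Random(seed)
    real = sw_score(query, target)
    letters = list(target)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if sw_score(query, "".join(letters)) >= real:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    p = (at_least_as_good + 1) / (n_shuffles + 1)
    return real, p


if __name__ == "__main__":
    q = "ACGTTGCAAGGCTTACGATCGGA"
    t = "TTACGTTGCAAGGCTTACGATCGGATT"
    score, p = shuffle_pvalue(q, t)
    print(f"score={score}, empirical p={p:.3f}")
```

More shuffles give a finer-grained p-value at a proportional cost in alignment time.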
HMMER
Profile hidden Markov models (profile HMMs) can be used for sensitive database searching, using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs for several types of homology searches.
PFSearch
The pfsearch program searches a protein or DNA sequence library for sequence segments matching a profile. The result is an unsorted list of profile-sequence matches written to standard output. See the instructions for details on how to run pfsearch on Biowulf.
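As a rough illustration of profile searching, the sketch below scans each library sequence with a small position-specific scoring matrix and emits every window that scores at or above a cutoff, in the order found. It is deliberately simplified: pfsearch's generalized profiles also score insertions and deletions, whereas this matrix is ungapped, and its values, the sequence names, and the cutoff are all hypothetical.

```python
from typing import Iterator

# Hypothetical 4-column DNA profile: one dict of residue scores per position.
PROFILE = [
    {"A": 3, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 3, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 3, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 3},
]


def scan(name: str, seq: str, profile: list[dict[str, int]],
         cutoff: int) -> Iterator[tuple[str, int, int, int]]:
    """Yield (sequence name, start, end, score) for each window scoring >= cutoff.

    Coordinates are 1-based and inclusive; residues absent from a column score -2.
    """
    width = len(profile)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(res, -2) for col, res in zip(profile, window))
        if score >= cutoff:
            yield name, start + 1, start + width, score


if __name__ == "__main__":
    library = {"seq1": "TTACGTAAACGA", "seq2": "GGGGACCTACGT"}
    # Hits are printed as they are found, so the list is not sorted by score.
    for name, seq in library.items():
        for hit in scan(name, seq, PROFILE, cutoff=8):
            print(*hit)
```

Emitting hits as they are found keeps memory use flat for large libraries; sorting, if wanted, becomes a separate downstream step.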
RepeatMasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (by default, replaced by Ns). On average, almost 50% of a human genomic DNA sequence will currently be masked by the program.
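The masking step described above amounts to overwriting each annotated interval with N. This sketch assumes the repeat annotations are already available as 1-based, inclusive (start, end) pairs; it does not detect repeats itself (that is the part RepeatMasker does), and the example coordinates are made up.

```python
def mask(seq: str, repeats: list[tuple[int, int]], mask_char: str = "N") -> str:
    """Return seq with every 1-based, inclusive (start, end) interval masked.

    Overlapping intervals are fine; coordinates past either end are clipped.
    """
    bases = list(seq)
    for start, end in repeats:
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


def masked_fraction(masked: str, mask_char: str = "N") -> float:
    """Fraction of positions equal to mask_char (includes any Ns already in the input)."""
    return masked.count(mask_char) / len(masked) if masked else 0.0


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGTACGT"
    out = mask(seq, [(3, 7), (15, 20)])
    print(out)
    print(f"{masked_fraction(out):.0%} masked")
```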
Jim Kent Library (jksrc454.zip)
A collection of executables from Jim Kent has been compiled on Biobos. The programs perform a multitude of tasks, from simple number crunching to highly specific sequence analysis and database construction. The executables are located in the directory /usr/local/ucsc on Biowulf.
Phylogenetic/Linkage Analysis
SimWalk
SimWalk2 is a statistical genetics application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on pedigrees of any size. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.
Tree-Puzzle
TREE-PUZZLE is a program that reconstructs phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimates of support to each internal branch.
FastDNAml
fastDNAml computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species. It is derived from the dnaml program of the PHYLIP package.
Merlin
MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages available (Abecasis et al., 2002).
FASTLINK/FastSLINK
FASTLINK is a modified and improved version of the original LINKAGE suite for genetic linkage analysis. The additional LINKAGE utilities are also installed. FastSLINK merges code from FASTLINK v2.x into the SLINK package, which simulates and analyzes replicates.
Computational Chemistry/Molecular Modeling
AMBER
AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs. Version 6 is currently installed on Biowulf. Major programs in the AMBER package include sander, gibbs, nmode, and LEaP.
CHARMM
CHARMM (Chemistry at HARvard Molecular Mechanics) is a program that supports a wide range of theoretical modeling calculations of the structure and dynamics of biological molecules. In addition to energy minimization and molecular dynamics simulations, Monte Carlo sampling, genetic algorithms, and several interfaces to quantum codes (AM1, GAMESS) are available or under development. Recent CHARMM versions have been made available on Biowulf as a joint effort between the NHLBI/LBC Computational Biophysics Section and the CBER/OVRR Biophysics Lab, with the support of the Biowulf staff. Multiple executables are available for each version, to support larger molecular systems and the different types of parallel communication available on Biowulf, i.e. Ethernet and Myrinet 2000. Support files are also available for these versions, e.g. the version .doc files and the standard topology and parameter files.
CHARMM is a fairly sophisticated and complicated command-line program; detailed CHARMM documentation is available online.
GAMESS
GAMESS is a program for ab initio quantum chemistry. Briefly, GAMESS can compute RHF, ROHF, UHF, GVB, and MCSCF wavefunctions, with CI and MP2 energy corrections available for some of these. Analytic gradients are available for these SCF functions, for use in automatic geometry optimization, transition state searches, and reaction path following. Computation of the energy Hessian permits prediction of vibrational frequencies. A variety of molecular properties, ranging from simple dipole moments to frequency-dependent hyperpolarizabilities, may be computed. Many basis sets are stored internally, and together with effective core potentials, they allow all elements up to radon to be included in molecules. Several graphics programs are available for viewing the final results. Many of the computational functions can be performed using direct techniques, or in parallel on appropriate hardware.
GAUSSIAN 03
Gaussian 03 is a series of electronic structure programs that perform computations starting from the basic laws of quantum mechanics. Gaussian can predict energies, molecular structures, and vibrational frequencies for systems in the gas phase and in solution, and it can model them in both their ground state and excited states.
GROMACS
GROMACS is a versatile package for performing molecular dynamics, i.e. simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules such as proteins and lipids that have many complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (which usually dominate simulations), many groups also use it for research on non-biological systems, e.g. polymers.
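To make "simulating the Newtonian equations of motion" concrete, here is a generic leap-frog integrator, the kind of time-stepping scheme behind GROMACS's standard md integrator, applied to a single particle on a harmonic spring. It is a teaching sketch in reduced units with hypothetical parameters; GROMACS computes forces from a force field over many particles, which this does not attempt.

```python
import math


def leapfrog_harmonic(x0: float, v0: float, k: float, m: float,
                      dt: float, n_steps: int) -> tuple[float, float]:
    """Integrate m*x'' = -k*x with the leap-frog scheme.

    Velocities live at half steps: v(t + dt/2) = v(t - dt/2) + a(t)*dt,
    then x(t + dt) = x(t) + v(t + dt/2)*dt.
    Returns the final position and the final half-step velocity.
    """
    def accel(x: float) -> float:
        return -k * x / m

    # Start the staggered velocity half a step back: v(-dt/2) ~ v0 - a(0)*dt/2.
    v_half = v0 - 0.5 * dt * accel(x0)
    x = x0
    for _ in range(n_steps):
        v_half += dt * accel(x)  # kick: advance velocity by a full step
        x += dt * v_half         # drift: advance position with the new half-step velocity
    return x, v_half


if __name__ == "__main__":
    k = m = 1.0                      # reduced units: angular frequency 1, period 2*pi
    dt = 0.01
    steps = round(2 * math.pi / dt)  # roughly one oscillation period
    x, _ = leapfrog_harmonic(1.0, 0.0, k, m, dt, steps)
    print(f"position after one period: {x:.4f} (exact: 1.0)")
```

Leap-frog is popular in molecular dynamics because it needs only one force evaluation per step and is time-reversible, which keeps energy drift small over long runs.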
NAMD
NAMD is a parallel molecular dynamics program for UNIX platforms, designed for high-performance simulations in structural biology. It is developed by the Theoretical Biophysics Group at the Beckman Institute, University of Illinois. NAMD is particularly well suited to Beowulf clusters, as it was specifically designed to run efficiently on parallel machines.
PROSPECT
PROSPECT is a threading-based protein structure prediction system. PROSPECT will find structural homologs of a target sequence even when the homologs' sequences have insignificant identity to the target sequence.
Q-Chem
Q-Chem is an ab initio electronic structure program capable of performing first-principles calculations on both the ground and excited states of molecules.
Mathematical Analysis/Statistics
GAUSS
The GAUSS Mathematical and Statistical System is a fast matrix programming language designed for computationally intensive tasks, with a wide variety of statistical, mathematical and matrix-handling routines.
R
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S and provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, statistical tests, time series analysis, classification, clustering, ...).
SAS
The SAS System is an integrated, hardware-independent system of applications
software for data access, management, statistical analysis and report writing.
The Base SAS windowing environment provides a full-screen facility for
interacting with all parts of a SAS program.
Matlab
Matlab integrates mathematical computing, visualization, and a powerful language to provide a flexible environment for technical computing.
Structural Biology
CNS
The Crystallography and NMR System (CNS) is a flexible multi-level package for macromolecular structure determination.
XPLOR-NIH
Xplor-NIH is a structure determination program that builds on the X-PLOR program, adding tools for NMR analysis. The advantage of running Xplor-NIH on Biowulf is the ability to spawn a large number of independent refinement jobs that run on multiple Biowulf nodes.
High-Quality Molecular Graphics
PovRay
POV-Ray (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but they are computationally expensive, so large images can take many hours to create. POV-Ray images can also require more memory than many desktop machines can handle. To address these concerns, a parallelized version of POV-Ray has been installed on the Biowulf system.
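One common way to parallelize a raytrace is to split the image into horizontal bands of rows, render each band on a different node, and stitch the results together. The text does not say how the Biowulf version divides the work, so the sketch below only illustrates this band-splitting idea; the image height and worker count in the example are hypothetical.

```python
def row_bands(height: int, n_workers: int) -> list[tuple[int, int]]:
    """Split rows 1..height into contiguous, near-equal bands.

    Returns 1-based inclusive (first_row, last_row) pairs; the first
    `height % n` bands get one extra row, where n = min(n_workers, height).
    """
    if height < 1 or n_workers < 1:
        raise ValueError("height and n_workers must be positive")
    n = min(n_workers, height)  # never create empty bands
    base, extra = divmod(height, n)
    bands, start = [], 1
    for i in range(n):
        size = base + (1 if i < extra else 0)
        bands.append((start, start + size - 1))
        start += size
    return bands


if __name__ == "__main__":
    for first, last in row_bands(1000, 8):
        print(f"rows {first}-{last}")
```

Equal row counts do not guarantee equal render times, since some parts of a scene are far more expensive to trace than others; using more, smaller bands than workers is one way to even out the load.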
General Purpose
Swarm
Swarm is a program designed to simplify submitting a group of commands to the cluster. Some programs do not scale well and thus are not suited to true parallelization. For other programs, each individual job is very short, but many such jobs need to be run. Such programs are well suited to running as 'swarms of single-threaded jobs', and Swarm simplifies this process. See the documentation for details.
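Swarm is commonly driven by a plain-text command file with one independent command per line. Assuming that format (the Swarm documentation gives the exact syntax and submission options), the sketch below writes such a file for a batch of input files. The program name `my_analysis`, the `data/` directory, and the `.fa` suffix are hypothetical placeholders.

```python
import shlex
from pathlib import Path


def write_swarm_file(inputs: list[str], out_path: str,
                     program: str = "my_analysis") -> int:
    """Write one shell command per input file; return the number of commands.

    `program` is a hypothetical executable. Each line is independent, so
    the jobs can be scheduled on different nodes in any order.
    """
    lines = []
    for name in inputs:
        src = shlex.quote(name)
        dst = shlex.quote(Path(name).with_suffix(".out").as_posix())
        lines.append(f"{program} {src} > {dst}")
    Path(out_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)


if __name__ == "__main__":
    files = sorted(str(p) for p in Path("data").glob("*.fa"))
    if files:
        n = write_swarm_file(files, "jobs.swarm")
        print(f"wrote {n} commands to jobs.swarm")
    else:
        print("no data/*.fa files found")
```

Quoting each filename with `shlex.quote` keeps the generated lines safe when names contain spaces or shell metacharacters.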
Utilities on Biowulf