Applications on Biowulf

Sequence Analysis

BLAST on Biowulf
BLAST is a set of programs to find similarity between a query protein or DNA sequence and a sequence database. A scheme for efficiently running a large number of sequence files against a variety of BLAST databases has been implemented on Biowulf.

BLAT
BLAT is a DNA/protein sequence analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% or greater similarity with a length of 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes finds them down to 22 bases. BLAT on proteins finds sequences of 80% or greater similarity with a length of 20 amino acids or more. In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. See the documentation for details on how to run BLAT on Biowulf.

FASTA
The FASTA program package contains many programs for searching DNA and protein databases, and one program (prss) for evaluating statistical significance from randomly shuffled sequences.
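The idea of judging significance from randomly shuffled sequences can be sketched as a small permutation test. This is only an illustration of the general technique, not how prss is implemented: the scoring function `ungapped_score`, the function name `shuffle_p_value`, and all parameters are hypothetical, and the toy ungapped score stands in for a real alignment score.

```python
import random


def ungapped_score(a, b, match=1, mismatch=-1):
    """Best local score over all ungapped offsets of b against a.

    Toy stand-in for a real alignment score (assumption, not the
    scoring used by any FASTA program).
    """
    best = 0
    for offset in range(-len(b) + 1, len(a)):
        score = 0
        for j, ch in enumerate(b):
            i = offset + j
            if 0 <= i < len(a):
                score += match if a[i] == ch else mismatch
                if score < 0:
                    score = 0  # restart the local segment
                best = max(best, score)
    return best


def shuffle_p_value(query, target, n_shuffles=200, seed=0):
    """Empirical p-value: how often a shuffled target scores as well
    as the real target against the query."""
    rng = random.Random(seed)
    observed = ungapped_score(query, target)
    letters = list(target)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys order
        if ungapped_score(query, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)
```

Shuffling preserves the composition of the target while destroying its order, so a real score that shuffled sequences rarely reach suggests the similarity is not explained by composition alone.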

HMMER
Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs for several types of homology searches.

PFSearch
The PfSearch program searches a protein or DNA sequence library for sequence segments matching a profile. The result is an unsorted list of profile-sequence matches written to standard output. See the instructions for details on how to run PfSearch on Biowulf.

RepeatMasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program.
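To make the masking step concrete, here is a minimal sketch of replacing annotated intervals with Ns and measuring the masked fraction. It does not reproduce RepeatMasker's repeat detection or its output format; the function names and the 0-based, half-open interval convention are assumptions chosen for illustration.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of sequence with each (start, end) interval masked.

    Intervals are assumed to be 0-based and half-open; values that
    fall outside the sequence are clipped.
    """
    chars = list(sequence)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(chars))
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


def masked_fraction(masked, mask_char="N"):
    """Fraction of positions equal to mask_char.

    Note: Ns already present in the input are counted as masked too.
    """
    if not masked:
        return 0.0
    return masked.count(mask_char) / len(masked)
```

For example, `mask_intervals("ACGTACGTAC", [(2, 5)])` masks positions 2 through 4 and leaves the rest of the sequence unchanged.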

Jim Kent Library jksrc454.zip
A collection of executables from Jim Kent has been compiled on Biobos. The programs perform a multitude of tasks, from simple number crunching to highly specific sequence analysis and database construction. The executables are located in the directory /usr/local/ucsc on Biowulf.

Phylogenetic/Linkage Analysis

SimWalk
SimWalk2 is a statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.

Tree-Puzzle
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch.

FastDNAml
fastDNAml computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species. It is derived from part of the PHYLIP package.

Merlin
MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages available (Abecasis et al., 2002).

FASTLINK/FastSLINK
FASTLINK is a modified and improved version of the original LINKAGE suite for genetic linkage analysis. The additional LINKAGE utilities are also installed. FastSLINK merges code from FASTLINK v2.x into the SLINK package, which simulates and analyzes replicates.

Computational Chemistry/Molecular Modeling

AMBER
AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs. Version 6 is currently installed on Biowulf. Major programs in the AMBER package include sander, gibbs, nmode, and LEaP.

CHARMM on Biowulf
CHARMM (Chemistry at HARvard Molecular Mechanics) is a program which supports a wide range of theoretical modeling calculations of the structure and dynamics of biological molecules. In addition to energy minimization and molecular dynamics simulations, Monte Carlo sampling, use of genetic algorithms, and several interfaces to quantum codes (AM1, GAMESS) are available or under development. Recent CHARMM versions have been made available for use on Biowulf as a joint effort between the NHLBI/LBC Computational Biophysics Section and the CBER/OVRR Biophysics Lab, with the support of Biowulf staff. Multiple executables are available for each version in order to support larger molecular systems and the different types of parallel communications available on Biowulf, i.e. Ethernet and Myrinet 2000. The support files are also available for these versions, e.g. the version .doc files and the standard topology and parameter files.

CHARMM is a fairly sophisticated and complicated command line based program; detailed CHARMM Documentation is available online.

GAMESS
GAMESS is a program for ab initio quantum chemistry. Briefly, GAMESS can compute RHF, ROHF, UHF, GVB, and MCSCF wavefunctions, with CI and MP2 energy corrections available for some of these. Analytic gradients are available for these SCF functions, for automatic geometry optimization, transition state searches, or reaction path following. Computation of the energy hessian permits prediction of vibrational frequencies. A variety of molecular properties, ranging from simple dipole moments to frequency-dependent hyperpolarizabilities, may be computed. Many basis sets are stored internally, and together with effective core potentials, all elements up to radon may be included in molecules. Several graphics programs are available for viewing the final results. Many of the computational functions can be performed using direct techniques, or in parallel on appropriate hardware.

GAUSSIAN 03
Gaussian 03 is a series of electronic structure programs performing computations starting from the basic laws of quantum mechanics. Gaussian can predict energies, molecular structures, vibrational frequencies for systems in the gas phase and in solution, and it can model them in both their ground state and excited states.

GROMACS
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (which usually dominate simulations), many groups are also using it for research on non-biological systems, e.g. polymers.

NAMD
NAMD is a parallel molecular dynamics program for UNIX platforms designed for high-performance simulations in structural biology. It is developed by the Theoretical Biophysics Group at the Beckman Center, University of Illinois. NAMD is particularly well suited to Beowulf clusters, as it was specifically designed to run efficiently on parallel machines.

PROSPECT
PROSPECT is a threading-based protein structure prediction system. PROSPECT will find structural homologs of a target sequence, even when the structural homolog sequences have insignificant identity to the target sequence.

Q-Chem
Q-Chem is an ab initio electronic structure program capable of performing first-principles calculations on both the ground and excited states of molecules.

Mathematical Analysis / Statistics

GAUSS
The GAUSS Mathematical and Statistical System is a fast matrix programming language designed for computationally intensive tasks, which has a wide variety of statistical, mathematical and matrix handling routines.

R
R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

SAS
The SAS System is an integrated, hardware-independent system of applications software for data access, management, statistical analysis and report writing. The Base SAS windowing environment provides a full-screen facility for interacting with all parts of a SAS program.

Matlab
Matlab integrates mathematical computing, visualization, and a powerful language to provide a flexible environment for technical computing.

Structural Biology

CNS
Crystallography and NMR System (CNS) is a flexible multi-level package for macromolecular structure determination.

XPLOR-NIH
Xplor-NIH is a structure determination program which builds on the X-PLOR program, including additional tools for NMR analysis. The advantage of running Xplor-NIH on Biowulf is the ability to spawn a large number of independent refinement jobs that run on multiple Biowulf nodes.

High-quality molecular graphics

PovRay
POVRAY (Persistence of Vision RAYtracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but are computationally expensive so that large images can take many hours to create. PovRay images can also require more memory than many desktop machines can handle. To address these concerns, a parallelized version of PovRay has been installed on the Biowulf system.

General Purpose

Swarm
Swarm is a program designed to simplify submitting a group of commands to the cluster. Some programs do not scale well and thus are not suited to true parallelization. Other programs may be such that each individual job is very short, but many such jobs need to be run. Such programs are well suited to running 'swarms of single-threaded jobs'. The Swarm program simplifies this process. See the documentation for details. Swarm is also available for download.
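As a rough illustration of preparing a group of commands for such a run, the sketch below builds one shell command per input file. It assumes a command file with one independent command per line, which should be checked against the Swarm documentation; the helper `build_command_lines` and the program name `myprog` are hypothetical and are not part of Swarm.

```python
import shlex
from pathlib import Path


def build_command_lines(input_files, program, extra_args=()):
    """Build one shell command per input file.

    Each command runs `program` on a single input and redirects its
    standard output to a file with the same stem and a .out suffix.
    The one-command-per-line layout is an assumption for illustration.
    """
    lines = []
    for path in sorted(str(p) for p in input_files):
        output = Path(path).with_suffix(".out")
        parts = [program, *extra_args, path]
        command = " ".join(shlex.quote(part) for part in parts)
        lines.append(f"{command} > {shlex.quote(str(output))}")
    return lines


def write_command_file(lines, destination):
    """Write the commands to a text file, one per line."""
    Path(destination).write_text("\n".join(lines) + "\n")
```

Because each line is independent of the others, the commands can be distributed across nodes in any order, which is what makes many short single-threaded jobs a good fit for this approach.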

Utilities on Biowulf


This document is available as http://biowulf.nih.gov/apps/index.html