Grid Computing
Forecast Systems Laboratory, Aviation Division, Advanced Computing Branch
1. What is Grid Computing?
Grid computing connects compute, data, and software resources via high-speed networks into a single computational entity. The user logs on to the grid via standard authentication procedures. Once on the grid, they can access all resources each site has authorized for them. For example, at one site they might be able to view and access data but not save data to remote disk. Grids can combine compute, data, or software resources exclusively or in combination. Consider two scenarios:
Data Grid
In a strictly data grid scenario, a user might wish to locate output from a particular regional climate model run. Instead of re-running the model, the user can use grid tools to locate the data and transfer it to their workstation for further analysis, as in the sketch below. Grid tools allow data to be moved easily between sites without forcing users through site-specific data and system access procedures.
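As an illustration, here is a minimal sketch of such a transfer using the Globus Toolkit's GridFTP client, globus-url-copy (see section 3). It assumes the Globus command-line clients are installed; the host name and file paths are hypothetical.

    import subprocess

    # Hypothetical source and destination; gsiftp:// is the GridFTP protocol.
    src = "gsiftp://dataserver.example.gov/archive/rcm_run42.nc"
    dst = "file:///home/user/analysis/rcm_run42.nc"

    # Authenticate once (creates a short-lived proxy credential), then copy.
    subprocess.run(["grid-proxy-init"], check=True)
    subprocess.run(["globus-url-copy", src, dst], check=True)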
Compute Grid
In a compute grid scenario, the user signs on to the grid using standard authentication protocols and then has access to any resources they are authorized to use. These resources might include supercomputing cycles at FSL, PMEL, and NCEP; data resources and storage at NCAR; and high-speed networks between the sites and the user's desktop. To run a weather model, the user submits a job to the grid. Input data is located for the job and transferred to the system where the job will run; when the job completes, the results are automatically transferred to the user's desktop system for post-processing and visualization.
High-speed networks are the glue that ties compute and data resources together. These networks allow a free exchange of data and compute resources across the country, analogous to the exchange of power between providers and consumers on the electrical grid. Compute resources from multiple sites are available, on demand, to users of the compute grid. Many compute grids are in operation today; the TeraGrid, described in the next section, is one notable example.
2. Grid Applications at FSL
A limiting factor in the performance of any grid application will always be the time required to transfer data from one machine to another. Latency is the time required to move data between machines and is ultimately bounded by the speed of light. Bandwidth is the amount of data that can be moved per unit time. In the last decade, tremendous progress has been made in increasing network bandwidth; for example, the TeraGrid now boasts the ability to transfer up to 40 Gbits/second between its primary supercomputing sites. These improvements have driven the resurgence of interest in grid computing, but latency will always be a limiting factor due to the simple laws of nature. For example, the time required to transfer a message between Boulder and Seattle (~1500 miles) can be no less than the distance travelled divided by the speed of light. To limit the effect of latency in parallel or grid applications, techniques such as data pre-fetching, message aggregation, and redundant computation are used.
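A quick back-of-the-envelope calculation makes this bound concrete; real latency is higher, since fiber routes are longer than the straight-line distance, light travels more slowly in fiber than in a vacuum, and routers add delay.

    # Lower bound on one-way latency between Boulder and Seattle.
    distance_miles = 1500
    speed_of_light_mps = 186_282       # miles per second in a vacuum

    min_latency_s = distance_miles / speed_of_light_mps
    print(f"minimum one-way latency: {min_latency_s * 1e3:.1f} ms")   # ~8.1 ms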
FSL is currently exploring the usefulness of grid computing in two areas:
- development of a grid-enabled coupled WRF/ROMS model that will run on multiple machines of a prototype NOAA grid
- development of a WRF portal that will launch simultaneous model runs whose results will be used to evaluate different combinations of initial and boundary conditions (RUC, Eta, COAMPS, etc.), data assimilation, dynamics, and physics packages
3. Grid Software
A great deal of grid software is being developed. In the United States, Globus has emerged as the de facto standard for lower-level grid operations, including authentication, file transfer, resource allocation, and process management. Globus allows users to develop applications that make calls to its API to perform these and many other operations. Globus does not perform scheduling functions: you cannot submit a job to the grid using Globus and have it run on an undetermined system; you must choose the system yourself.
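For example, a job can be sent to an explicitly named resource with the Globus Toolkit's globusrun client and an RSL job description. This is a minimal sketch; the gatekeeper contact string is hypothetical.

    import subprocess

    # Run /bin/date on an explicitly chosen remote resource; -o streams the
    # job's output back to the local terminal.
    subprocess.run(
        ["globusrun", "-o", "-r", "gatekeeper.example.gov",
         "&(executable=/bin/date)"],
        check=True,
    )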
Recognizing this limitation, the Globus team joined forces with Condor, software that has provided resource allocation and scheduling for sites for a decade or more. Together, they released Condor-G, a grid-aware version of Condor. This software, along with Globus, allows users to submit jobs to the Condor queue, which can then schedule and run the task using Globus (and the grid). Globus and Condor-G can be difficult to install (we haven't tried yet), but the NSF Middleware Initiative (NMI) has developed a package that bundles them for RedHat platforms.
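As an illustration, here is a minimal sketch of submitting a job through Condor-G, assuming Condor-G and Globus are installed. The jobmanager contact string, executable, and file names are hypothetical.

    import subprocess

    # Hypothetical Condor-G submit description: the globus universe routes
    # the job through Globus to the named remote jobmanager.
    submit_description = """\
    universe        = globus
    globusscheduler = bigiron.example.gov/jobmanager-pbs
    executable      = run_wrf.sh
    output          = wrf.out
    error           = wrf.err
    log             = wrf.log
    queue
    """

    with open("wrf.sub", "w") as f:
        f.write(submit_description)

    # Hand the job to the local Condor queue; Condor-G forwards it to the grid.
    subprocess.run(["condor_submit", "wrf.sub"], check=True)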
The European counterpart to Globus is called Unicore. This package appears to be available with support from Pallas and others.
In addition, Commodity Grid (CoG) Kits have been released to map grid functionality into familiar language environments. Several such CoG Kits that interface to Globus are available, including Java, Perl, CORBA, MatLab, and Python. We are using the Java CoG Kit in the development of a WRF grid portal.
4. Other Grid Activities
- we are collaborating with Argonne National Laboratory on the development of espresso
- we are using and tracking the development of the Globus middleware
Prepared by Mark Govett, Mark.W.Govett@noaa.gov
Date of last update: September 2003