You are viewing a Web site, archived on 13:36:15 Oct 20, 2004. It is now a Federal record managed by the National Archives and Records Administration.

Grid Computing

Forecast Systems Laboratory
Aviation Division
Advanced Computing Branch

1. What is Grid Computing?

Grid computing connects compute, data, and software resources via high-speed networks into a single computational entity.  The user logs on to the grid via standard authentication procedures.  Once on the grid, the user can access whatever resources each site has authorized.  For example, at one site the user might be able to view or access data but not save data to the remote disk.  Grids can be built from compute, data, or software resources alone or in any combination.  Consider two scenarios:
  • Data Grid
  • Compute Grid
    In a compute grid scenario, the user signs on to the grid using standard authentication protocols and then has access to any resources they are authorized to use.  These resources might include supercomputing cycles at FSL, PMEL, and NCEP, data resources and storage at NCAR, and high-speed networks between the sites and the user's desktop.  To run a weather model, the user submits a job to the grid.  Input data for the job is located and transferred to the system where the job will run.  When the job completes, the results are automatically transferred to the user's desktop system for post-processing and visualization.
    High-speed networks are the glue that ties compute and data resources together.  These networks allow the free exchange of data and compute resources across the country, analogous to the free exchange of electrical power between users and providers.  Compute resources from multiple sites are available, on demand, to users of the compute grid.  There are many examples of compute grids in operation today.  Some notable grids include:

    2.  Grid Applications at FSL

    A limiting factor in the performance of any grid application will always be the time required to transfer data from one machine to another.  Latency is the time a message takes to travel from one machine to another; its lower bound is set by the speed of light.  Bandwidth is the amount of data that can be moved per unit of time.  In the last decade, tremendous progress has been made in increasing network bandwidth; for example, the TeraGrid now advertises the ability to transfer up to 40 Gbits/second between its primary supercomputing sites.  These improvements have driven the resurgence of interest in grid computing, but latency will always be a limiting factor because it is bounded by the laws of nature.  For example, the time required to send a message between Boulder and Seattle (~1500 miles) can be no less than the distance travelled divided by the speed of light, or roughly 8 milliseconds one way.  To limit the effect of latency in parallel or grid applications, techniques such as data pre-fetching, message aggregation, and redundant computation are used.
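
    The latency bound and the benefit of message aggregation can be illustrated with a small calculation.  The sketch below is not FSL code; it assumes a deliberately simple model in which every message pays the full one-way latency in sequence (no pipelining), uses the speed of light in vacuum (signals in real networks travel slower), and picks an arbitrary workload of 1000 messages of 1 KB each.  The function names are hypothetical.

```python
# Simple model: total time = (number of messages x latency) + payload / bandwidth.
# Assumptions: messages are sent one after another, with no pipelining.

SPEED_OF_LIGHT_KM_S = 299_792.458  # in vacuum; real networks are slower
KM_PER_MILE = 1.609344


def min_latency_s(distance_miles):
    """Lower bound on one-way latency: distance divided by the speed of light."""
    return distance_miles * KM_PER_MILE / SPEED_OF_LIGHT_KM_S


def transfer_time_s(n_messages, bytes_per_message, latency_s, bandwidth_bits_s):
    """Time to send n_messages sequentially under the simple model above."""
    payload_bits = n_messages * bytes_per_message * 8
    return n_messages * latency_s + payload_bits / bandwidth_bits_s


latency = min_latency_s(1500)   # Boulder to Seattle, using the ~1500 mile figure
bandwidth = 40e9                # 40 Gbits/second

# The same 1000 KB of data, sent as 1000 small messages or as one aggregated message.
separate = transfer_time_s(1000, 1024, latency, bandwidth)
aggregated = transfer_time_s(1, 1000 * 1024, latency, bandwidth)

print(f"minimum one-way latency: {latency * 1000:.2f} ms")
print(f"1000 separate messages:  {separate:.4f} s")
print(f"1 aggregated message:    {aggregated:.4f} s")
```

    Under these assumptions the latency term dominates: the bandwidth cost of the payload is a fraction of a millisecond, so sending fewer, larger messages reduces the total time far more than additional bandwidth would.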

    FSL is currently exploring the usefulness of grid in two areas.

    3.  Grid Software

    There is plenty of grid software being developed.  In the United States, Globus has emerged as the de facto standard for lower-level grid operations, including authentication, file transfer, resource allocation, and process management.  Globus allows users to develop applications that call its API to perform these and many other operations.  Globus does not perform scheduling functions.  In other words, you cannot submit a job to the grid using Globus and have it run on an undetermined system; you must choose the system.  Recognizing this limitation, the Globus developers teamed up with the Condor project, whose software has provided resource allocation and scheduling for sites for a decade or more.  Together, they released Condor-G, a grid-aware version of Condor.  This software, along with Globus, allows users to submit jobs to the Condor queue, which can then schedule and run the task using Globus (and the grid).  Globus and Condor-G are said to be difficult to install (we haven't tried yet), but the NSF Middleware Initiative (NMI) has developed a distribution that bundles these packages for RedHat platforms.
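
    The difference between naming a system yourself and letting a scheduler pick one can be sketched in a few lines.  This toy example does not use the Globus or Condor APIs and does not reflect how Condor-G actually matches jobs to resources; the Site class, the site names, the CPU counts, and the "most free CPUs" rule are all hypothetical, chosen only to make the distinction concrete.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of one grid site."""
    name: str
    free_cpus: int
    authorized: bool  # whether this user may run jobs at the site


def submit_direct(site, cpus_needed):
    """Without a scheduler: the caller must name the target system."""
    if not site.authorized:
        raise PermissionError(f"not authorized at {site.name}")
    if site.free_cpus < cpus_needed:
        raise RuntimeError(f"{site.name} has too few free CPUs")
    return site.name


def submit_scheduled(sites, cpus_needed):
    """With a scheduler: a suitable system is chosen for the caller.

    Toy policy (an assumption): pick the authorized site with the most free CPUs.
    """
    candidates = [s for s in sites if s.authorized and s.free_cpus >= cpus_needed]
    if not candidates:
        raise RuntimeError("no authorized site can run this job")
    return max(candidates, key=lambda s: s.free_cpus).name


sites = [
    Site("site_a", free_cpus=32, authorized=True),
    Site("site_b", free_cpus=128, authorized=True),
    Site("site_c", free_cpus=512, authorized=False),
]

print(submit_direct(sites[0], cpus_needed=16))   # caller chose site_a
print(submit_scheduled(sites, cpus_needed=64))   # scheduler chooses a site
```

    In the second call the user never names a system; the unauthorized site is skipped even though it has the most free CPUs.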

    The European counterpart to Globus is called Unicore.  This package appears to be available with support from Pallas and others.  In addition, Commodity Grid (CoG) Kits have been released to map grid functionality into language environments.  CoG kits that interface to Globus are available for several environments, including Java, Perl, CORBA, MATLAB, and Python.  We are using the Java CoG Kit in the development of a WRF grid portal.

    4.  Other Grid Activities

    Prepared by Mark Govett, Mark.W.Govett@noaa.gov
    Date of last update:    September-2003