ESDCD
NCCS Highlights
System Metrics at the NCCS
A lot goes on behind the scenes at the NASA Center
for Computational Sciences (NCCS) that most users
never see but that is important to the effective
management of the Center. Information is gathered
and analyzed each month on the usage of the computing
and storage facilities. This information allows
NASA management to track trends in usage, predict
the impacts of system changes, and plan for High
End Computing (HEC) and storage needs for the
Earth and space science community.
Figure 1 shows the increase in usage of the largest
NCCS HP/Compaq system over the past year and a
half.
Figure 1. NCCS Average Daily Workload
vs. Average CPU-Days, October 2002-April 2004
The maximum number of processors available for batch
processing is 1,292; measured against that capacity,
system utilization consistently exceeded 90% from
November 2003 through February 2004. Large parallel
systems like this one are
commonly considered to be saturated when they
reach 85-90% utilization. Because the system is
composed of processors with two different speeds,
the weighted average can exceed 1,292 CPU-Days.
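The arithmetic behind these figures can be illustrated with a minimal sketch. Only the 1,292-processor capacity and the 85% lower bound of the saturation range come from the text above; the function names, the daily CPU-Day values, and the 1.25 speed factor are hypothetical, chosen only to show how a speed-weighted total can exceed the processor count. It is not the NCCS's actual accounting method.

```python
# Processors available for batch work (from the article).
BATCH_PROCESSORS = 1292


def utilization(avg_cpu_days: float, processors: int = BATCH_PROCESSORS) -> float:
    """Fraction of available processor time used on an average day."""
    return avg_cpu_days / processors


def is_saturated(util: float, threshold: float = 0.85) -> bool:
    """True once utilization reaches the lower end of the 85-90% range."""
    return util >= threshold


def weighted_cpu_days(cpu_days_by_speed: dict[float, float]) -> float:
    """Scale each pool's CPU-Days by its (hypothetical) speed factor and sum."""
    return sum(factor * days for factor, days in cpu_days_by_speed.items())


# Hypothetical day: 1,200 unweighted CPU-Days in total,
# 800 on baseline processors and 400 on processors assumed 1.25x faster.
unweighted = 1200.0
weighted = weighted_cpu_days({1.0: 800.0, 1.25: 400.0})

print(f"utilization: {utilization(unweighted):.1%}")
print(f"saturated:   {is_saturated(utilization(unweighted))}")
print(f"weighted CPU-Days: {weighted}")
```

With these illustrative inputs the weighted total is 1,300 CPU-Days, above the 1,292 processors physically available, even though the unweighted total stays below capacity.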
Metrics at the level shown above are important
for monitoring the overall utilization of the
various systems at the NCCS and for long-term
capacity planning, but system metrics are also
tracked at a much finer level of granularity to
determine the usage of major organizations over
time. The usage of the
same system by organization for March 2004 is shown
in Figure 2.
Figure 2. March 2004 Unweighted CPU-Day Utilization
at the NCCS
Many other breakdowns of the data, by hour of the
day, day of the week, job size, workload, queue,
and group, are examined for a more complete picture
of how the systems are being used.
This allows NASA management to identify
trends or deficiencies in system utilization
and take action based upon quantitative data.
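As a rough illustration of such breakdowns, the sketch below totals CPU-Days from a handful of job records under different grouping keys. The `Job` record layout, the group names, and the sample jobs are all hypothetical, and each job's usage is attributed entirely to its start time for simplicity; the NCCS accounting data and tooling are not described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    """Hypothetical accounting record for one batch job."""
    group: str
    start: datetime
    cpus: int
    wall_hours: float

    @property
    def cpu_days(self) -> float:
        # One CPU busy for 24 hours is one CPU-Day.
        return self.cpus * self.wall_hours / 24.0


def breakdown(jobs, key):
    """Total CPU-Days per value of key(job)."""
    totals = defaultdict(float)
    for job in jobs:
        totals[key(job)] += job.cpu_days
    return dict(totals)


# Made-up sample jobs, 32 CPU-Days each.
jobs = [
    Job("group_a", datetime(2004, 3, 1, 9), 64, 12.0),
    Job("group_b", datetime(2004, 3, 1, 22), 128, 6.0),
    Job("group_a", datetime(2004, 3, 6, 9), 32, 24.0),
]

by_group = breakdown(jobs, lambda j: j.group)
by_hour = breakdown(jobs, lambda j: j.start.hour)
by_weekday = breakdown(jobs, lambda j: j.start.weekday())  # Monday is 0

print(by_group)
print(by_hour)
print(by_weekday)
```

Passing the grouping key as a function keeps one aggregation routine usable for every view of the data, whether by queue, job size, or organization.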
The NCCS has changed batch queue structures
in response to trends identified by this analysis
and provided feedback to major user groups
to improve job scheduling and make better use of
the resources. The information is also presented
to NASA management and the NCCS Customer Board
as part of the decision-making process on requests
for resource allocation. For example, as the
full HP/Compaq system was being brought into production,
the NASA Seasonal-to-Interannual Prediction Project
had almost exclusive use of a large part of the
system. As usage grew on the system, other groups’ workloads
were shifted through queues to better distribute
the load. More recently the Goddard Institute for
Space Studies (GISS) needed additional capacity
to meet deadlines for the Intergovernmental Panel
on Climate Change. To help GISS meet its requirements,
queue limits were adjusted to allow the Institute
to use more resources on the system; its usage
has more than doubled over the past three months.
The NCCS maintains
a database of detailed system accounting information
that allows reports to be generated down to the
level of individual user batch jobs. Although
this level of detail is rarely needed, when the
NCCS rebuilt and reconfigured the systems this
summer, usage data by individual user was one
of the criteria used to schedule the user
community's return to service. Similarly,
detailed usage information informs management
decisions during the Fiscal Year Initiation
process. Questions about system accounting
data should be directed to NCCS User Services
at support@nccs.nasa.gov.
Summer 2004 ESDCD News