FSL in Review 2002 - 2003


Information and Technology Services

Peter A. Mandics, Chief Information Officer
(303-497-6854)

Web Homepage: http://www-fd.fsl.noaa.gov/

Mark D. Andersen, Senior Database Analyst, 303-497-6518
Jonathon B. Auerbach, Computer Operator, 303-497-3760
Joan M. Brundage, Deputy Chief Info. Officer, 303-497-6895
Joseph R. Carlson, Professional Research Asst., 303-497-6794
Lee M. Cohen, Professional Research Asst., 303-497-6052
Michael A. Doney, FSL Network Manager, 303-497-6364
Steve J. Ennis, Network Engineer, 303-497-6372
Leslie A. Ewy, Systems Analyst, 303-497-6018
Joaquin Felix, Systems Administrator, 303-497-5267
Paul Hamer, Systems Analyst, 303-497-6342
Huming Han, Computer Operator, 303-497-6862
Chris Harrop, Associate Scientist, 303-497-6808
Leslie B. Hart, Jet Management Team Lead, 303-497-7253
Yeng Her, Computer Operator, 303-497-7339
Patrick D. Hildreth, Computer Operator, 303-497-7359
Forrest Hobbs, HPTi Program Manager, 303-497-3821
Keith G. Holub, Systems Administrator, 303-497-6774
Ara T. Howard, Professional Research Asst., 303-497-7238
Paul Hyder, Professional Research Asst., 303-497-6656
Peter Lannigan, Systems Administrator, 303-497-4639
Robert C. Lipschutz, Production Control Mgr., 303-497-6636
Chris MacDermaid, Data Systems Group Lead, 303-497-6987
Debra J. Martinez, Secretary OA, 303-497-6109
Chuck Morrison, Systems Engineer, 303-497-6486
Ed Moxley, Systems Administrator, 303-497-6844
Scott T. Nahman, Logistics Mgt. Specialist, 303-497-5349
Glen F. Pankow, Systems Analyst, 303-497-7028
John V. Parker, FSL IT Security Officer, 303-497-5124
Gregory M. Phillips, Lead Systems Admin., 303-497-7685
Peter Rahm-Coffey, Computer Operator, 303-497-7341
Richard Ryan, Systems Analyst, 303-497-6991
Robert Sears, Network Engineer, 303-497-4226
Amenda B. Stanley, Systems Analyst, 303-497-6964
Sarah E. Thompson, Systems Administrator, 303-497-6024
Dr. Craig C. Tierney, Systems Engineer, 303-497-3112
Cristel Van Leer, Computer Operator, 303-497-7537

(The above roster, current when this document was published, includes
government, cooperative agreement, and commercial affiliate staff.)

Address: NOAA Forecast Systems Laboratory – Mail Code: FST
David Skaggs Research Center
325 Broadway
Boulder, Colorado 80305-3328


Objectives

The Information and Technology Services (ITS) group manages the computers, communications and data networks, and associated peripherals that FSL staff use to accomplish their research and systems-development mission. The FSL Central Facility comprises over 60 Sun Microsystems, Inc., Silicon Graphics, Inc. (SGI), Hewlett-Packard (HP), and Dell computers ranging from workstations and servers to a High Performance Technologies, Inc. (HPTi) supercomputer. The facility also contains a variety of meteorological data-ingest interfaces, storage devices (including the FSL Mass Store System, or MSS), local- and wide-area networks, communications links to external networks, and display devices. Over 500 Internet Protocol (IP)-capable hosts and network devices are connected to the FSL network, including Unix hosts, PCs and Macintoshes, and network routers, hubs, and switches. This hardware and its associated software enable FSL staff to design, develop, test, evaluate, and transfer to operations advanced weather information systems and new forecasting techniques.

The group designs, develops, upgrades, administers, operates, and maintains the FSL Central Computer Facility. For the past 22 years, the facility has undergone continual enhancements and upgrades in response to changing and expanding FSL project requirements and new advances in computer and communications technology. In addition, ITS lends technical support and expertise to other federal agencies and research laboratories in meteorological data acquisition, processing, storage, distribution, and telecommunications.

The Central Facility acquires and stores a large variety of conventional (operational) and advanced (experimental) meteorological observations in real time. The ingested data encompass almost all available meteorological observations in the Front Range of Colorado and much of the available data over the entire United States. Data are also received from Canada and Mexico, along with selected observations from around the world. The richness of this meteorological database is illustrated by such diverse datasets as advanced automated aircraft reports, wind and temperature profiler data, satellite imagery and soundings, Global Positioning System (GPS) moisture retrievals, Doppler radar measurements, and hourly surface observations. The Central Facility computer systems are used to analyze and process these data into meteorological products in real time, store the results, and make the data and products available to researchers, systems developers, and forecasters. The resultant meteorological products cover a broad range of complexity, from simple plots of surface observations to meteorological analyses and model prognoses generated by sophisticated mesoscale computer models.

Accomplishments

Central Computer Facility

FSL High-Performance Computer System – During 2002, a 48-processor testbed system based upon the Intel Pentium IV-Xeon chip was deployed to evaluate the software and hardware issues associated with an Intel-based cluster for the final upgrade of the High-Performance Computing System (HPCS). The disk subsystem was upgraded to provide an additional 5.76 Terabytes of usable space. The Portable Batch System (PBS) was replaced with Sun Grid Engine (SGE) to improve batch services on the HPCS. This change was necessary because FSL's job load had increased to thousands of jobs per day, and PBS was not designed to handle such a heavy load. The computational core of the HPCS underwent a three-phase upgrade last year. Phase I involved the delivery of two 64-node clusters of dual-processor Pentium IV-Xeon systems (a total of 256 CPUs). During Phase II, the 256 single-processor Compaq Alpha systems (from the initial delivery) were decommissioned. Phase III involved the delivery of three more Intel-based clusters, one 128-node cluster and two 256-node clusters, raising the HPCS to 280 Alpha CPUs and 1,536 Pentium IV-Xeon CPUs (Figure 8). Steps were taken to ensure the reliability of the HPCS, including the development of a sophisticated monitoring system with a screen that shows the status of the HPCS compute and file systems, cron servers, front-ends, and user activity (Figure 9).
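
As a rough illustration of the batch workflow after the switch to SGE, the sketch below submits a job with the standard qsub command and checks it with qstat. The script name and queue are hypothetical placeholders; this is a minimal sketch, not FSL's production submission code.

    import subprocess

    # Hypothetical job script and queue names, for illustration only.
    JOB_SCRIPT = "run_forecast_model.sh"
    QUEUE = "comp"

    def submit_job(script, queue):
        """Submit a batch job to Sun Grid Engine and return its job ID."""
        result = subprocess.run(["qsub", "-q", queue, script],
                                capture_output=True, text=True, check=True)
        # qsub normally replies: 'Your job 12345 ("name") has been submitted'
        return result.stdout.split()[2]

    def job_status(job_id):
        """Return the qstat report for one job (empty once it has finished)."""
        result = subprocess.run(["qstat", "-j", job_id],
                                capture_output=True, text=True)
        return result.stdout

    if __name__ == "__main__":
        job_id = submit_job(JOB_SCRIPT, QUEUE)
        print("Submitted job", job_id)
        print(job_status(job_id))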

Figure 8 - New Equipment - HPCS

Figure 8. An example of newly installed equipment (280 Alpha CPUs and 1536 Pentium IV-Xeon CPUs) in the Phase III upgrade of FSL's High-Performance Computing System.

Figure 9 - HPCS Monitoring Screen

Figure 9. Monitoring screen showing the status of HPCS systems activity.

ITS continued its support of at least 40 projects on FSL's supercomputer Jet. The HPCS provides computational capability for numerous modeling efforts related to the atmosphere, ocean, climate, and air quality, which are carried out by FSL and non-FSL researchers. For example, several Joint Institutes; OAR (NOAA's Office of Oceanic and Atmospheric Research) laboratories, including the Environmental Technology Laboratory (ETL), Aeronomy Laboratory (AL), and National Severe Storms Laboratory (NSSL); and the NWS National Centers for Environmental Prediction (NCEP) all take advantage of the HPCS.

FSL Mass Store System (MSS) – Major upgrades were made to the Mass Store System to correct reliability and performance problems. This was necessary primarily because the large database maintained by Advanced Digital Information Corporation's (ADIC) FileServ Hierarchical Storage Management (HSM) software had compromised performance, and Sony's Advanced Intelligent Tape (AIT-2) drives and cassettes had become unreliable. First, steps were taken to stabilize the MSS by upgrading the ADIC FileServ/VolServ Hierarchical File System (HFS) software and server operating system. A major upgrade was implemented later that included installation of an additional, completely new HFS, which logically split the ADIC AML/J automated storage library robot into two virtual robots. The original FileServ/VolServ-based system continues to function in a read-only mode with 1,232 Sony AIT-2 tape slots and 4 AIT-3 tape drives. The new HFS, based on a Sun SunFire 480 server running ADIC's StorNext software, features 1,040 Linear Tape-Open (LTO) tape slots and 8 IBM LTO tape drives. Two 600-Gigabyte managed file systems (caches) were also provided, one dedicated to real-time data ingested by the Central Facility and the other available for user data. The new HFS has significantly increased speed and reliability. Major enhancements were also made to the FSL-developed tools for accessing the MSS.

Central Facility Systems Enhancements and Cost Savings – A major ongoing project in ITS involves defining ways to cut costs in the FSL Central Facility. Toward this end, ITS system administrators have decommissioned several older systems with high maintenance costs after moving their services to newer, less expensive systems. Central administration processes are being implemented for most Unix systems to cut system management costs. The printing systems have been reconfigured to increase reliability and offer better service to users. These activities allow system administrators more time to address other important issues.

System administrators became familiar with Sun's Solaris 9 operating system (OS) before moving systems to the newer OS. A used testbed system was procured and configured, and standard Solaris 9 installation procedures were defined and implemented. With the exception of systems running software that requires Solaris 8, new (replacement) Sun systems and rebuilds of current Sun systems have been placed on the more secure Solaris 9 platform, increasing security and decreasing system administrator time.

Another effective cost-cutting measure was making more efficient use of existing resources. FSL's central data repository, which employs a Network Appliance, Inc. filer (NFS server), is a good example. This filer had become excessively overloaded and often failed to respond to real-time data-access needs. An intensive mitigation project was implemented to reduce unnecessary load on this costly resource, avoiding (or at least postponing) the need to procure a new system.

FSL system administrators have been applying an unending stream of security-related patches and upgrades. It is a major task to keep multiple versions of six different operating systems (Sun Solaris, Linux, SGI IRIX, Microsoft Windows, etc.) patched and up to date.

The FSL mail lists were converted to NOAA Enterprise Messaging System (NEMS) groups. The names and descriptions of these groups are now visible in the NEMS directory, and conform to the NOAA enterprise mail strategy. Also, most of the laboratory was transferred to the main FSL mail server, eliminating miscellaneous mail servers and improving mail-handling reliability.

FSL PC Administration – The FSL Windows 2000 network was stabilized. Server log errors and configuration problems related to Domain Name System (DNS) issues were identified and corrected. Prior to these upgrades, users were experiencing logon failures and connectivity outages. FSL's domain servers were rebuilt and patched with all known fixes and service packs, and are now running smoothly.

Network maintenance on the server level also included an upgrade to the antivirus software and a full rollout of the updated software to all PCs on the FSL network running the Windows operating system.

An additional 25 machines from the FSL International Division were transferred to the FSL PC Administrator. Network management software suites were evaluated to help manage the increasing number of PCs. The IBM Tivoli suite was chosen for its ability to control, update, and administer Windows computers remotely.

PC security and systems patching remained a high priority throughout the year. Systems were kept up to date using Microsoft's Windows Update utility. Also, the Microsoft Baseline Security Analyzer was used to continually monitor for security holes on all Windows networked machines.

The PC administrators' day-to-day tasks included support for various problems involving hardware and software, failed logons, password changing, disk problems, printing errors, drive failures, RAM issues, program errors, security updates, E-mail, OS reloads, backup configurations, dial-up accounts, data recovery, and network connectivity.

Systems Support and Computer Operations – The Systems Support Group (SSG) maintains a log (utilizing the FSLHelp System) that provides effective communication among the SSG staff, ITS Data Systems Group (DSG), system and network administrators, and other essential staff. The SSG log helps provide a higher level of service to FSL users in dealing with the numerous and varied issues handled on a daily basis. The log also offers, among other things, a means for recording the history of events and tracking the procedures used to correct problems. During the year, about 2,170 log tickets were initiated and resolved. In addition, approximately 154 customer FSLHelp requests were processed for data compilations, file restoration, account management, video conferencing, and other requests requiring operator assistance.

The Web database used to document the procedures for maintaining the Central Facility has grown to 131 documents. New procedures and updated information require continual refinements, corrections, and updates to the documents. Good documentation, in turn, provides operators the means to troubleshoot and resolve issues involving real-time data, Central Facility equipment, and customer queries. The improved efficiency and consistency resulted in shorter downtimes and faster response to users.

SSG staff renewed efforts to provide assistance to system administrators, when feasible, in user account maintenance (such as adding and removing accounts) and other special projects on an as-needed basis.

The SSG weekly schedule was adjusted so that the lead operator could be more available during busier days. Also, overlap days, when three operators were on duty at once, were more spread out. This allows more time for special projects, facilitates flexibility in group training, and helps reduce overtime when operators take leave.

To accommodate 24-hour/7-day onsite support and augment staffing during emergencies, an emergency operator coverage plan was implemented which outlines the course of action to be taken when emergency coverage is required. Also, because of staff departures, and to ensure shift coverage, two full-time operators were hired and trained.

The SSG oversaw and monitored the daily laboratorywide computer system backups, with ~300 GB of information written each night for ~260 FSL client systems. Quarterly offsite backups were successfully completed on time. The tape rotation for quarterly offsite backups was increased to provide individual machine backups for up to one year.

In coordination with the Data Systems Group, numerous new products and critical systems were added to the Facility Information and Control System (FICS), including the Fire Weather data servers, the Temperature and Air Quality (TAQ) systems, and the RUC/RSAS (Rapid Update Cycle and RUC Surface Assimilation System) backup. To support these additions, several critical support documents and SSG Help documentation were updated so that the basic functions of the SSG (monitoring, troubleshooting, and discussing real-time data issues) are properly maintained.

A renewed emphasis on proper procedures for notifying data end-users (customers) resulted in updated documentation and other assistance tools (e.g., flow diagrams) to ensure consistency within the SSG in this important area of customer service. The FSL Central Facility Data Availability Status Webpage was updated, as was the tool that generates updates to this important customer information source.

A new feature was added to FICS that monitors product delivery to the NWS Telecommunications Gateway servers, in support of continued FSL backup of RUC/RSAS products for NWS/NCEP. The SSG online documentation was updated, and other assistance materials and tools were developed and implemented. These improvements ensured that SSG is more proactive and responsive in monitoring and communicating about FSL RUC/RSAS production and delivery to NCEP.

To keep well informed of computer security issues and maintain compliance with DOC, NOAA, and OAR security guidance, SSG staff took the NOAA IT online Security Awareness training, and also completed the online, in-depth SANS (SysAdmin, Audit, Network, Security) Institute Security training course. All SSG staff received ongoing, in-depth training on the main computer room VESDA Smoke Detection System and FM-200 Fire Suppression System.

Facility Infrastructure Upgrades – FSL underwent two substantial infrastructure upgrades to address the power, cooling, and space requirements of the final upgrade to the High-Performance Computing System. Every effort was made to implement the infrastructure upgrades with minimal downtime to existing equipment and FSL users.

The first infrastructure upgrade involved the expansion of the Central Facility Annex. An office and a storage room were relocated to add space for the computer room next door. The walls surrounding the new computer room required extensive sound mitigation work to meet the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) noise protection criteria for private offices. Surrounding walls were extended deck to deck, and an Uninterruptible Power Supply (UPS, Figure 10) was installed to provide short-term backup power. A ramp was installed for access to the 12-inch raised floor, which supports a new dedicated CRAC (Computer Room Air Conditioner) unit.

Figure 10 - UPS and GOES Ground-Station

Figure 10. Uninterruptible Power Supply (cabinets to the left) and the
GOES ground-station rack (tall, black unit toward the back) in the new
Central Facility Annex.

To create space for the final upgrade, older racks and equipment were moved from the main computer room to the new annex. The finished Central Facility Annex (Figure 11) was then fully certified in accordance with National Fire Protection Association standards.

Figure 11a - Servers, Network HW, CRAC in CF - a

Figure 11b - Servers, Network HW, CRAC in CF - b

Figure 11c - Servers, Network HW, CRAC in CF-c

Figure 11. (top) Forecast Research Division compute servers, (middle) Network equipment row, (bottom) one of four CRAC (Computer Room Air Conditioner) units in the Central Facility.

The second infrastructure upgrade brought the Central Computer Facility up to original specifications by increasing the cooling capacity to 120 tons and emergency UPS electric power to 300 kVA. Four 15-ton CRAC units were replaced with four 30-ton units. Chilled water piping modifications and leak detection upgrades were required as well as floor tile cutouts and stronger underfloor supports. Additional power distribution panels and larger power transformers were also installed to support the increased electrical requirements. The Emergency Power Off (EPO) bypass capability was separated from the FM-200 Fire Suppression bypass switch in order to perform functional FM-200 Fire Suppression testing and maintenance without powering down the entire computer room. Finally, 28 legacy HPCS computer racks were removed and 48 new HPCS final upgrade racks were installed. The implemented specifications for the main computer room and the annex are shown in Table 1.

Table 1.
Specifications for Upgraded FSL Central Computer Facility and Annex


Main Computer Room
Dimensions: 3,600 square feet; 12-inch Raised Floor;
8-foot, 6-inch Ceiling Height
Capacity: 142 Racks
Access: Restricted
Power: Utility with Emergency Generator Backup
Transient Voltage Surge Suppressor (TVSS) Protected,
Emergency Power Off (EPO) Switch Protected
UPS: 300 kVA (Usable), 8-minute Runtime (Full Load),
and Semiannual Preventive Maintenance
Cooling Capacity: 90-ton Downdraft (De-rated for altitude)
Fire Protection: FM-200 Fire Suppression System (Tied to EPO),
VESDA Fire Detection System,
Semiannual Preventive Maintenance,
Semiannual Operational Training,
Cerberus Smoke Detection System (GSA Notification),
Sprinkler System (155°F trigger point),
CO2 Portable Fire Extinguishers (Class B & C Fires)
Cleaning: Semiannual Professional (Above and Below Floor)

Annex Computer Room
Dimensions: 1,100 square feet; 12-inch Raised Floor;
8-foot, 6-inch Ceiling Height
Capacity: 46 Racks
Access: Restricted
Power: Utility with Emergency Generator Backup,
Transient Voltage Surge Suppressor (TVSS) Protected,
Emergency Power Off (EPO) Switch Protected
UPS: 90 kVA (Usable), 8-minute Runtime (Full Load),
and Semiannual Preventive Maintenance
Cooling Capacity: 23-ton Downdraft (De-rated for altitude)
5-ton Fan Coil (In Standby until Temp > 78°F)
Fire Protection: Cerberus Smoke Detection System (Tied to EPO),
Sprinkler System (155°F trigger point),
CO2 Portable Fire Extinguisher (Class B & C Fires)
Cleaning: Semiannual Professional (Above and Below Floor)


FSL Network

Key enhancements were made to the FSL local area network (LAN) in 2002, with the integration of Gigabit Ethernet (GigE) and Asynchronous Transfer Mode (ATM) technologies and the implementation of hardware-based routing. Two Cisco 6509 switch/routers were installed to make it possible to combine GigE and ATM, thus maintaining the fully meshed high-speed (622-Mbps) connections to core network devices and servers. The core network was also augmented with 1,000-Mbps GigE connections, and GigE links were provided to the FSL HPCS and new high-end servers. The Cisco 6509s increased switching performance in the two computer rooms from 20 Gbps to 32 Gbps. Other legacy switching devices (in the wiring closets) serving user workstations were replaced with 20-Gbps switches.

The ability of two Cisco 6509s to perform hardware-based routing represents a substantial improvement over the previous configuration of 5 Marconi PowerHub software-based routers for the 35 active networks at FSL. Figure 12 shows the upgraded network configuration.

Figure 12 - FSL Network Configuration

Figure 12. Diagram of the upgraded FSL Network as of September 2002.

The management of redundant path routing was also improved with implementation of the Virtual Router Redundancy Protocol (VRRP). Cisco's version of VRRP, the Hot Standby Router Protocol (HSRP), now provides one virtual default router address for each network, with automatic failover to the secondary router when needed. Redundant routes were previously managed at both the network and host level, which placed an unnecessary burden on servers and workstations. The advent of HSRP at FSL has offloaded this burden from FSL hosts, freeing up valuable memory and CPU cycles and placing the responsibility for network redundancy back in the network. The addition of one other protocol was also important for accomplishing the integrated ATM/GigE network at FSL: the Spanning Tree Protocol (STP) was enabled to help mitigate loops in the network. Because of the dual nature of the ATM/GigE network, loops are present by design, and without STP, loops can quickly render Ethernet networks unusable.

During 2002, network services were provided for 207 FSL staff. The network utilized 532 total links, comprising 482 user (workstation and server) links and 51 network device links. The number of user links increased by 50 over the last year. The number of network device links decreased by 30 because routing services were consolidated and 14 small network switches were replaced with four better-performing devices with higher capacity for user ports. Port capacity available for network growth reached 18%, with 146 free ports distributed across FSL computer rooms and wiring closets. All network routers and switches were running at an average CPU utilization of 13%, with the highest at 26% on the primary switch between FSL and the NOAA Boulder Network. A substantial improvement in routing efficiency was realized over last year. The PowerHub routers were exhibiting 100% utilization at times, resulting in poor routing performance, until they were replaced with the Cisco 6509 routers, which now average just 1% utilization.

Link utilization in the core of the FSL network averaged 9.1% (57 Mbps on the 622-Mbps ATM segments), and 4.8% (48 Mbps) on the 1000-Mbps Gigabit Ethernet segment. In combination with all other NOAA Boulder network traffic, Wide-Area Network (WAN) utilization to commodity Internet and Abilene (Internet2) via the Front Range GigaPOP (FRGP) averaged 7%, with a maximum of 47% of the 155 Mbps available. WAN traffic over the secondary commodity Internet link provided by MCI/UUnet averaged 45%, with a maximum of 100% of the 12 Mbps available. WAN traffic via the FRGP stayed about the same as in 2002, while traffic via the MCI link increased by 11%, primarily for outbound traffic. FSL comprised 63% of the total NOAA Boulder WAN traffic, with the next nearest laboratory, the Climate Diagnostics Center (CDC) at 17%. While these figures are similar to 2002, the most recent month's statistics showed FSL at 84% of the total NOAA Boulder WAN traffic. The top protocols were once again FTP (43%), LDM (18%), and HTTP (16%).

As mentioned earlier, the computer room annex was converted into a fully operational computer room space housing network equipment and servers for six scientific divisions. In support of this task, FSL Networking staff installed all network cabling and patch systems. This included nearly a mile and a half of Ethernet, fiber optic, and console cables, underfloor power whips, and an ATM/Ethernet switch to connect 38 computer racks – all installed within one week. This computer room design and installation, and assistance provided for the relocation of servers, ensured minimal downtime for FSL users.

Enhanced network monitoring was implemented on all major FSL network devices and links. Webpage graphic displays of CPU and network link loads were implemented using the public-domain Multi Router Traffic Grapher (MRTG) software. The resultant statistics were, and continue to be, valuable for resource management of network devices, and they also improved the resolution of network problems. Web links to MRTG plots were made available for all direct-connected ATM hosts, primary FSL servers, and workstations upon request, allowing users to view network activity on the servers for which they are responsible. Access to this Web information is limited to FSL only.
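
The MRTG plots are built from periodic interface byte-counter samples. The sketch below shows the basic utilization arithmetic behind such a plot; the read_octet_counter() function is a hypothetical stand-in for an SNMP counter query and is not part of MRTG itself.

    import time

    def read_octet_counter(interface):
        """Hypothetical stand-in for an SNMP ifInOctets/ifOutOctets query."""
        raise NotImplementedError("replace with a real SNMP poll")

    def link_utilization_percent(interface, capacity_bps, interval=300):
        """Sample a byte counter twice and return average utilization in percent."""
        first = read_octet_counter(interface)
        time.sleep(interval)
        second = read_octet_counter(interface)
        bits_per_second = (second - first) * 8.0 / interval
        return 100.0 * bits_per_second / capacity_bps

    # For example, 57 Mbps of traffic averaged over a 622-Mbps ATM segment
    # works out to utilization on the order of 9%.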

Information Technology (IT) Security

Information Technology (IT) security is an integral part of the FSL design for all new network growth. Establishing a GigE backbone and GigE-capable routers was critical to the FSL IT security architecture. Redundant GigE network links from FSL to the NOAA Boulder network are the basis for a high-speed security perimeter. This security architecture includes a zone for public access servers that was logically implemented with the use of a dedicated subnet. This zone allows Network and Security staff to begin monitoring traffic flow into and out of the zone and to set appropriate access policies for differentiated types of data and server access. Remote access (dial-in and Virtual Private Networking, VPN) to FSL resources was upgraded to utilize encrypted authentication via the Remote Authentication Dial-In User Service (RADIUS). To assist with testing FSL IT security policies and methods, an AT&T small-business Digital Subscriber Line (DSL) was installed at FSL. This DSL link is dedicated and physically separate from all FSL networks and address spaces. Network staff, system administration staff, and users are able to use the DSL connection to view, from the Internet, how FSL "looks" from the outside. This security tool is very effective for testing access control lists, verifying access to internal Webpages, and confirming patches on host-based systems in FSL.

The FSL IT Security Officer (ITSO) developed and presented an IT security strategy to FSL managers, system administrators, FSL Technical Steering Committee, FSL IT Architecture Group, and FSL users. The security plan was approved and funded. In coordination with FSL network management and system administrators, the ITSO evaluated three firewall appliances in-house, and will recommend the one best suited to FSL's needs. Testing and implementation of the firewall and the associated Intrusion Detection System (IDS) and centralized logging will depend on completion and stabilization of the FSL and NOAA Boulder network backbone upgrades. Commercial and open-source vulnerability tools were evaluated, and the open-source tool Nessus was selected and implemented within FSL. Regular audits of FSL hosts were performed as required. A patch server was acquired and tested that mirrors local, secure copies of the latest vendor patches for all applicable FSL systems and applications. Centralized log servers were installed for secure logging of Unix host event entries; the old log server systems will be moved to the new infrastructure. System administrators and users were supported in several security responses, and appropriate input was submitted to the NOAA Computer Incident Response Team (N-CIRT). Newsgroups, mailing lists, and security sites are monitored for vulnerability alerts, potential threats are analyzed, and FSL security contacts are notified when applicable. Approximately 125 e-mail alerts were issued. The ITSO collaborated with N-CIRT personnel in Washington, D.C., to present their 16-hour "Essential Security Measures" training classes in Boulder (for the first time). This training was offered to all NOAA-Boulder and Western Region staff after all training requirements had been met.

Data Acquisition, Processing, and Distribution

The ITS Data Systems Group continued to design and develop real-time meteorological data acquisition and processing systems required by laboratory projects and data users. Multiple computers operate in a distributed, event-driven environment known as the Object Data System (ODS) to acquire, process, store, and distribute conventional and advanced meteorological data. These data services (Figure 13) are provided to scientists and developers who use them in various modeling, application, and meteorological analysis/forecast workstation research and development activities. Users accessed raw, translated, and processed data according to their needs.

Figure 13 - FSL Data Services

Figure 13. Data services currently provided by FSL.

Data Acquisition and Distribution – Data received from operational and experimental sources included:

  • National Weather Service -
    • NCEP, including the Aviation Weather Center's (AWC) use of the Distributed Brokered Networking (DBNet) software
    • WSR-88D narrowband and wideband Doppler radar data
  • Aeronautical Radio Inc. (ARINC)
  • Weather Services International Corporation (WSI) High-Capacity Satellite Network (HCSN) Data-Acquisition System that supplies WSI NOWrad and NEXRAD products
  • FSL Demonstration Division
  • Geostationary Operational Environmental Satellite (GOES-8 and GOES-10)
  • National Center for Atmospheric Research (NCAR)
  • Meteorological Assimilation Data Ingest System (MADIS) data providers

Distributed datasets included:

  • GOES imagery to the NOAA Environmental Technology Laboratory (ETL)
  • Wind profiler data to University Corporation for Atmospheric Research (UCAR) Unidata program
  • Quality controlled ACARS (Aircraft Communications Addressing and Reporting System) data to NCAR, government agencies, and universities
  • RUC/RSAS data to NCEP for operational backup
  • Real-time data were also distributed to several external organizations using the Unidata Local Data Manager (LDM) protocol

Data Acquisition Upgrades – An upgrade of the GOES data processing system was designed, developed, and completed. The local ground station system receives and ingests GOES Variable (GVAR) data (Figure 14) from the GOES-8 and -10 satellites. The system generates a suite of imager and sounder products in netCDF format.
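
As a rough sketch of how an imager product might be written to netCDF (using the standard netCDF4 Python bindings; the file name, dimensions, and variable names are illustrative assumptions, not the actual FSL product schema):

    import numpy as np
    from netCDF4 import Dataset  # standard netCDF bindings, not FSL's own toolchain

    # Illustrative grid size and data; real values would come from the GVAR ingest.
    ny, nx = 480, 640
    brightness = np.zeros((ny, nx), dtype=np.float32)

    nc = Dataset("goes_imager_example.nc", "w")
    nc.createDimension("y", ny)
    nc.createDimension("x", nx)
    var = nc.createVariable("brightness_temperature", "f4", ("y", "x"))
    var.units = "K"
    var[:, :] = brightness
    nc.satellite = "GOES-8 (example attribute)"
    nc.close()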

Figure 14 - GVAR Data Processing

Figure 14. Schematic of GVAR (GOES variable) data processing at FSL.

The ACARS ingest hardware was replaced and processing software upgrades were completed. The new system was designed using IBM's MQ Server software, which replaced legacy hardware and software that acquired data using the outmoded X.25 protocol.

A new NOAAPORT Receive System (NRS) was evaluated, purchased, and integrated into production, resulting in much improved data reliability.

Data Processing and Management Upgrades

Object Data Systems (ODS) – Software was designed and developed to streamline the acquisition and processing of point, radar, and satellite data. Advanced Object-Oriented (OO) techniques were used in creating the software to reduce required maintenance and to allow for generic, more efficient handling of various data types.

As part of the ODS improvements, the satellite GVAR processing was upgraded, as shown in Figure 14. In keeping with the ODS model of handling "raw" data for both the real-time and archive streams, a completely new scheme was developed which allows greater flexibility in the configuration and maintenance of GVAR datasets.

Facility Information and Control Systems (FICS) – FICS Monitor changes were implemented to account for the arrival of a variety of new datasets. Scripts were developed to monitor operation of the High-Performance Computing System and Mass Store System. A new, more flexible method of monitoring LDM servers also was developed. FICS monitoring of AWIPS Data Servers was upgraded. The new version includes an "AWIPS Data Servers" page which allows for more flexibility with the number of data servers being monitored, while keeping the main FICS page minimally cluttered.
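
A minimal sketch of the kind of dataset-freshness check that such monitoring scripts perform is shown below; the directory paths and latency thresholds are hypothetical, and the real FICS scripts and Web pages are considerably more elaborate.

    import os
    import time

    # Hypothetical dataset directories and allowed latencies (seconds), for illustration only.
    DATASETS = {
        "acars_netcdf": ("/data/fsl/netcdf/acars", 3600),
        "goes_imager": ("/data/fsl/netcdf/goes", 1800),
    }

    def newest_file_age(directory):
        """Return the age in seconds of the most recently modified file, or None if empty."""
        paths = [os.path.join(directory, name) for name in os.listdir(directory)]
        files = [p for p in paths if os.path.isfile(p)]
        if not files:
            return None
        return time.time() - max(os.path.getmtime(p) for p in files)

    def check_datasets():
        for name, (path, max_age) in DATASETS.items():
            age = newest_file_age(path)
            if age is None or age > max_age:
                print("ALERT: %s appears stale (age: %s seconds)" % (name, age))
            else:
                print("OK: %s (age: %.0f seconds)" % (name, age))

    if __name__ == "__main__":
        check_datasets()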

Real-Time Advanced Weather Interactive Processing System (AWIPS) Data Processing – Several new Linux AWIPS data servers were implemented. Numerous Local Data Acquisition and Dissemination (LDAD) data providers were added as part of cooperative projects, including the International H2O Project (IHOP) and the NOAA New England Forecasting Pilot Program: High Resolution Temperature and Air Quality Project (TAQ).

In collaboration with the FX-Net project, several AWIPS data servers were customized for displaying data for the TAQ, IHOP, and Fire Weather projects. Associated FICS monitoring and troubleshooting procedures were developed to monitor these systems. These tasks included customizing AWIPS data servers to process non-NOAAPORT model data, such as high-resolution MM5 and GPS-Met integrated precipitable water vapor (IPWV) data for display on FX-Net.

Data Storage and Access – The FSL Data Repository (FDR) and the Real-Time Data Saving (RTDS) systems were merged. Using ODS software to create a configurable and scalable system, the new FDR method reduces both the number of files (by bundling them with the Unix tar archive format) and the volume of data (using gzip compression) stored on the MSS. As a result of the MSS upgrades described earlier and the improvements in data storage and access, users were able to store and retrieve data much faster and more reliably.
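
The packaging step amounts to the standard tar-plus-gzip approach; a minimal Python sketch is shown below, with hypothetical directory and file names rather than the actual FDR code.

    import tarfile
    from pathlib import Path

    def package_directory(data_dir, archive_path):
        """Bundle the small files in one directory into a single gzip-compressed tar file."""
        data_dir = Path(data_dir)
        with tarfile.open(archive_path, "w:gz") as tar:  # "w:gz" = write with gzip compression
            for item in sorted(data_dir.iterdir()):
                if item.is_file():
                    tar.add(item, arcname=item.name)

    # Example with a hypothetical layout: many per-minute files become one object on the MSS.
    # package_directory("/data/fsl/point/metar/20021031_12", "metar_20021031_12.tar.gz")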

New Systems Architecture Development – Data ingest, processing, and distribution systems were developed to provide reliable, low-cost solutions for FTP, LDM, and other server applications using open-source software. Since the new systems spread services among multiple commodity PC servers running the Linux operating system, it is now easier to deploy additional servers to accommodate new services. An example of improved efficiency using these new systems is an FTP server (called eftp), which showed a steady increase in external users of FSL data, exceeding 150 by the end of 2002. To provide necessary backup, spare servers are now available that can be quickly imaged to assume the identity and function of any host that suffers a hardware failure. SystemImager software was used to clone the appropriate server(s) from stored images and to duplicate and restore services as needed. File services for these systems were provided using low-cost RAID devices with IDE disks. Refer to http://www-fd.fsl.noaa.gov/dsg/ for additional information.

Laboratory Project, Research, and External Support

FSL's supercomputer provided computational capability for FSL modeling efforts, high-performance computing software development, and other NOAA organizations. The latter include seven NOAA research laboratories – the Aeronomy Laboratory (AL), Atlantic Oceanographic and Meteorological Laboratory (AOML), Air Resources Laboratory (ARL), Climate Diagnostics Center (CDC), Environmental Technology Laboratory (ETL), National Severe Storms Laboratory (NSSL), and Pacific Marine Environmental Laboratory (PMEL) – as well as the National Geophysical Data Center. All projects were reviewed on the basis of scientific merit and their appropriateness for a commodity distributed-memory machine such as the FSL High-Performance Computing System. The modeling projects involved the ocean, climate, atmosphere, and air quality.

ITS continued to distribute real-time and retrospective data and products to all internal FSL projects and numerous outside groups and users. External recipients included:

  • ETL – Real-time GOES-8 and -10 extended-sector satellite data, and WSR-88D radar data.
  • NWS Storm Prediction Center (SPC) in Norman, Oklahoma – Six-minute Profiler data.
  • NWS Aviation Weather Center in Kansas City – Six-minute Profiler, ACARS, and RUC data.
  • UCAR COMET and Unidata Program Center – Six-minute Profiler, ACARS, MM5, RUC, and RASS data.
  • NCAR RAP and Mesoscale and Microscale Meteorology Division – RUC and MM5 data.

Other data and product sets were provided to outside groups, including Doppler radar, ACARS, upper-air soundings, meteorological aviation reports (METARs), profiler, satellite imagery and soundings, MAPS and LAPS grids, and Meteorological Assimilation Data Ingest System (MADIS) datasets. As liaison for outside users, the Systems Support Group provided information on system status, modifications, and upgrades.

Staff continued development of the FSL Hardware Assets Management System (HAMS), whose database incorporates an accurate and detailed list of FSL's hardware and software holdings. HAMS produces reports that are invaluable in tracking FSL equipment and software and provide input for yearly maintenance contracts and updating government property lists.

The two Oracle servers for HAMS were upgraded to the latest releases of Oracle 9i, Oracle 9i Application Server, and Apache Web Server software. The HAMS application processed over 120,000 wireless and Web-based transactions during the year and tracked equipment and software resources within FSL. Version 3.0 of the HAMS application software added many new features and enhancements, including Support Contracts, Support Costs, Project and Task Hours, Dynamic Views, Software Parenting, Groups, Members, Room Contents, Rack Contents, and Storage Contents, along with hundreds of smaller improvements. HAMS training courses were developed, and classes were held for FSL system administrators, network administrators, operators, and property custodians. The HAMS Web-based application won the FSL Web Award in the "Best Internal Use" category this year.

Division staff advised FSL management on the optimal use of laboratory computing and network resources, and participated in cross-cutting activities that extended beyond FSL, as follows:

  • Chaired the FSL Technical Steering Committee (FTSC), which reviewed all FSL equipment fund requests and provided the FSL director and senior staff with technical recommendations for equipment procurements.
  • Served on the FSL Technical Review Committee.
  • Served as Core Team and Advisory Team members for selecting upgrades to the FSL HPCS.
  • Participated in the creation of OAR's (Office of Oceanic and Atmospheric Research) IT Architecture plan.
  • Served as members of the Jet Allocation Committee, which reviews proposals for use of FSL's HPCS and provides recommendations to the FSL director for their acceptance.
  • Served on the Boulder IT Council (BITC), including assuming the office of chair.
  • Served as FSL representative and chair of the OAR Technical Committee for Computing Resources (TCCR).

ITS staff presented a well-received review of the FSL High Performance Computing System to the Commerce IT Review Board last September, and the program was given a green light to continue.

Projections

Central Computer Facility

FSL High-Performance Computer System and Mass Store System – The final upgrade and acceptance of the HPCS is planned for early 2003. The 48-CPU testbed system will be decommissioned as a cluster, and its nodes will be used within the HPCS and other areas of the Central Facility. A new RAID with file system software from IBM will be integrated into the HPCS. The associated software, the General Parallel File System (GPFS), will initially support highly critical uses of the HPCS such as real-time runs of forecast models. GPFS will be tested on the existing RAID system, with plans to make GPFS file systems available to more of the general user community. A cluster with 64-bit processors will be acquired and tested later in 2003 to ensure that critical software at FSL functions properly in such an environment. In collaboration with the Aviation Division, Grid Computing software (Globus, MPICH-G2) and Condor will be installed on a portion of the HPCS for initial development and testing of the Grid concept.

The MSS upgrade, also to be completed and accepted in early 2003, will involve moving to a new Hierarchical Storage Management system, a new host, and more robust tape media, as well as upgrading the HPCS RAID for use as cache. FSL will survey its existing user base and other areas of OAR regarding requirements for future computational platforms to support NOAA research applications.

Central Facility Systems Enhancements and Cost Savings – With the implementation of the required firewalls during 2003, many Central Facility services will need to be rearchitected to work properly with the new firewalls and the resulting new network design. The DNS and e-mail gateways will be redesigned and rebuilt on new hardware that will be less expensive to maintain than the old systems. The new hardware will also have the advantage of allowing advanced testing of the rearchitected DNS and e-mail functions before switching FSL to the new systems. A new design for Web content delivery has already been drawn up to accommodate the firewalls and additional security without requiring the purchase of many new server systems.

A new version of FSLHelp will be developed and introduced that will fix bugs, increase security, provide an easier interface for users, and decrease response time of the system.

Cost-saving efforts will continue through implementation of a much simpler computing environment. For example, testing will begin on a new, standard desktop system that will run either Red Hat Linux or Microsoft Windows, so that only two operating systems and one type of hardware will need to be supported for ITS desktops. This goes hand in hand with a steady move toward running only Red Hat Linux and Sun Solaris on server systems within ITS.

Systems Support and Computer Operations – Staff will continue to identify regularly failing client backups, track down the reasons for the failures, and implement proper corrective measures to reduce the number of client backups that fail daily. This will more effectively utilize system/network resources and provide a higher level of service for all FSL users.

Additional tools will be implemented to ensure task performance consistency. Links will be added to the FICS monitor that will allow quick and consistent generation of SSG Log tickets and Data Outage notifications. A Common Gateway Interface (CGI) script for the Data Outage Notification Generator form will be created and implemented. Additional new products, real-time machine loading, and systems will be added to the FICS monitor. To support these additions, several critical support documents and SSG Help documentation will be updated to maintain and enhance monitoring, troubleshooting, and communication about real-time data issues with users and system developers. Staff also will continue to provide assistance to systems administrators.
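
A bare-bones sketch of what such a form handler could look like is shown below (Python CGI); the field names, addresses, and wording are purely illustrative assumptions, not the planned FSL implementation.

    #!/usr/bin/env python
    # Minimal handler for a hypothetical Data Outage Notification form (illustrative only).
    import cgi
    import smtplib
    from email.message import EmailMessage

    form = cgi.FieldStorage()
    dataset = form.getfirst("dataset", "unknown dataset")
    start = form.getfirst("start_time", "unknown time")
    details = form.getfirst("details", "")

    msg = EmailMessage()
    msg["Subject"] = "Data outage: %s (since %s)" % (dataset, start)
    msg["From"] = "ssg@example.noaa.gov"        # placeholder address
    msg["To"] = "data-users@example.noaa.gov"   # placeholder mailing list
    msg.set_content(details or "No additional details provided.")

    with smtplib.SMTP("localhost") as smtp:     # assumes a local mail relay
        smtp.send_message(msg)

    print("Content-Type: text/plain")
    print()
    print("Outage notification sent for", dataset)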

A refresher training session for the VESDA Smoke Detection System and FM-200 Fire Suppression System will be provided for the SSG staff. Documentation will be updated, and other training devices and additional aids for quickly resolving issues with these systems will be developed.

Facility Infrastructure Upgrades – To improve emergency communications safety within FSL's Central Computer Facility, wall-mounted telephones will be installed near each FM-200 Fire Suppression abort switch. Electrical power surveys will be performed to more efficiently analyze power usage within the facility. Documentation will be prepared to better manage computer room growth in relation to space, cooling, and electrical power consumption.

FSL Network

The NOAA Boulder network plan to replace the ATM network backbone in the David Skaggs Research Center (DSRC) with a Gigabit Ethernet backbone will start in 2003. FSL will take advantage of the higher-speed GigE network by converting the link between FSL and NOAA Boulder from ATM to GigE, as shown in Figure 15. Internally, FSL will continue to maintain an integrated ATM and GigE network until funding becomes available and the technical need warrants consolidating the two technologies into one GigE core network. The GigE network within NOAA Boulder and FSL provides the topology needed to implement a high-speed redundant firewall security perimeter. This GigE topology will make it possible to physically separate the public access zone and the internal network, which are currently only logically separated. Access policies will be implemented to closely control all network traffic into and out of FSL public and internal (private) networks.

Remote access (dial-in and VPN) to FSL will be upgraded to utilize a server with 56-Kbps modems and a Cisco 300 VPN server to provide more secure authentication of users. The entry point for these remote access services will be relocated to the public access zone as part of the IT Security Architecture implementation. A third form of remote access is planned through wireless networking. The plan is to implement wireless networking access points throughout FSL office spaces in the DSRC and to provide a small number of wireless network interface cards for loan to laptop computer users. This will benefit a broad range of users, including visitors who need short-term network access, seminar presenters requiring live network access to materials or demonstrations, systems administrators performing quick on-the-spot troubleshooting and generating or updating systems inventories, and roaming FSL internal users. The entry point for wireless networking will also be in the public access zone and will require FSL-granted account access and encrypted user authentication.

Figure 15 - FSL Network 2003

Figure 15. Diagram of upgraded FSL Network as of March 2003.

FSL WAN traffic increased significantly in 2002, and the secondary link to the commodity Internet, often fully utilized at 12 Mbps, will be upgraded to 18 Mbps. Additionally, more economical WAN services will be investigated to determine whether higher bandwidths, such as GigE, may be available through WAN service providers located on the Boulder Research and Administrative Network (BRAN) path. FSL could benefit from GigE or direct optical service to national-scale networks for connecting to other major supercomputing centers.

Information Technology (IT) Security

FSL will install a combined firewall and Intrusion Detection System (IDS) as a first step toward securing the network perimeter (see Figure 15). The firewall and IDS will be an integral part of the new Gigabit Ethernet network backbone, allowing FSL a modular, off-the-shelf upgrade path to accommodate configuration changes and future bandwidth growth. To achieve an economy of scale, the IDS will be implemented as an integral part of the NOAA Boulder Network Operations Center (NOC) building-wide system. The goal is for all NOAA Boulder laboratories to eventually participate in a common IDS infrastructure, which will be managed by the NOAA Computer Incident Response Team (N-CIRT). For improved security, the previously acquired and tested patch server will be placed into production. This server will mirror local, secure copies of the latest vendor patches for all applicable FSL systems and applications and will facilitate faster and more efficient systems patching.

Additional IT security challenges require that a full-time assistant IT security officer be hired in order to keep abreast of the new policies, regulations, and actions from the Department of Commerce and NOAA, and to implement and maintain the firewall/IDS/central logging infrastructure. This additional help will ensure that FSL can respond quickly to the increasing security workload and stricter security directives. The N-CIRT also plans to hire a full-time security specialist for the Boulder campus.

Data Acquisition, Processing, and Distribution

Design and development for new and modified datasets will continue. Use of ODS applications and methods will expand as legacy translators and product-generation methods are replaced by the new techniques. Object Oriented software development for point data will continue.

Upgrades and enhancements to the AWIPS data servers will be performed in response to the continual addition of products to the NOAAPORT dataset. Design and development staff will continue to create an automated research system for generating AWIPS review cases from retrospective datasets.

Metadata handling techniques for use with GRIB datasets will be implemented for real-time data processing. An automated system for acquiring and incorporating digital metadata is part of this plan. Further work includes continued development of the interactive interface that allows easy query and management of the metadata content, the addition of program interfaces to allow secure, controlled data access, and incorporation of retrospective data processing and metadata management.
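
As a rough sketch of the kind of per-dataset metadata record such a system might manage (the field names and example values are illustrative assumptions, not the planned FSL schema):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class GribDatasetMetadata:
        """Illustrative metadata record for one real-time GRIB dataset."""
        dataset_id: str
        model: str
        grid: str
        variables: List[str] = field(default_factory=list)
        forecast_hours: List[int] = field(default_factory=list)
        update_cycle_minutes: int = 60
        access: str = "internal"          # e.g., "internal" or "public"

    # Example entry (values are hypothetical):
    ruc_meta = GribDatasetMetadata(
        dataset_id="ruc20_hybrid",
        model="RUC",
        grid="CONUS 20 km",
        variables=["TMP", "UGRD", "VGRD", "HGT"],
        forecast_hours=[0, 1, 2, 3, 6, 9, 12],
        update_cycle_minutes=60,
    )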

Laboratory Project, Research, and External Support

ITS will continue to support FSL users and projects as well as external FSL collaborators and data users. This support comprises real-time and retrospective FSL data, meteorological products, and technical data-handling expertise.

Efforts will continue toward providing HPCS support, assistance, and advice to both FSL users and numerous other NOAA and outside users.

To facilitate the management and tracking of the laboratory's multimillion-dollar assets, more enhancements are planned for the FSL Hardware Assets Management System (HAMS). A new Electrical Power Management enhancement will accurately track and monitor computer equipment power requirements within FSL. HAMS will provide FSL Central Facility managers with tools to monitor, balance, and plan electrical load and consumption within the laboratory.

Other planned enhancements include network connections, enhanced equipment searches, mass changes, vendor searches, credit card reconciliation, and automated excess property processing.

