Navy DSRC Introduction and Policy Guide

1. Introduction

1.1. Purpose

This document provides an overview of the Navy DSRC. This guide is intended to offer assistance to users and their S/AAAs in determining which systems will best meet specific computational needs.

To contact us with questions, comments, or suggestions about this guide, please visit the Contact Us page for complete contact information.

1.2. Overview of Supported CTAs

The Navy Department of Defense (DoD) Supercomputing Resource Center (Navy DSRC) is organizationally located with the Naval Meteorology and Oceanography Command (NAVMETOCCOM) and is collocated with the headquarters (Commander, Naval Meteorology and Oceanography Command - CNMOC) at the John C. Stennis Space Center, MS. NAVMETOCCOM/CNMOC provides oceanographic support to the Department of Defense through a wide range of oceanographic modeling, prediction and data collection techniques.

The Navy DSRC, formerly the NAVO MSRC, was the second of the four major shared DoD High Performance Computing (HPC) centers to be formed under the auspices of the DoD HPC Modernization Program. Now one of five such centers, the Navy DSRC provides specialized support in the following critical defense computational technology areas (CTAs):

Supported CTAs

CTA  Description
CWO  Climate/Weather/Ocean Modeling and Simulation
CFD  Computational Fluid Dynamics
CSM  Computational Structural Mechanics
CCM  Computational Chemistry, Biology, and Materials Science
CEA  Computational Electromagnetics and Acoustics
ENS  Electronics, Networking, and Systems/C4I
SIP  Signal/Image Processing
FMS  Forces Modeling and Simulation
EQM  Environmental Quality Modeling and Simulation
IMT  Integrated Modeling and Test Environments
SAS  Space and Astrophysical Science

DoD Supercomputing Resource Centers provide DoD scientists and engineers with most of the program's computational resources. Each center supports a full range of centralized systems and services, including vector machines, scalable parallel systems, clustered workstations, DoD scientific visualization resources, and training.

1.3. Requesting Assistance

The Consolidated Customer Assistance Center (CCAC) is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 11:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

You can contact the Navy DSRC for after-hours support and for support services not provided by CCAC. You can contact us in any of the following ways:

  • E-mail: dsrchelp@navo.hpc.mil
  • Phone: 1-800-993-7677 or (228) 688-7677
  • Fax: (228) 688-4356
  • U.S. Mail:
    Navy DoD Supercomputing Resource Center
    1002 Balch Boulevard
    Stennis Space Center, MS 39522-5001

For more detailed contact information, please see the Contact Us page.

1.4. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account". If you do not yet have a pIE User Account, please visit the Consolidated Customer Assistance Center (CCAC) Accounts page and follow the instructions there. Once you have an active pIE User Account, visit the Navy DSRC Accounts page for instructions on how to request accounts on the Navy DSRC HPC systems. If you need assistance with any part of this process, please contact CCAC at accounts@ccac.hpc.mil.

1.5. Visitor Information

If you are planning to visit the Navy DSRC, it is important that you review the instructions on the Planning a Visit page. This page contains important information including pre-trip and on-arrival instructions that you will need to know to ensure that your visit to our center goes smoothly.

2. Hardware, Network, and Software

All HPC systems currently in operation at the Navy DSRC are seamlessly integrated with the Mass Storage Archive Server and the Defense Research and Engineering Network (DREN) via many high-speed networking technologies.

2.1. High Performance Computing

2.1.1. IBM iDataPlex (Haise)

Haise is an IBM iDataPlex. The login and compute nodes are populated with 2.6-GHz Intel Xeon Sandy Bridge E5-2670 8-core processors, two per node, for 16 cores per node. Haise uses the FDR-10 InfiniBand interconnect in a Fat Tree configuration as its high-speed network for MPI messages and IO traffic. Haise uses IBM's General Parallel File System (GPFS) to manage its parallel file system that targets IBM's IS4600 (Infinite Storage) RAID arrays. Haise has 1,176 compute nodes that share memory only on the node; memory is not shared across the nodes. Each login node has two 8-core processors (16 cores) with its own Red Hat Enterprise Linux operating system, sharing 64 GBytes of memory, with no user-accessible swap space. Each compute node has two 8-core processors (16 cores) with its own Red Hat Enterprise Linux operating system, sharing 32 GBytes of memory, with no user-accessible swap space. Haise is rated at 391 peak TFLOPS and has 2.8 PBytes (formatted) of disk storage.

Haise is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (e.g., memory, IO, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.

haise.navo.hpc.mil
IBM iDataPlex - 391 TFLOPS

                        Login Nodes                      Compute Nodes
Total Nodes             8                                1,176
Operating System        Red Hat Enterprise Linux         Red Hat Enterprise Linux
Cores/Node              16                               16
Core Type               Intel Xeon Sandy Bridge E5-2670  Intel Xeon Sandy Bridge E5-2670
Core Speed              2.6 GHz                          2.6 GHz
Memory/Node             64 GBytes                        32 GBytes
Accessible Memory/Node  8 GBytes                         27 GBytes
Memory Model            Shared on node.                  Shared on node. Distributed across cluster.
Interconnect Type       10 GigEthernet                   FDR-10 InfiniBand

File Systems on Haise

Path     Capacity    Type
/scr     2.4 PBytes  GPFS
/u/home  16 TBytes   GPFS
/p/cwfs  800 TBytes  PanFS
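
The rated peak figure can be cross-checked from the node count, cores per node, and clock speed listed above. The following is a minimal sketch of that arithmetic, not an official benchmark method. It assumes 8 double-precision floating-point operations per core per cycle (the usual figure quoted for Sandy Bridge with AVX), which this guide does not state; the function names are illustrative.

```python
# Figures taken from the Haise description above.
COMPUTE_NODES = 1176
CORES_PER_NODE = 16
CLOCK_HZ = 2.6e9

# Assumption (not stated in this guide): each Sandy Bridge core can retire
# 8 double-precision floating-point operations per cycle using AVX.
FLOPS_PER_CYCLE = 8


def total_cores(nodes, cores_per_node):
    """Return the number of compute cores in the cluster."""
    return nodes * cores_per_node


def peak_tflops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Return theoretical peak performance in TFLOPS (10**12 FLOPS)."""
    cores = total_cores(nodes, cores_per_node)
    return cores * clock_hz * flops_per_cycle / 1e12


if __name__ == "__main__":
    print(total_cores(COMPUTE_NODES, CORES_PER_NODE))
    print(round(peak_tflops(COMPUTE_NODES, CORES_PER_NODE,
                            CLOCK_HZ, FLOPS_PER_CYCLE), 1))
```

With these inputs the sketch gives 18,816 compute cores and about 391.4 TFLOPS, which is consistent with the rated 391 peak TFLOPS.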

For detailed information on using Haise, see the Haise User Guide.

2.1.2. IBM iDataPlex (Kilrain)

Kilrain is an IBM iDataPlex. The login and compute nodes are populated with 2.6-GHz Intel Xeon Sandy Bridge E5-2670 8-core processors, two per node, for 16 cores per node. Kilrain uses the FDR-10 InfiniBand interconnect in a Fat Tree configuration as its high-speed network for MPI messages and IO traffic. Kilrain uses IBM's General Parallel File System (GPFS) to manage its parallel file system that targets IBM's IS4600 (Infinite Storage) RAID arrays. Kilrain has 1,176 compute nodes that share memory only on the node; memory is not shared across the nodes. Each login node has two 8-core processors (16 cores) with its own Red Hat Enterprise Linux operating system, sharing 64 GBytes of memory, with no user-accessible swap space. Each compute node has two 8-core processors (16 cores) with its own Red Hat Enterprise Linux operating system, sharing 32 GBytes of memory, with no user-accessible swap space. Kilrain is rated at 391 peak TFLOPS and has 2.8 PBytes (formatted) of disk storage.

Kilrain is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (e.g., memory, IO, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.

kilrain.navo.hpc.mil
IBM iDataPlex - 391 TFLOPS

                        Login Nodes                      Compute Nodes
Total Nodes             8                                1,176
Operating System        Red Hat Enterprise Linux         Red Hat Enterprise Linux
Cores/Node              16                               16
Core Type               Intel Xeon Sandy Bridge E5-2670  Intel Xeon Sandy Bridge E5-2670
Core Speed              2.6 GHz                          2.6 GHz
Memory/Node             64 GBytes                        32 GBytes
Accessible Memory/Node  8 GBytes                         27 GBytes
Memory Model            Shared on node.                  Shared on node. Distributed across cluster.
Interconnect Type       10 GigEthernet                   FDR-10 InfiniBand

File Systems on Kilrain

Path     Capacity    Type
/scr     2.4 PBytes  GPFS
/u/home  16 TBytes   GPFS
/p/cwfs  800 TBytes  PanFS
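
Because memory is shared only within a node, the accessible memory per compute node (27 GBytes) can limit how many processes fit on each 16-core node. The sketch below estimates how many nodes a job might need; it is a rough planning aid under the assumption that every process (rank) uses the same amount of memory. The function names and the per-rank memory values are hypothetical.

```python
import math

# Compute-node figures from the table above.
CORES_PER_NODE = 16
ACCESSIBLE_GB_PER_NODE = 27


def ranks_per_node(mem_per_rank_gb):
    """Return how many ranks fit on one node, limited by cores and memory."""
    if mem_per_rank_gb <= 0:
        raise ValueError("mem_per_rank_gb must be positive")
    by_memory = int(ACCESSIBLE_GB_PER_NODE // mem_per_rank_gb)
    return min(CORES_PER_NODE, by_memory)


def nodes_needed(total_ranks, mem_per_rank_gb):
    """Return the number of nodes needed to place total_ranks ranks."""
    per_node = ranks_per_node(mem_per_rank_gb)
    if per_node == 0:
        raise ValueError("a single rank does not fit in one node's memory")
    return math.ceil(total_ranks / per_node)


if __name__ == "__main__":
    print(nodes_needed(256, 1.5))
    print(nodes_needed(256, 4))
```

For example, 256 ranks at 1.5 GBytes each fit on 16 nodes, while 256 ranks at 4 GBytes each need 43 nodes because only 6 ranks fit in a node's accessible memory.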

For detailed information on using Kilrain, see the Kilrain User Guide.

2.2. Mass Storage Archive Server (Newton)

There is one Oracle T4-4 system, Newton, which makes up the Resilient Mass Storage Server (RMSS). The system is configured with two 8-core 3.0-GHz processors, 256 GBytes of main memory, and over 70 TBytes of hard disk storage. For information on using the archive system, see the Archive User Guide.

2.3. Network Connectivity

Our site is a primary node of the Defense Research and Engineering Network, or DREN. DREN is a robust, high-speed network providing connectivity to user sites and centers nationwide. We connect to the DREN Wide Area Network (WAN) via an OC-48 circuit capable of data transfers up to 2.48 Gbits/sec and a secondary OC-12 circuit capable of data transfers up to 622 Mbits/sec to provide fault tolerance and additional bandwidth.
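
To put the circuit speeds above in perspective, the following sketch estimates best-case transfer times over each link. It assumes 1 TByte = 10**12 bytes and an otherwise idle link; the `efficiency` parameter is a hypothetical knob for protocol overhead and contention, not a measured value.

```python
# Line rates quoted above, in bits per second.
OC48_BITS_PER_SEC = 2.48e9
OC12_BITS_PER_SEC = 622e6


def transfer_seconds(size_bytes, link_bits_per_sec, efficiency=1.0):
    """Return the estimated time in seconds to move size_bytes over a link.

    efficiency is the fraction of the line rate actually achieved
    (hypothetical; 1.0 means the full line rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return size_bytes * 8 / (link_bits_per_sec * efficiency)


if __name__ == "__main__":
    one_tbyte = 1e12  # assumption: decimal TByte
    print(round(transfer_seconds(one_tbyte, OC48_BITS_PER_SEC) / 60, 1), "minutes")
    print(round(transfer_seconds(one_tbyte, OC12_BITS_PER_SEC) / 3600, 1), "hours")
```

At full line rate, 1 TByte takes roughly 54 minutes over the OC-48 circuit and roughly 3.6 hours over the OC-12 circuit; real transfers will be slower.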

Our Local Area Network (LAN), a 10-Gigabit Ethernet connection, provides primary connectivity to the Navy DSRC infrastructure, HPCs, and mass storage assets. The users of the Navy DSRC are able to use this high-performance connectivity for interactive and data transfer functions.

2.4. Software Environment

All Navy DSRC systems run derivatives of the UNIX System V operating system with vendor-specific enhancements. A large variety of compiler environments, math libraries, programming tools and third-party analysis applications are available on the DSRC systems.

HPC Software Listings

System                   Software Listing
IBM iDataPlex (Haise)    http://www.navo.hpc.mil/software/index.html?sys=Haise
IBM iDataPlex (Kilrain)  http://www.navo.hpc.mil/software/index.html?sys=Kilrain

3. Data Storage

The Navy DSRC data storage consists of local home directories on each system, temporary disk storage on each system and long-term storage on the Resilient Mass Storage Server (RMSS). Files stored on the RMSS are subject to migration to off-line status that is controlled by Sun's Storage and Archive Manager/Quick File System (SAM/QFS) software.

3.1. Permanent File Storage

Users are allocated a home directory (referenced locally with the $HOME environment variable) on each Navy DSRC system with 1 GByte of non-migrated storage. $HOME is not backed up by the Center; therefore users are responsible for maintaining backup copies of any files in this directory.
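
Since the home directory allocation is small, it can help to check usage before copying files into it. The sketch below sums file sizes under a directory and compares the total with a 1-GByte limit. It assumes 1 GByte = 2**30 bytes and counts apparent file sizes, which may differ from how the Center actually accounts for usage; the function names are illustrative.

```python
import os

# Assumption: 1 GByte is treated as 2**30 bytes for this estimate.
HOME_QUOTA_BYTES = 1 * 1024 ** 3


def directory_usage_bytes(root):
    """Return the total apparent size of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.lstat(path).st_size
            except OSError:
                # Skip files that vanish or cannot be read while scanning.
                continue
    return total


def quota_fraction(root, quota_bytes=HOME_QUOTA_BYTES):
    """Return usage under root as a fraction of quota_bytes."""
    return directory_usage_bytes(root) / quota_bytes
```

Calling `quota_fraction(os.environ["HOME"])` returns a value such as 0.25 when a quarter of the allocation is in use.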

3.2. Temporary File Storage

Each Navy DSRC system provides a large quantity of high-speed disk storage as the /scr file system. /scr is globally accessible, high-speed working storage intended primarily for interactive and batch processing, including the large amounts of temporary space that batch jobs use. There are no limits on the size of individual files. The /scr file system is not backed up by the Center, so users are responsible for managing their own files in /scr and for maintaining backup copies of anything stored there. Users can access their temporary storage through the $WORKDIR environment variable. The table below lists the /scr allocations for each system.

Temporary Space Allocations on HPC Systems

System                   /scr
IBM iDataPlex (Haise)    20 TBytes
IBM iDataPlex (Kilrain)  20 TBytes
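
One common way to keep temporary storage manageable is to give each job its own subdirectory under $WORKDIR. The following is a minimal sketch of that pattern; the helper name and the directory layout are hypothetical, and the optional `environ` argument exists only so the function can be exercised without a real $WORKDIR.

```python
import os
from pathlib import Path


def job_scratch_dir(job_name, environ=None):
    """Create and return a per-job directory under $WORKDIR."""
    env = os.environ if environ is None else environ
    workdir = env.get("WORKDIR")
    if not workdir:
        raise RuntimeError("WORKDIR is not set in this environment")
    path = Path(workdir) / job_name
    # exist_ok=True makes the call safe to repeat for the same job name.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

On an HPC system, `job_scratch_dir("run01")` would return the path `$WORKDIR/run01`, creating it if needed. Files placed there are still not backed up.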

3.3. Archival File Storage

All of our HPC systems have access to an online archival mass storage system that provides long-term storage for users' files. It is a petascale archive that resides on a robotic tape library. A 70-TByte disk cache front-ends the tape file system and temporarily holds files while they are being transferred to or from tape.

The environment variables $ARCHIVE_HOST and $ARCHIVE_HOME are automatically set for you. $ARCHIVE_HOST can be used to reference the archive server, and $ARCHIVE_HOME can be used to reference your archive directory on the server. These can be used when transferring files to/from archive. For information on using the archive system, see the Archive User Guide.
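
As an illustration of how the two variables combine, the sketch below builds a `host:path` target from $ARCHIVE_HOST and $ARCHIVE_HOME and assembles a copy command without running it. Using `scp` here is an assumption made for the example; the Archive User Guide remains the authority on which transfer tools are supported. The function names and the example host and path in the usage note are hypothetical.

```python
import os
import posixpath


def archive_target(filename, environ=None):
    """Return 'host:path' for filename inside the user's archive directory."""
    env = os.environ if environ is None else environ
    try:
        host = env["ARCHIVE_HOST"]
        home = env["ARCHIVE_HOME"]
    except KeyError as exc:
        raise RuntimeError(f"{exc.args[0]} is not set") from exc
    return f"{host}:{posixpath.join(home, filename)}"


def scp_put_command(local_path, environ=None):
    """Return an argument list that would copy local_path to the archive.

    Assumption: scp is an acceptable transfer tool. The command is only
    constructed here, never executed.
    """
    name = os.path.basename(local_path)
    return ["scp", local_path, archive_target(name, environ)]
```

With hypothetical values ARCHIVE_HOST=archive.example and ARCHIVE_HOME=/archive/user, `scp_put_command("results.tar")` yields `["scp", "results.tar", "archive.example:/archive/user/results.tar"]`.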

4. Processing Environment

4.1. Determining the Correct HPC System

Determining the correct HPC System for your needs can be a complex task. The following are just a few of the factors that might influence your choice:

4.1.1. Software Availability

If your work depends upon a specific Commercial Off-The-Shelf (COTS) application, you can verify its availability on any system in the HPCMP by checking the Consolidated Software List. Software information for Navy DSRC systems is also available on our local software page. If you can't find the application that you need, contact CCAC for assistance.

4.1.2. Hardware Requirements

To ensure that your jobs will have access to sufficient cores and memory to run as needed, you can review the hardware specifications on our Hardware page. Additional details are available in each of the HPC User Guides, available from the Documentation page.

4.1.3. Queue Limits

If your jobs require exceptionally long run times or if you need an exceptionally large number of cores, you should verify that queue limits on the system you choose allow both the number of cores and run time that you need. To check this, see our Queue Summary page.

4.2. Processing Environment Overview and Philosophy

Navy DSRC provides both an interactive and a batch submission environment. Batch queue environments are available on all of the systems. The batch environment is the primary environment for most user work. All of the HPC systems at the Navy DSRC use the PBS batch queue system.

The batch queue environments allow users to submit, monitor and terminate their own batch jobs. This capability is intended for jobs requiring large amounts of memory and/or CPU time that generally run for many hours. Through the batch queue environments, the user submits a job either from the command line or through a shell script. Resource requirements (e.g., CPU time and number of processors) or runtime parameters (e.g., output file redirection) can be issued on the command line or embedded in the shell script for the batch job to be executed.
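
To show what "embedded in the shell script" can look like, the sketch below renders a small PBS job script with resource requests and output redirection given as #PBS directives. It assumes the PBS Professional `select` syntax and a 16-core node; the project ID, job name, and command are hypothetical, and the system user guides remain the authority on which directives are required.

```python
def pbs_script(job_name, queue, project, nodes, walltime, command,
               cores_per_node=16):
    """Return the text of a PBS job script with embedded directives."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",        # job name
        f"#PBS -q {queue}",           # queue to submit to
        f"#PBS -A {project}",         # project/account to charge
        f"#PBS -l select={nodes}:ncpus={cores_per_node}"
        f":mpiprocs={cores_per_node}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",  # wall-clock limit, HH:MM:SS
        "#PBS -j oe",                 # merge stderr into stdout
        "cd $PBS_O_WORKDIR",          # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(pbs_script("demo", "debug", "PROJECT_ID", 2, "00:30:00", "./a.out"))
```

The resulting text can be saved to a file and submitted with `qsub`; the same options could instead be passed on the `qsub` command line.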

4.3. Job Scheduling/Queuing Environment and Policies

4.3.1. IBM iDataPlex Queue Usage Policies
Summary of Queues on the IBM iDataPlex - Haise and Kilrain
(Queues are listed in order of decreasing priority.)

Priority  Queue Name  Job Class   Max Wall Clock Time  Max Cores Per Job  Comments
Highest   urgent      Urgent      24 Hours             4096               Designated urgent projects by DoD HPCMP
          high        High        168 Hours            6144               Designated high-priority projects by service/agency
          challenge   Challenge   168 Hours            6144               Challenge projects only
          special     N/A         24 Hours             4096               Access available by request
          debug       Debug       30 Minutes           1024               User diagnostic jobs
          standard    Standard    168 Hours            4096               Non-challenge user jobs
          transfer    N/A         12 Hours             1                  Data transfer jobs
Lowest    background  Background  4 Hours              512                User jobs that will not be charged against the project allocation
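
The limits in the queue summary can be checked before a job is submitted. The sketch below encodes the table as a dictionary and reports which limits a request would exceed. It is an illustrative helper, not part of PBS; the limits are copied from the table above and may change, so the Queue Summary page remains authoritative.

```python
# queue name -> (max wall-clock minutes, max cores per job), from the table above
QUEUE_LIMITS = {
    "urgent": (24 * 60, 4096),
    "high": (168 * 60, 6144),
    "challenge": (168 * 60, 6144),
    "special": (24 * 60, 4096),
    "debug": (30, 1024),
    "standard": (168 * 60, 4096),
    "transfer": (12 * 60, 1),
    "background": (4 * 60, 512),
}


def check_request(queue, cores, wall_minutes):
    """Return a list of problems with a job request (empty if it fits)."""
    if queue not in QUEUE_LIMITS:
        raise ValueError(f"unknown queue: {queue}")
    max_minutes, max_cores = QUEUE_LIMITS[queue]
    problems = []
    if cores < 1:
        problems.append("at least one core is required")
    if cores > max_cores:
        problems.append(f"{cores} cores exceeds the {max_cores}-core limit")
    if wall_minutes > max_minutes:
        problems.append(
            f"{wall_minutes} minutes exceeds the {max_minutes}-minute limit")
    return problems


if __name__ == "__main__":
    print(check_request("debug", 512, 30))
    print(check_request("debug", 2048, 60))
```

A 512-core, 30-minute request fits the debug queue, so the first call returns an empty list; the second call reports both the core count and the run time as over the limit.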

4.4. Interactive CPU-time Limits

The Navy DSRC has implemented a 15-minute (900-second) interactive processing limit on login nodes for processes running outside of the batch scheduler. This limit also applies to systems that do not have a batch scheduler installed. An application run on a login node is allowed to accrue 900 seconds of CPU time, not wall-clock time, before it is terminated. This policy protects interactive access for all users.

Interactive CPU-Time Limits

System                   CPU Time
IBM iDataPlex (Haise)    15 Minutes
IBM iDataPlex (Kilrain)  15 Minutes
Oracle T4-4 (Newton)     15 Minutes
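
The distinction between CPU time and wall-clock time matters here: a process that mostly waits accrues very little CPU time, while a process that computes continuously accrues it at roughly the wall-clock rate. The sketch below measures both for the current process. It only illustrates the distinction and does not reflect how the Center enforces the limit; the helper names are hypothetical.

```python
import time

CPU_LIMIT_SECONDS = 900  # the 15-minute limit stated above


def measure(func):
    """Run func() and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


def idle():
    """Sleep: wall-clock time passes, but almost no CPU time accrues."""
    time.sleep(0.2)


def busy():
    """Compute: CPU time accrues while the loop runs."""
    sum(i * i for i in range(500_000))


def remaining_cpu_seconds(used_cpu_seconds):
    """Return how much of the 900-second CPU budget is left."""
    return max(0.0, CPU_LIMIT_SECONDS - used_cpu_seconds)


if __name__ == "__main__":
    print("idle (cpu, wall):", measure(idle))
    print("busy (cpu, wall):", measure(busy))
```

An editor session left open for hours is therefore unlikely to reach the limit, whereas a compute-bound program started on a login node would reach it after about 15 minutes of sustained single-core work.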

5. Navy DSRC Specific Documentation

On-line documentation and information can be found through the Navy DSRC Web site, the message of the day (MOTD) that is displayed when logging on any system, and manual pages via the man command.