New User Guide for the Navy Cray XT5

System name: einstein.navo.hpc.mil

Table of Contents

  1. Overview of System Hardware
  2. Logging onto Einstein
  3. User Environment
  4. File Transfers and Einstein
  5. Programming Environment
  6. Programming Models
  7. Important Programming Libraries
  8. PBS Batch Queuing System
  9. Pointers to more Information

  1. Overview of System Hardware

    Einstein contains 1,660 nodes (12,872 cores) in total, of which 1,592 are compute nodes (12,736 cores) available for computing. The system has 25 terabytes of memory and 538 terabytes of disk storage. Each compute node contains two quad-core 2.3 GHz AMD Opteron processors with 16 GBytes of usable memory. Eight of these nodes are configured with 31 GBytes of usable memory to support larger memory applications.

    The remaining 68 nodes are service nodes (136 cores). Each service node contains one dual-core 2.8 GHz Opteron processor and varying amounts of memory.
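
    The node and core counts quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch in POSIX shell; the variable names are illustrative only, and the per-node core counts are derived from the processor descriptions above.

```shell
#!/bin/sh
# Figures taken from the hardware overview above.
compute_nodes=1592
cores_per_compute_node=8    # two quad-core Opterons per compute node
service_nodes=68
cores_per_service_node=2    # one dual-core Opteron per service node

compute_cores=$((compute_nodes * cores_per_compute_node))
service_cores=$((service_nodes * cores_per_service_node))
total_nodes=$((compute_nodes + service_nodes))
total_cores=$((compute_cores + service_cores))

echo "compute cores: $compute_cores"                        # 12736
echo "service cores: $service_cores"                        # 136
echo "total: $total_nodes nodes, $total_cores cores"        # 1660 nodes, 12872 cores
```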

    Operating System

    Einstein runs UNICOS/lc, which combines two operating systems: Compute Node Linux (CNL) runs on each compute node, and a fully featured SUSE Linux runs on the service nodes. The service nodes include the login, I/O and system nodes.

    CNL, a lightweight OS based on Linux, reduces system overhead by managing virtual memory addressing, providing memory protection and performing basic scheduling.

    File Storage

    Einstein has two types of user disk storage: home file storage and temporary work storage. Each user is allocated a home directory, /u/home/$login, with a quota of 1 GByte, intended for permanent storage. Source files, binary files, and small data files can be stored here. The home directory can be referenced by the $HOME environment variable. This filesystem is not backed up. Users are responsible for backing up their files to the Navy DSRC archive server, Newton, or to some other local system. Additional information on copying files can be found in the "File Transfers and Einstein" section below.

    Users can check their $HOME quota with the following command:

    einstein% lfs quota -u user1 /u/home 
    Disk quotas for user user1 (uid 5002): 
    Filesystem kbytes quota limit grace files quota limit grace 
    /u/home 50128 1000000 1024000 309 0 0
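
    To see how close a home directory is to its quota, the kbytes and quota columns of the output above can be turned into a percentage. This is only a sketch: it works on the sample data line shown above, the function name quota_pct is hypothetical, and it assumes the fields appear in the same order as in that sample.

```shell
#!/bin/sh
# Sample data line from the quota output above:
# filesystem, kbytes used, quota, limit, files, file quota, file limit
quota_line="/u/home 50128 1000000 1024000 309 0 0"

# quota_pct LINE: print kbytes used (field 2) as a percentage of the quota (field 3).
quota_pct() {
    echo "$1" | awk '{ printf "%.1f\n", 100 * $2 / $3 }'
}

pct=$(quota_pct "$quota_line")
echo "$pct% of the /u/home quota is in use"    # 5.0% of the /u/home quota is in use
```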
           

    The temporary work directory, /scr/$login, is available for the temporary storage of files needed by executing jobs. Larger input and output files can be stored here, as this filesystem is much larger than /u/home. The temporary directory can be referenced by the $WORKDIR environment variable. This filesystem is also not backed up, and files may be deleted at any time. Users are responsible for backing up their files to Newton or to some other local system.

    $WORKDIR is available in a Lustre globally shared filesystem that can be accessed from all compute nodes.

  2. Logging onto Einstein

    Using Kerberos

    Users with active accounts on Einstein may access any of eight interactive login nodes. All login nodes are named einstein.navo.hpc.mil (via external DNS lookup) but have separate IP addresses and internal hostnames within the cluster itself. Users cannot log in interactively to the internal network of compute nodes; only batch jobs can run on the compute nodes. All available DoD Kerberos commands (including kftp, krsh and kerberized ssh) may be used to access the login nodes. Here are several examples of logging in, transferring data and remotely running commands on the interactive nodes:

    % ktelnet -x -F einstein.navo.hpc.mil
    % krlogin -x -F einstein.navo.hpc.mil
    % krcp myfile einstein.navo.hpc.mil:myfile
    % krsh -x -F einstein      # short names allowed inside the Navy DSRC network
    % ssh einstein             # ssh that supports DoD Kerberos tickets
    % kftp einstein            # kftp via default Gig-e network inside Navy lan

    Remote user "user1" at another site can log into Einstein as Navy DSRC user "user11", using the "-l login" option and a valid Kerberos ticket:

    % krlogin -l user11 -x einstein.navo.hpc.mil

    Next we transfer Fortran files from our home site outside the Navy DSRC to our personal Lustre work filesystem on Einstein (/scr/$login) and then use krsh to check that the files actually transferred correctly:

    * NOTE: The -X option to krcp turns off data encryption for faster transfers.

    % krcp -X *.f einstein.navo.hpc.mil:/scr/user11
    
    % krsh einstein.navo.hpc.mil ls -ltr /scr/user11
    -rw-r--r--   1 user11  NA0101      1416 Jul 29 23:31 kticket.f
    -rw-r--r--   1 user11  NA0101      4251 Jul 29 23:31 parse_klist.f

    For additional information about using or obtaining DoD Kerberos, please refer to the Instructions for Kerberos.

    Interactive Sessions

    Interactive login sessions on Einstein, and any child processes started from an interactive login shell, are allowed a total of 15 minutes of CPU time. Any parent process and all its children that exceed this time limit will be killed by the system. File transfer commands such as kftp, krcp, rcp, and ftp are exempt from this time limit, but other processing such as shell scripts, tar, gzip, or backgrounded serial processes that require more than 15 minutes of CPU time must be submitted to the PBS batch queuing system.

    Parallel processing via MPI is only allowed to run in the batch queues; parallel executables under the aprun jobstarter command are not allowed to run interactively on Einstein.

  3. User Environment

    By default, user logins to Einstein are set up with a valid PATH, MANPATH, and other environment variables that give access to the compilers, batch queue commands, Kerberos commands, and most common user-level commands such as tar, make, cat, and so on.

    Einstein also has the following environment variables set to access the archive servers and home directory, location of the sample code repository, and location of the COTS software:

    $ARCHIVE_HOME    /u/home/$login 
    $ARCHIVE_HOST    newton.navo.hpc.mil 
    $SAMPLES_HOME   /usr/local/Examples_Codes 
    $PET_HOME       /usr/local/PET 
    $CSI_HOME       /usr/local/CSI

    Users who want to add their own personal commands, paths, aliases, and syntax to their environment can create individual .cshrc and .login files (or .profile file for Korn/Bourne shell users) in their home directories.

  4. File Transfers and Einstein

    There are a number of ways to transfer data between Einstein and local Navy DSRC systems or other remote non-Navy systems:

    Einstein --> RMSS Archive (Newton)

    • use non-kerberized rsh/rcp from login nodes (cron or interactive login)
    • use kerberized rsh/rcp/ftp from login nodes (cron or interactive login)
    • use rsh/rcp or krsh/krcp/kftp inside a PBS transfer queue job
      (Submitted from login, cron, or from running PBS batch job)

    Note: Transfer queue jobs run on login nodes, so both rcp and krcp can be used. However, you can only use non-kerberized rsh/rcp from the compute node internal network (accessible only via PBS batch jobs).

    Einstein <--> other Navy DSRC systems and other systems offsite from Navy

    • use krcp/krsh/kftp from login nodes (cron or interactive login)
    • use krsh/krcp/kftp inside a PBS transfer queue job
      (Submitted from login, cron or from running PBS batch job)

    Note: There is no network routing from the compute node internal network on Einstein to any Navy DSRC systems except the RMSS servers. There is no network routing to any offsite systems from Einstein's internal network.

    For additional online information on accessing the Navy RMSS Archive server (Newton), please refer to the RMSS Overview Page.

  5. Programming Environment

    Available Programming Environments

    Einstein has three programming environments available for compiling: Portland Group (PGI), PathScale and GNU. PGI is the default programming environment on Einstein.

    Modules

    The modules utility consists of the "module" command and module files. It initializes the programming environment for specific compilers and provides an easy way to swap compilers and compiler versions.

    The "module" command can be used to change your default programming environment. The main programming environment module for each compiler is listed below:

    PrgEnv-pgi
    PrgEnv-pathscale
    PrgEnv-gnu

    Common Module Commands

    Command        Description
    module list    Lists the modules in your environment
    module avail   Lists available modules
    module load    Loads a module
    module show    Shows what a module loads
    module swap    Swaps two modules

    Examples of each are listed below:

    einstein% module list 
    Currently Loaded Modulefiles:
       1) modules/3.1.6                12) xt-totalview/8.4.1b 
       2) MySQL/5.0.45                 13) fftw/3.1.1 
       3) xt-service/2.1.41HD          14) xt-libsci/10.3.0 
       4) xt-libc/2.1.41HD             15) xt-mpt/2.1.27HD 
       5) xt-os/2.1.41HD               16) xt-pe/2.1.41HD 
       6) xt-boot/2.1.41HD             17) xt-asyncpe/1.0c 
       7) xt-lustre-ss/2.1.41HD_1.6.5  18) PrgEnv-pgi/2.1.41HD 
       8) xtpe-target-cnl              19) pbs/9.2.2.82426 
       9) Base-opts/2.1.41HD           20) xtpe-quadcore 
      10) pgi/7.1.6                    21) mpscp/1.2b 
      11) totalview-support/1.0.5      22) bct-env/0.1 
    
    einstein% module avail 
    
    ----------------------- /opt/xt-asyncpe/1.0c/modulefiles ----------------------- 
    xtpe-quadcore 
    
    ------------------------------- /opt/modulefiles ------------------------------- 
    Base-opts/2.1.27HD                 pgi/7.1.6(default) 
    Base-opts/2.1.27HD.lusrelsave      pgi/7.2.2 
    Base-opts/2.1.41HD(default)        pgi/7.2.3 
    Base-opts/2.1.41HD.lusrelsave      pkgconfig/0.15.0(default) 
    MySQL/5.0.45                       totalview-support/1.0.5(default) 
    PrgEnv-gnu/2.1.27HD                xt-asyncpe/1.0 
    PrgEnv-gnu/2.1.41HD(default)       xt-asyncpe/1.0c(default) 
    PrgEnv-pathscale/2.1.27HD          xt-boot/2.1.27HD 
    PrgEnv-pathscale/2.1.41HD(default) xt-boot/2.1.41HD 
    PrgEnv-pgi/2.1.27HD                xt-catamount/2.1.27HD 
    PrgEnv-pgi/2.1.41HD(default)       xt-catamount/2.1.41HD 
    acml/4.0.1a(default)               xt-craypat/4.2 
    apprentice2/4.3.0(default)         xt-craypat/4.3.1 
    bct-env/0.1                        xt-craypat/4.3.2(default) 
    blcr/0.7.3                         xt-libc/2.1.27HD 
    cray/audit/1.0.0-1.0000.13871.0    xt-libc/2.1.41HD 
    dwarf/8.2.0(default)               xt-libsci/10.2.1 
    elf/0.8.10(default)                xt-libsci/10.3.0(default) 
    fftw/2.1.5                         xt-lustre-ss/2.1.27HD_1.6.5 
    fftw/3.1.1(default)                xt-lustre-ss/2.1.27HDa_1.6.5 
    gcc/4.1.2                          xt-lustre-ss/2.1.41HD_1.6.5 
    gcc/4.2.0.quadcore(default)        xt-mpt/2.1.27HD(default) 
    gcc/4.2.3                          xt-mpt/2.1.41HD 
    gcc-catamount/3.3                  xt-mpt/3.0.0 
    gnet/2.0.5                         xt-mpt/3.0.2 
    iobuf/1.0.6(default)               xt-os/2.1.27HD 
    java/jdk1.6.0_05(default)          xt-os/2.1.41HD 
    libscifft-pgi/1.0.0(default)       xt-papi/3.5.99c(default) 
    modules/3.1.6(default)             xt-pe/2.1.27HD 
    mpscp/1.2b                         xt-pe/2.1.41HD 
    pathscale/3.2(default)             xt-service/2.1.27HD 
    pbs/9.2.2.82426                    xt-service/2.1.41HD 
    petsc/2.3.3a(default)              xt-totalview/8.4.1b(default) 
    petsc-complex/2.3.3a(default)      xtgdb/1.0.0(default) 
    pgi/6.2.5                          xtpe-target-catamount 
    pgi/7.0.7                          xtpe-target-cnl 
    
    ------------------------------ /opt/modules/3.1.6 ------------------------------ 
    modulefiles/modules/dot         
    modulefiles/modules/modules 
    modulefiles/modules/module-cvs  
    modulefiles/modules/null 
    modulefiles/modules/module-info 
    modulefiles/modules/use.own 
    
    einstein% module load fftw 
    
    einstein% module show fftw 
    ------------------------------------------------------------------- 
    /opt/modulefiles/fftw/3.1.1: 
    
    conflict         fftw/2.1.5  
    prepend-path     PATH /opt/fftw/3.1.1/cnos/bin  
    prepend-path     LD_LIBRARY_PATH /opt/fftw/3.1.1/cnos/lib  
    prepend-path     MANPATH /opt/fftw/3.1.1/cnos/man  
    append-path      PE_PRODUCT_LIST FFTW  
    setenv           FFTW_POST_LINK_OPTS  -L/opt/fftw/3.1.1/cnos/lib -lfftw3 -lfftw3f  
    setenv           FFTW_INCLUDE_OPTS  -I/opt/fftw/3.1.1/cnos/include  
    setenv           FFTW_DIR /opt/fftw/3.1.1/cnos/lib  
    setenv           FFTW_INC /opt/fftw/3.1.1/cnos/include  
    
    ------------------------------------------------------------------- 
    
    einstein% module swap PrgEnv-pgi PrgEnv-pathscale

    Compilers

    We recommend invoking the Fortran, C and C++ compilers through the following Cray compiler scripts:

    Cray Compiler Scripts

    Compiler Script   Description
    ftn               Invokes Fortran 90 / Fortran 95
    cc                Invokes C
    CC                Invokes C++

    The following tables list the vendor compilers available in each programming environment on Einstein, with a brief description of each.

    PGI Programming Environment Compilers

    Compiler   Description
    pgf90      Fortran 90/95 compiler. Invokes the Fortran compiler, assembler and linker with options derived from its command line arguments.
    pgf77      Fortran 77 compiler. Invokes the Fortran compiler, assembler and linker with options derived from its command line arguments.
    pgcc       C compiler
    pgCC       C++ compiler

    Commonly Used PGI Options (Fortran and C/C++)

    Option                      Description
    -c                          Instructs the compiler to compile and assemble the source code only, without the link step.
    -fast                       Creates a generally optimal set of flags for targets that support SSE/SSE2. It incorporates optimization options that enable the use of vector streaming SIMD instructions (64-bit targets), vectorization with SSE instructions, cache alignment and flushz.
    -g                          Instructs the compiler to include symbolic debugging information in the object module.
    -gopt                       Instructs the compiler to include symbolic debugging information in the object file, and to generate optimized code identical to that generated when -g is not specified.
    -help                       Provides information about available options.
    -mcmodel=medium             Enables medium-model code generation for 64-bit targets; useful when the data space of the program exceeds 4GB.
    -Mconcur                    Instructs the compiler to enable auto-concurrentization of loops. If specified, the compiler uses multiple processors to execute loops that it determines to be parallelizable; loop iterations are split to execute optimally in a multithreaded execution context.
    -Minfo                      Instructs the compiler to produce information on standard error.
    -Minline                    Enables function inlining.
    -Mipa=fast,inline           Enables interprocedural analysis and optimization. Also enables automatic procedure inlining.
    -Mpfi or -Mpfo              Enable profile feedback driven optimizations.
    -Mkeepasm                   Keeps the generated assembly files.
    -Munroll                    Invokes the loop unroller to unroll loops, executing multiple instances of the loop during each iteration. This also sets the optimization level to 2 if the level is set to less than 2, or if no -O or -g options are supplied.
    -M[no]vect                  Enables/disables the code vectorizer.
    --[no_]exceptions           Enables or disables exception handling in user code. For C++, --no_exceptions declares that the functions in this file generate no C++ exceptions, allowing more optimal code generation.
    -o                          Names the output file.
    -O<level>                   Specifies the code optimization level, where <level> is 0, 1, 2, 3, or 4.
    -tp <target> [,target...]   Specifies the target processor(s); for the 64-bit compilers, more than one target is allowed, which enables generation of PGI Unified Binary executables.

    PathScale Programming Environment Compilers

    Compiler   Description
    pathf95    Fortran 77, 90 and 95 compiler. Invokes the Fortran 77, Fortran 90, and Fortran 95 compiler.
    pathf90    Fortran 77, 90 and 95 compiler. Alias for pathf95.
    pathcc     C compiler. Invokes the C compiler and the source preprocessor on source files that contain C code.
    pathCC     C++ compiler. Invokes the C++ compiler.

    Commonly Used PathScale Fortran 90 Compiler Options

    Option           Description
    -show-defaults   Lists the default compiler options and exits.
    -c               Generates an intermediate object file but does not attempt to link.
    -g               Adds information for debugging to the object file and/or executable.
    -I<directory>    Tells the preprocessor to search in directory for include or module files.
    -L<directory>    Tells the linker to search in directory for libraries.
    -r8              Promotes REALs from the default size of 4 bytes to 8 bytes.
    -i8              Promotes INTEGERs from the default size of 4 bytes to 8 bytes.
    -default64       Passes the -i8 and -r8 options to the compiler.
    -O3              Higher level of optimization than -O2 (the default optimization level).
    -cpp             Preprocesses files with the C preprocessor. Enabled by default for files ending in .F, .F90, or .F95.
    -ftpp            Preprocesses files with the Fortran preprocessor. Useful when portions of the Fortran code could be misinterpreted as C preprocessor directives (e.g. "//").
    -O3 -OPT:Ofast   Higher optimization level than -O3.
    -ipa             Tells the compiler to perform interprocedural analysis. This can be very time consuming. The flag should be used in both the compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
    -intrinsic=PGI   Enables intrinsic functions that are available in the PGI compiler but are not ANSI standard (e.g. rand).
    -apo             Enables autoparallelization.
    -mp              Enables parallelization via OpenMP directives.

    Commonly Used PathScale C/C++ Compiler Options

    Option           Description
    -show-defaults   Lists the default compiler options and exits.
    -c               Generates an intermediate object file but does not attempt to link.
    -g               Adds information for debugging to the object file and/or executable.
    -I<directory>    Tells the preprocessor to search in directory for include or module files.
    -L<directory>    Tells the linker to search in directory for libraries.
    -O3              Higher level of optimization than -O2 (the default optimization level).
    -Ofast           Higher level optimization (default is -O2). This flag should be used in both the compilation and linking steps.
    -ipa             Tells the compiler to perform interprocedural analysis. This can be very time consuming. The flag should be used in both the compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release.
    -apo             Enables autoparallelization.
    -mp              Enables parallelization via OpenMP directives.

    GNU Programming Environment Compilers

    Compiler   Description
    gfortran   Fortran compiler
    gcc        C compiler
    g++        C++ compiler

    PGI Compilers - http://www.pgroup.com/resources/docs.htm

    PathScale Compilers - http://www.pathscale.com/docs.html

    Additional information about the compilers in each programming environment can be found in the manpages on the system.

  6. Programming Models

    There are three programming models supported on Einstein: Message Passing Interface (MPI), Shared Memory (SHMEM) and Open Multi-Processing (OpenMP).

    Message Passing Interface (MPI)

    When creating an MPI program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.

    To compile an MPI program in C or Fortran, use the following examples:

          cc -o mpiprog.exe mpi_prog.c
          ftn -o mpiprog.exe mpi_prog.f
          

    To run an MPI program within a batch script, use the following command:

          aprun -n N $WORKDIR/mympidirectory/mpiprog.exe
          

    The aprun utility executes across a specified number of compute nodes. The "-n N" option specifies the number of cores to start. Please note the aprun utility only works in a Lustre-mounted file system. Users should ensure that all files needed by the MPI job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.

    The syntax used in the MPI compile and run examples above can be used in all three programming environments: PGI, PathScale or GNU.

    Shared Memory (SHMEM)

    When creating a SHMEM program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.

    The SHMEM routines can be found in libsma.a and are linked with the compiler option "-lsma". They are included by the standard Cray compiler scripts, but the option can also be specified explicitly on the compile line.

    To compile a SHMEM program in C or Fortran, use the following examples:

          cc -lsma -o shmemprog.exe shmem_prog.c
          ftn -lsma -o shmemprog.exe shmem_prog.f
          

    To run a SHMEM program within a batch script, use the following command:

          aprun -n N $WORKDIR/myshmem/shmemprog.exe
          

    The syntax used in the SHMEM compile and run examples above can be used with all three programming environments: PGI, PathScale or GNU.

    The aprun utility executes across a specified number of compute nodes. The "-n N" option specifies the number of cores to start. Please note the aprun utility only works in a Lustre-mounted file system. Users should ensure that all files needed by the SHMEM job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.

    Open Multi-Processing (OpenMP)

    When creating an OpenMP program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.

    To compile an OpenMP program in C or Fortran, use the following examples:

    PGI Programming Environment
          cc -o openmpprog.exe -mp=nonuma openmp_prog.c
          ftn -o openmpprog.exe -mp=nonuma openmp_prog.f
          
    PathScale Programming Environment
          cc -o openmpprog.exe -mp openmp_prog.c
          ftn -o openmpprog.exe -mp openmp_prog.f
          
    GNU Programming Environment
          cc -o openmpprog.exe -fopenmp openmp_prog.c
          ftn -o openmpprog.exe -fopenmp openmp_prog.f
          

    The syntax for the OpenMP compile examples above is different for each programming environment.
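
    Because the OpenMP flag depends on the loaded programming environment, a build script may want to pick it automatically. The following is a minimal sketch in POSIX shell; the function name omp_flag is hypothetical, and the flags are simply those listed in the examples above.

```shell
#!/bin/sh
# omp_flag PRGENV: print the OpenMP compile flag listed above for the
# given programming environment module (a version suffix is tolerated).
omp_flag() {
    case "$1" in
        PrgEnv-pgi*)       echo "-mp=nonuma" ;;
        PrgEnv-pathscale*) echo "-mp" ;;
        PrgEnv-gnu*)       echo "-fopenmp" ;;
        *) echo "unknown programming environment: $1" >&2; return 1 ;;
    esac
}

# Build (but do not run) the compile line for the PathScale environment.
flag=$(omp_flag PrgEnv-pathscale/2.1.41HD)
echo "cc -o openmpprog.exe $flag openmp_prog.c"    # cc -o openmpprog.exe -mp openmp_prog.c
```

    The compile line is only echoed here, so the sketch can be tried on any system; on Einstein the same string could be passed to the cc or ftn script.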

    To run an OpenMP program within a batch script, use the following command:

          aprun -n 1 -d 4 openmpprog.exe
          

    The syntax for the OpenMP run example above can be used with all three programming environments: PGI, PathScale or GNU.

    The aprun command above runs on one compute node with four threads. The environment variable OMP_NUM_THREADS must also be set to the number of threads. As with the previous programming models, aprun only works in a Lustre-mounted file system. Users should ensure that all files needed by the OpenMP job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.
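
    One way to keep OMP_NUM_THREADS and the aprun "-d" value consistent is to derive both from a single variable. The sketch below uses POSIX shell syntax and only prints the launch line; the variable names threads and launch_cmd are illustrative. In a csh batch script like the samples in this guide, the equivalent of the export would be "setenv OMP_NUM_THREADS 4".

```shell
#!/bin/sh
# Derive the OpenMP thread count and aprun's -d value from one variable
# so that the two cannot drift apart.
threads=4
OMP_NUM_THREADS=$threads
export OMP_NUM_THREADS

# Build the launch line and echo it rather than run it, since aprun is
# only meaningful inside a batch job on the Cray.
launch_cmd="aprun -n 1 -d $threads ./openmpprog.exe"
echo "$launch_cmd"    # aprun -n 1 -d 4 ./openmpprog.exe
```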

  7. Important Programming Libraries

    Cray LibSci

    LibSci is Cray's library of optimized numerical routines. The library is loaded by default and contains BLAS (levels 1, 2, and 3), LAPACK, ScaLAPACK and BLACS routines.

    This library, along with the MPI library and the ACML library (if the ACML module is loaded), is added to the linker options by default when using the Cray compiler wrappers (i.e. ftn, cc, CC).

    ACML

    ACML contains BLAS (levels 1, 2, and 3), LAPACK and FFT routines. This library is not loaded by default; it can be loaded by issuing the following module command:

    einstein% module load acml

    Once this library is loaded into your programming environment, it will take precedence over the LibSci routines.

  8. PBS Batch Queuing System

    The Portable Batch System (PBS) is the default batch queue system on Einstein. It works much the same as other batch queue packages like LoadLeveler, NQS or LSF, but uses different keywords to specify options to the scheduler. PBS keeps track of all queues and user jobs and starts, holds, and ends batch jobs on the Cray.

    Batch Queues

    There are nine queues available on Einstein: standard, high, challenge, urgent, bigmem, transfer, analysis, debug and background. The limits for these queues are listed in the table below.

    Queue Limits

    Queue Name   Max Nodes / Cores   CPU Time Limit   Comments
    standard     256 / 2048          168 hrs.         Non-Challenge Jobs
    high         512 / 4096          168 hrs.         High Priority Jobs
    challenge    512 / 4096          168 hrs.         Challenge Jobs
    urgent       TBD / TBD           TBD              Restricted Access
    bigmem       7 / 56              24 hrs.          Large Memory Jobs
    transfer     1 / 1               12 hrs.          Transfer Jobs
    analysis     1 / 1               8 hrs.           Serial Jobs
    debug        64 / 512            30 mins.         Debug Jobs
    background   64 / 512            4 hrs.           Negative Allocation

    PBS Commands

    A brief summary of standard PBS commands is given in the following table. There are also manpages for each command on Einstein:

    PBS Commands

    Command          Description
    qsub script      Submit a job script (once the job is submitted, PBS assigns it a unique jobID number)
    qstat -a         Check all jobs under PBS
    qstat jobID      Check one job
    qstat -u login   Check one user's jobs
    qstat -f jobID   Obtain detailed status on a job
    qdel jobID       Cancel a running/queued job
    qstat -Q         List all PBS queues

    The "qsub" command is used to submit jobs to the queues. Much like using LSF, one can submit jobs directly from the command line with qsub. However, it is easier to create a scriptfile that contains all the necessary PBS options and user commands needed to run a job, and then qsub the script. PBS allows you to embed options into scripts. The string "#PBS" is used to start each valid PBS option so it will be correctly parsed by the qsub command.

    einstein% qsub serialjob.pbs
       PBS submit script located at "serialjob.pbs".
       Filtering user12 submission...
    
              ***
    Job script serialjob.pbs parsed as:
    #PBS -N serialjob
    #PBS -o serialjob.out
    #PBS -e serialjob.err
    #PBS -A NAVOSLMA
    #PBS -l walltime=0:30:00
    #PBS -q debug
    #PBS -l mppwidth=1
    echo hello
              ***
    
       Job will submit via project NAVOSLMA
       Walltime directive is: #PBS -l walltime=0:30:00
       Queue is "standard".
    
       Submitting "serial.pbs" ...
    35095.sdb
    
       user12 currently has 1 SUBMITTED jobs - of 40 permitted.
       (running plus queued.)
       user12 currently has 0 RUNNING jobs  - of 10 permitted.
    
       You currently have the following jobs:
    
    sdb: 
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    35095.sdb       user12   standard serial.     --    1   1    --  00:30 Q   --

    The "qstat -a" command is used to view information about queued and running jobs. By default, qstat outputs the following information for all running (R) and waiting (Q) jobs:

    einstein% qstat -a 
    
    sdb: 
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    35095.sdb       user11   standard tjob1.     --      1   1    --  48:00 H   -- 
    35100.sdb       user11   standard tjob2.     25951   1   1    --  48:00 R 32:48
    35102.sdb       user12   transfer tran1.       705   1   1    --  10:00 R 00:58
    35103.sdb       user12   transfer tran2.      1039   1   1    --  10:00 R 00:03

    . Col 1 is the job identifier (Job ID)
    . Col 2 lists the owner (Username) of the job
    . Col 3 lists the name of the queue (Queue) the job has been submitted to
    . Col 4 lists the job name (Jobname)
    . Col 5 lists the session ID (SessID)
    . Col 6 lists number of nodes (NDS) required
    . Col 7 lists number of tasks (TSK) required
    . Col 8 lists the required memory (Req'd Memory)
    . Col 9 lists the required time (Req'd Time)
    . Col 10 lists the job status (S)
    . Col 11 lists the elapsed time (Elap Time)

    To view all jobs for a user on Einstein, use "qstat -u login", where "login" is the user's login name:

    einstein% qstat -u user11
    
    sdb: 
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    35095.sdb       user11   standard tjob1      --      1   1    --  48:00 H   -- 
    35100.sdb       user11   standard tjob2      25951   1   1    --  48:00 R 32:48

    To remove a job that is queued to run or already running, use the "qdel" command with the JobID of the job to be removed. Users can only kill their own batch jobs:

    einstein% qdel 227053
    Job <227053> is being terminated

    The "qstat -q" command lists the various queues on Einstein and basic information about them.

    einstein% qstat -q
    
    server: sdb
    
    Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
    ---------------- ------ -------- -------- ---- ----- ----- ----  -----
    challenge          --      --    168:00:00  --      0     0   --   E R
    standard           --      --    168:00:00  --      4     0   --   E R
    transfer           --      --    12:00:00   --      0     0   16   E R
    bigmem             --      --    24:00:00   --      0     0   --   E R
    urgent             --      --    24:00:00   --      0     0    1   D S
    high               --      --    168:00:00  --      0     0   --   E R
    debug              --      --    00:30:00   --      0     0    1   E R
    background         --      --    04:00:00   --      0     0    1   E R
                                                    -----  -----
                                                        4     0

    Once queued, jobs will run as soon as possible, given the current queuing algorithm. The more precise you can be about your job's resource requirements (especially the walltime limit), the quicker your job may execute.
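
    Before submitting, it can be useful to check that the requested walltime fits within the limit of the target queue. The following is a minimal sketch in POSIX shell; the function names to_seconds and fits_queue are hypothetical, and the limits are the Walltime values from the "qstat -q" listing above.

```shell
#!/bin/sh
# to_seconds HH:MM:SS: convert a walltime string to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# fits_queue WALLTIME LIMIT: succeed if WALLTIME does not exceed LIMIT.
fits_queue() {
    [ "$(to_seconds "$1")" -le "$(to_seconds "$2")" ]
}

# Limits from the queue listing: standard 168:00:00, debug 00:30:00.
if fits_queue 12:00:00 168:00:00; then
    echo "12:00:00 fits within the standard queue limit"
fi
if ! fits_queue 1:00:00 00:30:00; then
    echo "1:00:00 exceeds the debug queue limit"
fi
```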

    Sample PBS Scripts

    A good starting template for a PBS script to run a serial job for 1 hour is shown below:

    #!/bin/csh
    #PBS -N serialjob
    #PBS -o serialjob.out
    #PBS -e serialjob.err
    #PBS -A NAVOSLMA
    #PBS -l walltime="1:00:00"
    #PBS -q standard
    #PBS -l mppwidth=1
    #
    ## Run serial fortran executable on 1 processor
    cd $WORKDIR 
    aprun ./serial_fort.exe
    ##End of Sample PBS Script

    The first few lines tell PBS to give the job a name and to save the standard output and error output to the given files. Skipping ahead, we estimate the run time to be about 1 hour and know that this is acceptable for the standard batch queue. We need 1 core in total, so we request it with mppwidth=1.

    Here is an example of a PBS script to run a parallel job:

    #!/bin/csh
    #PBS -N run_par
    #PBS -o run_par.out
    #PBS -e run_par.error
    #PBS -A Your_Project_Name
    #PBS -l walltime="12:00:00"
    #PBS -l mppwidth="256"
    #PBS -l mppnppn="4"
    #PBS -q standard
    # ----------------------------------
    #
    # Set appropriate data and output directories variable names
    setenv DATADIR $WORKDIR/DATA
    setenv OUTPUTDIR $WORKDIR/OUTPUT
    # Create the directories
    #
    mkdir -p $DATADIR
    mkdir -p $OUTPUTDIR
    #
    # Copy executable and any required input files to your work directory
    # located under the /scr Lustre filesystem.
    cp $HOME/my_MPI_code $WORKDIR
    cp $HOME/{namelist,myinput} $DATADIR
    cd $WORKDIR
    #
    # Run the MPI job with the Cray "aprun" command:
    #
    aprun -N 4 -n 256 ./my_MPI_code
    #
    # Archive any output from the job to your home directory on Einstein
    # or to your home directory on the archive server, Newton.
    #
    /usr/bin/rcp $OUTPUTDIR/* newton:$ARCHIVE_HOME/
    /usr/bin/rcp restart newton:$ARCHIVE_HOME/restart
    # ----- End of sample PBS script -----

    The first few lines tell PBS to give the job a name and to save the standard output and error output to the given files. Skipping ahead, we estimate the run time to be about 12 hours and know that this is acceptable for the standard batch queue. The next two lines set the total number of cores (mppwidth) and the number of PEs per node (mppnppn). This job requests 256 cores with 4 PEs per node, so it runs on 64 nodes (256 / 4). The default number of PEs per node is 8.
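
    The relationship between mppwidth, mppnppn and the number of nodes a job occupies is a simple rounded-up division. A minimal sketch in POSIX shell follows; the function name nodes_needed is hypothetical.

```shell
#!/bin/sh
# nodes_needed WIDTH NPPN: compute nodes required for WIDTH PEs at NPPN
# PEs per node, rounded up so a partly filled node counts as a whole node.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 256 4    # the sample job above (mppwidth=256, mppnppn=4): 64
nodes_needed 256 8    # the same width using all 8 cores per node: 32
```

    Running fewer PEs per node uses more nodes for the same mppwidth, which is why the sample job occupies 64 nodes rather than 32.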

  9. Pointers to more Information

    Cray Documents - http://docs.cray.com/