Einstein contains 1,660 nodes (12,872 cores) in total, of which 1,592 compute nodes (12,736 cores) are available for computing, along with 25 terabytes of memory and 538 terabytes of disk storage. Each compute node contains two quad-core 2.3 GHz AMD Opteron processors with 16 GBytes of usable memory. Eight of these nodes are configured with 31 GBytes of usable memory to support larger memory applications.
Einstein also contains 68 service nodes (136 cores). Each service node contains one dual-core 2.8 GHz Opteron processor and varying amounts of memory.
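The node and core counts quoted above can be cross-checked with a few lines of shell arithmetic. This is only a sketch: the variable names are illustrative, and the per-node core counts are taken from the processor descriptions in the text (two quad-core processors per compute node, one dual-core processor per service node).

```shell
#!/bin/sh
# Cross-check the node and core totals quoted in the text.
compute_nodes=1592      # nodes available for computing
cores_per_compute=8     # two quad-core Opterons per compute node
service_nodes=68
cores_per_service=2     # one dual-core Opteron per service node

compute_cores=$((compute_nodes * cores_per_compute))
service_cores=$((service_nodes * cores_per_service))
total_nodes=$((compute_nodes + service_nodes))
total_cores=$((compute_cores + service_cores))

echo "compute cores: $compute_cores"
echo "service cores: $service_cores"
echo "total: $total_nodes nodes, $total_cores cores"
```

The totals come to 1,660 nodes and 12,872 cores once the service nodes are included.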
Einstein runs UNICOS/lc, which features the UNICOS and SUSE Linux operating systems. Compute Node Linux (CNL) runs on each compute node and a fully featured SUSE Linux runs on the service nodes. The service nodes include the login, I/O and system nodes.
CNL, a lightweight OS based on Linux, reduces system overhead by limiting itself to managing virtual memory addressing, providing memory protection and performing basic scheduling.
Einstein has two types of user disk storage available: Home file storage and Temporary work storage. Users are allocated a home directory, /u/home/$login, with a quota of 1 GByte that is intended for permanent storage. Source files, binary files, and small data files can be stored here. The home directory can be referenced by the $HOME environment variable. This filesystem is not backed up. Users are responsible for backing up their files to the Navy DSRC archive server, Newton, or to some other local system. Additional information for copying files can be found in the "File Transfers and Einstein" section below.
Users can check their $HOME quota with the following command:
einstein% lfs quota -u user1 /u/home
Disk quotas for user user1 (uid 5002):
Filesystem   kbytes    quota    limit  grace  files  quota  limit  grace
/u/home       50128  1000000  1024000           309      0      0
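A quota report like the one above can be reduced to a single usage figure. The sketch below assumes the column layout shown in the example (filesystem, kbytes used, soft quota, hard limit, ...); the function name quota_percent is hypothetical, and the sample input is copied from the example rather than taken from a live query.

```shell
#!/bin/sh
# quota_percent FILESYSTEM: read "lfs quota"-style output on stdin and print
# the percentage of the soft block quota in use (integer, rounded down).
# Assumes the column layout shown above: filesystem, kbytes, quota, limit, ...
quota_percent() {
    awk -v mount="$1" '$1 == mount && $3 > 0 { printf "%d\n", ($2 * 100) / $3 }'
}

# Sample input copied from the example output above (not a live query).
printf '%s\n' \
    'Disk quotas for user user1 (uid 5002):' \
    'Filesystem kbytes quota limit grace files quota limit grace' \
    '/u/home 50128 1000000 1024000 309 0 0' |
    quota_percent /u/home
```

On Einstein itself the same function could be fed from the real command, for example by piping the output of the lfs quota command shown above into quota_percent /u/home.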
The temporary work directory, /scr/$login, is available for the temporary storage of files needed by executing jobs. Larger input and output files can be stored here, as this filesystem is much larger than /u/home. The temporary directory can be referenced by the $WORKDIR environment variable. This filesystem is also not backed up and files may be deleted at any time. Users are responsible for backing up their files to Newton or to some other local system.
$WORKDIR is available in a Lustre globally shared filesystem that can be accessed from all compute nodes.
Users with active accounts on Einstein may access any of eight interactive login nodes. All login nodes are named einstein.navo.hpc.mil (via external DNS lookup) but have separate IP numbers and internal hostnames within the cluster itself. Interactive login access to the internal network of compute nodes is not allowed to users; only batch jobs can run on the compute nodes. All available DoD Kerberos commands (including kftp, krsh and kerberized ssh) may be used to access the login nodes. Here are several examples of logging in, transferring data and remotely running commands on the interactive nodes:
% ktelnet -x -F einstein.navo.hpc.mil
% krlogin -x -F einstein.navo.hpc.mil
% krcp myfile einstein.navo.hpc.mil:myfile
% krsh -x -F einstein    # short names allowed inside the Navy DSRC network
% ssh einstein           # ssh that supports DoD Kerberos tickets
% kftp einstein          # kftp via default Gig-e network inside Navy lan
Remote user "user1" at another site can log into Einstein as Navy DSRC user "user11", using the "-l login" option and a valid Kerberos ticket:
% krlogin -l user11 -x einstein.navo.hpc.mil
Next we transfer Fortran files from our home site outside Navy to our personal Lustre work filesystem on Einstein (/scr/$login, here /scr/user11) and then use krsh to check that the files actually transferred correctly:
* NOTE: The -X option to krcp turns off data encryption for faster transfers.
% krcp -X *.f einstein.navo.hpc.mil:/scr/user11
% krsh einstein.navo.hpc.mil ls -ltr /scr/user11
-rw-r--r-- 1 user11 NA0101 1416 Jul 29 23:31 kticket.f
-rw-r--r-- 1 user11 NA0101 4251 Jul 29 23:31 parse_klist.f
For additional information about using or obtaining DoD Kerberos, please refer to the Instructions for Kerberos.
Interactive login sessions on Einstein, and any child processes started from an interactive login shell, are allowed a total of 15 minutes of CPU time. Any parent process and all its children that exceed this time limit will be killed by the system. File transfer commands such as kftp, krcp, rcp, and ftp are exempt from this time limit, but other processing such as shell scripts, tar, gzip, or backgrounded serial processes that require more than 15 minutes of CPU time must be submitted to the PBS batch queuing system.
Parallel processing via MPI is only allowed to run in the batch queues; parallel executables under the aprun jobstarter command are not allowed to run interactively on Einstein.
By default, user logins to Einstein are set up with a valid PATH, MANPATH, and other environment variables that give access to the compilers, batch queue commands, Kerberos commands, and most common user-level commands such as tar, make, cat, and so on.
Einstein also has the following environment variables set to access the archive servers and home directory, location of the sample code repository, and location of the COTS software:
| Variable | Value |
|---|---|
| $ARCHIVE_HOME | /u/home/$login |
| $ARCHIVE_HOST | newton.navo.hpc.mil |
| $SAMPLES_HOME | /usr/local/Examples_Codes |
| $PET_HOME | /usr/local/PET |
| $CSI_HOME | /usr/local/CSI |
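As an illustration of how these variables might be combined, the sketch below builds an rcp-style destination on the archive server from $ARCHIVE_HOST and $ARCHIVE_HOME. The function name archive_dest, the fallback values, and the file name results.tar are hypothetical; the fallbacks exist only so the snippet can be tried on a machine where the site variables are not set.

```shell
#!/bin/sh
# archive_dest FILE: print the remote copy target for FILE in the user's
# archive home. Falls back to illustrative values when the site variables
# are not set (for example, when trying this off Einstein).
archive_dest() {
    host=${ARCHIVE_HOST:-newton.navo.hpc.mil}
    home=${ARCHIVE_HOME:-/u/home/${LOGNAME:-user1}}
    printf '%s:%s/%s\n' "$host" "$home" "$(basename "$1")"
}

archive_dest "${WORKDIR:-/scr/user1}/OUTPUT/results.tar"
```

The printed string has the form host:path/file, which is the destination format used by rcp and krcp elsewhere in this guide.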
Users who want to add their own personal commands, paths, aliases, and syntax to their environment can create individual .cshrc and .login files (or .profile file for Korn/Bourne shell users) in their home directories.
There are a number of ways to transfer data between Einstein and local Navy DSRC systems or other remote non-Navy systems:
Note: Transfer queue jobs run on login nodes, so both rcp and krcp can be used. However, you can only use non-kerberized rsh/rcp from the compute node internal network (accessible only via PBS batch jobs).
Note: There is no network routing from the compute node internal network on Einstein to any Navy DSRC systems except the RMSS servers. There is no network routing to any offsite systems from Einstein's internal network.
For additional online information regarding accessing the Navy RMSS Archive server (Newton), please refer to the RMSS Overview Page.
Einstein has three programming environments available for compiling: Portland Group (PGI), PathScale and GNU. PGI is the default programming environment on Einstein.
The modules utility consists of the "module" command and module files. It initializes the programming environment for specific compilers. It allows an easy way to swap compilers and compiler versions.
The "module" command can be used to change your default programming environment. The main programming environment module for each compiler is listed below:
PrgEnv-pgi
PrgEnv-pathscale
PrgEnv-gnu
| Command | Description |
|---|---|
| module list | Lists the modules in your environment |
| module avail | Lists available modules |
| module load | Loads a module |
| module show | Shows what a module loads |
| module swap | Swaps two modules |
Examples of each are listed below:
einstein% module list
Currently Loaded Modulefiles:
  1) modules/3.1.6                   12) xt-totalview/8.4.1b
  2) MySQL/5.0.45                    13) fftw/3.1.1
  3) xt-service/2.1.41HD             14) xt-libsci/10.3.0
  4) xt-libc/2.1.41HD                15) xt-mpt/2.1.27HD
  5) xt-os/2.1.41HD                  16) xt-pe/2.1.41HD
  6) xt-boot/2.1.41HD                17) xt-asyncpe/1.0c
  7) xt-lustre-ss/2.1.41HD_1.6.5     18) PrgEnv-pgi/2.1.41HD
  8) xtpe-target-cnl                 19) pbs/9.2.2.82426
  9) Base-opts/2.1.41HD              20) xtpe-quadcore
 10) pgi/7.1.6                       21) mpscp/1.2b
 11) totalview-support/1.0.5         22) bct-env/0.1

einstein% module avail
----------------------- /opt/xt-asyncpe/1.0c/modulefiles -----------------------
xtpe-quadcore
------------------------------- /opt/modulefiles -------------------------------
Base-opts/2.1.27HD                  pgi/7.1.6(default)
Base-opts/2.1.27HD.lusrelsave       pgi/7.2.2
Base-opts/2.1.41HD(default)         pgi/7.2.3
Base-opts/2.1.41HD.lusrelsave       pkgconfig/0.15.0(default)
MySQL/5.0.45                        totalview-support/1.0.5(default)
PrgEnv-gnu/2.1.27HD                 xt-asyncpe/1.0
PrgEnv-gnu/2.1.41HD(default)        xt-asyncpe/1.0c(default)
PrgEnv-pathscale/2.1.27HD           xt-boot/2.1.27HD
PrgEnv-pathscale/2.1.41HD(default)  xt-boot/2.1.41HD
PrgEnv-pgi/2.1.27HD                 xt-catamount/2.1.27HD
PrgEnv-pgi/2.1.41HD(default)        xt-catamount/2.1.41HD
acml/4.0.1a(default)                xt-craypat/4.2
apprentice2/4.3.0(default)          xt-craypat/4.3.1
bct-env/0.1                         xt-craypat/4.3.2(default)
blcr/0.7.3                          xt-libc/2.1.27HD
cray/audit/1.0.0-1.0000.13871.0     xt-libc/2.1.41HD
dwarf/8.2.0(default)                xt-libsci/10.2.1
elf/0.8.10(default)                 xt-libsci/10.3.0(default)
fftw/2.1.5                          xt-lustre-ss/2.1.27HD_1.6.5
fftw/3.1.1(default)                 xt-lustre-ss/2.1.27HDa_1.6.5
gcc/4.1.2                           xt-lustre-ss/2.1.41HD_1.6.5
gcc/4.2.0.quadcore(default)         xt-mpt/2.1.27HD(default)
gcc/4.2.3                           xt-mpt/2.1.41HD
gcc-catamount/3.3                   xt-mpt/3.0.0
gnet/2.0.5                          xt-mpt/3.0.2
iobuf/1.0.6(default)                xt-os/2.1.27HD
java/jdk1.6.0_05(default)           xt-os/2.1.41HD
libscifft-pgi/1.0.0(default)        xt-papi/3.5.99c(default)
modules/3.1.6(default)              xt-pe/2.1.27HD
mpscp/1.2b                          xt-pe/2.1.41HD
pathscale/3.2(default)              xt-service/2.1.27HD
pbs/9.2.2.82426                     xt-service/2.1.41HD
petsc/2.3.3a(default)               xt-totalview/8.4.1b(default)
petsc-complex/2.3.3a(default)       xtgdb/1.0.0(default)
pgi/6.2.5                           xtpe-target-catamount
pgi/7.0.7                           xtpe-target-cnl
------------------------------ /opt/modules/3.1.6 ------------------------------
modulefiles/modules/dot             modulefiles/modules/modules
modulefiles/modules/module-cvs      modulefiles/modules/null
modulefiles/modules/module-info     modulefiles/modules/use.own

einstein% module load fftw

einstein% module show fftw
-------------------------------------------------------------------
/opt/modulefiles/fftw/3.1.1:
conflict fftw/2.1.5
prepend-path PATH /opt/fftw/3.1.1/cnos/bin
prepend-path LD_LIBRARY_PATH /opt/fftw/3.1.1/cnos/lib
prepend-path MANPATH /opt/fftw/3.1.1/cnos/man
append-path PE_PRODUCT_LIST FFTW
setenv FFTW_POST_LINK_OPTS -L/opt/fftw/3.1.1/cnos/lib -lfftw3 -lfftw3f
setenv FFTW_INCLUDE_OPTS -I/opt/fftw/3.1.1/cnos/include
setenv FFTW_DIR /opt/fftw/3.1.1/cnos/lib
setenv FFTW_INC /opt/fftw/3.1.1/cnos/include
-------------------------------------------------------------------

einstein% module swap PrgEnv-pgi PrgEnv-pathscale
It is recommended that the Fortran, C and C++ compilers be called with the following Cray compiler scripts:
| Compiler Script | Description |
|---|---|
| ftn | Invokes Fortran 90 / Fortran 95 |
| cc | Invokes C |
| CC | Invokes C++ |
The following tables list the vendor compilers available, with a brief description, for each programming environment available on Einstein.
| Compiler | Description |
|---|---|
| pgf90 | Fortran 90/95 Compiler. Invokes the Fortran compiler, assembler and linker with options derived from its command line arguments. |
| pgf77 | Fortran 77 Compiler. Invokes the Fortran compiler, assembler and linker with options derived from its command line arguments. |
| pgcc | C Compiler |
| pgCC | C++ Compiler |
| Option | Description |
|---|---|
| -c | Instructs the compiler to only compile and assemble source code without the link step. |
| -fast | Creates a generally optimal set of flags for targets that support SSE/SSE2 capability. Incorporates optimization options to enable use of vector streaming SIMD instructions (64-bit targets) and enables vectorization with SSE instructions, cache alignment and flushz. |
| -g | Instructs the compiler to include symbolic debugging information in the object module. |
| -gopt | Instructs the compiler to include symbolic debugging information in the object file, and to generate optimized code identical to that generated when -g is not specified. |
| -help | Provides information about available options. |
| -mcmodel=medium | Enables medium-model code generation for 64-bit targets; useful when the data space of the program exceeds 4GB. |
| -Mconcur | Instructs the compiler to enable auto-concurrentization of loops. If specified, the compiler uses multiple processors to execute loops that it determines to be parallelizable; thus, loop iterations are split to execute optimally in a multithreaded execution context. |
| -Minfo | Instructs the compiler to produce information on standard error. |
| -Minline | Enables function inlining. |
| -Mipa=fast,inline | Enables interprocedural analysis and optimization. Also enables automatic procedure inlining. |
| -Mpfi or -Mpfo | Enable profile feedback driven optimizations. |
| -Mkeepasm | Keeps the generated assembly files. |
| -Munroll | Invokes the loop unroller to unroll loops, executing multiple instances of the loop during each iteration. This also sets the optimization level to 2 if the level is set to less than 2, or if no -O or -g options are supplied. |
| -M[no]vect | Enables/Disables the code vectorizer. |
| --[no_]exceptions | Enables or disables exception handling in user code. For C++, --no_exceptions declares that the functions in this file generate no C++ exceptions, allowing more optimal code generation. |
| -o | Names the output file. |
| -O<level> | Specifies code optimization level where <level> is 0, 1, 2, 3, or 4. |
| -tp <target> [,target...] | Specify the target processor(s); for the 64-bit compilers, more than one target is allowed, and enables generation of PGI Unified Binary executables. |
| Compiler | Description |
|---|---|
| pathf95 | Fortran 77, 90 and 95 Compiler. Invokes the Fortran 77, Fortran 90, and Fortran 95 compiler. |
| pathf90 | Fortran 77, 90 and 95 Compiler. Alias for pathf95. |
| pathcc | C Compiler. Invokes the C compiler and the source preprocessor on source files that contain C code. |
| pathCC | C++ Compiler. Invokes the C++ compiler. |
| Option | Description |
|---|---|
| -show-defaults | Lists the default compiler options and exits. |
| -c | Generates an intermediate object file but does not attempt to link. |
| -g | Adds information for debugging to the object file and/or executable. |
| -I<directory> | Tells the preprocessor to search in directory for include or module files. |
| -L<directory> | Tells the linker to search in directory for libraries. |
| -r8 | Promotes REALs from the default size of 4 bytes to 8 bytes. |
| -i8 | Promotes INTEGERs from the default size of 4 bytes to 8 bytes. |
| -default64 | Passes the -i8 and -r8 options to the compiler. |
| -O3 | Higher level of optimization than -O2 (the default optimization level). |
| -cpp | Preprocesses files with the C preprocessor. Enabled by default for files ending in .F, .F90, or .F95. |
| -ftpp | Preprocesses files with the Fortran preprocessor. Useful when portions of the Fortran code could be misinterpreted as C preprocessor directives (e.g. "//"). |
| -O3 -OPT:Ofast | Higher optimization level than -O3 |
| -ipa | Tells the compiler to perform interprocedural analysis. Can be very time consuming to perform. This flag should also be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release. |
| -intrinsic=PGI | Enables intrinsic functions that are available in the PGI compiler but are not ANSI standard (e.g. rand). |
| -apo | Enables autoparallelization. |
| -mp | Enables parallelization via OpenMP directives. |
| Option | Description |
|---|---|
| -show-defaults | Lists the default compiler options and exits. |
| -c | Generates an intermediate object file but does not attempt to link. |
| -g | Adds information for debugging to the object file and/or executable. |
| -I<directory> | Tells the preprocessor to search in directory for include or module files. |
| -L<directory> | Tells the linker to search in directory for libraries. |
| -O3 | Higher level of optimization than -O2 (the default optimization level). |
| -Ofast | Higher level optimization (default is -O2). This flag should be used in both compilation and linking steps. |
| -ipa | Tells the compiler to perform interprocedural analysis. This option can be very time consuming to perform. This flag should be used in both compilation and linking steps. Not recommended for programs over 100,000 lines for the current compiler release. |
| -apo | Enables autoparallelization. |
| -mp | Enables parallelization via OpenMP directives. |
| Compiler | Description |
|---|---|
| gfortran | Fortran Compiler |
| gcc | C Compiler |
| g++ | C++ Compiler |
PGI Compilers - http://www.pgroup.com/resources/docs.htm
PathScale Compilers - http://www.pathscale.com/docs.html
Additional information about the compilers under each programming environment can be found in the manpages on the system.
There are three programming models supported on Einstein: Message Passing Interface (MPI), Shared Memory (SHMEM) and Open Multi-Processing (OpenMP).
When creating an MPI program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.
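The check described above can also be scripted. The sketch below relies on $LOADEDMODULES, the colon-separated list of loaded modules that the modules utility maintains in the environment; the function name module_loaded is hypothetical, and the module name xt-mpt is taken from the "module list" output shown earlier.

```shell
#!/bin/sh
# module_loaded NAME: succeed if a module called NAME (any version) appears in
# $LOADEDMODULES, the colon-separated list maintained by the modules utility.
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"* | *":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

if module_loaded xt-mpt; then
    echo "xt-mpt is loaded"
else
    echo "xt-mpt is not loaded; run: module load xt-mpt"
fi
```

Matching on the name followed by "/" or ":" avoids false hits on modules that merely share a prefix.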
To compile an MPI program in C or Fortran, use the following examples:
cc -o mpiprog.exe mpi_prog.c
ftn -o mpiprog.exe mpi_prog.f
To run an MPI program within a batch script, use the following command:
aprun -n N $WORKDIR/mympidirectory/mpiprog.exe
The aprun utility executes across a specified number of compute nodes. The "-n N" option specifies the number of cores to start. Please note the aprun utility only works in a Lustre-mounted file system. Users should ensure that all files needed by the MPI job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.
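Because aprun needs its files on the Lustre filesystem, a batch script can check that the executable path sits under $WORKDIR before launching. The following is a minimal sketch: the function name in_workdir is hypothetical, the fallback /scr/user11 is only there so the snippet can be tried where $WORKDIR is unset, and the check is purely textual (relative paths and symbolic links are not resolved).

```shell
#!/bin/sh
# in_workdir PATH: succeed if PATH lies under $WORKDIR, the Lustre work area.
in_workdir() {
    [ -n "${WORKDIR:-}" ] || return 1
    case "$1" in
        "$WORKDIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

exe="${WORKDIR:-/scr/user11}/mympidirectory/mpiprog.exe"
if in_workdir "$exe"; then
    echo "ok: $exe is under \$WORKDIR"
else
    echo "warning: $exe is not under \$WORKDIR; aprun may not be able to launch it"
fi
```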
The syntax used in the MPI compile and run examples above can be used in all three programming environments: PGI, PathScale or GNU.
When creating a SHMEM program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.
The SHMEM routines can be found in libsma.a and are linked with the compiler option "-lsma". They are included by default by the Cray compiler scripts, but can also be specified explicitly on the compile line.
To compile a SHMEM program in C or Fortran, use the following examples:
cc -lsma -o shmemprog.exe shmem_prog.c
ftn -lsma -o shmemprog.exe shmem_prog.f
To run a SHMEM program within a batch script, use the following command:
aprun -n N $WORKDIR/myshmem/shmemprog.exe
The syntax used in the SHMEM compile and run examples above can be used with all three programming environments: PGI, PathScale or GNU.
The aprun utility executes across a specified number of compute nodes. The "-n N" option specifies the number of cores to start. Please note the aprun utility only works in a Lustre-mounted file system. Users should ensure that all files needed by the SHMEM job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.
When creating an OpenMP program, ensure the Message Passing Toolkit (MPT) is loaded. MPT should be loaded in the default programming environment. Users can verify this by running the "module list" command. If it is not listed, run "module load xt-mpt" to load it. Additional information on modules can be found in the "Programming Environment" section.
To compile an OpenMP program in C or Fortran, use the following examples:
cc -o openmpprog.exe -mp=nonuma openmp_prog.c
ftn -o openmpprog.exe -mp=nonuma openmp_prog.f
cc -o openmpprog.exe -mp openmp_prog.c
ftn -o openmpprog.exe -mp openmp_prog.f
cc -o openmpprog.exe -fopenmp openmp_prog.c
ftn -o openmpprog.exe -fopenmp openmp_prog.f
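The syntax for the OpenMP compile examples above is different for each programming environment: the first pair (-mp=nonuma) is for PGI, the second pair (-mp) is for PathScale, and the third pair (-fopenmp) is for GNU.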
To run an OpenMP program within a batch script, use the following command:
aprun -n 1 -d 4 openmpprog.exe
The syntax for the OpenMP run example above can be used with all three programming environments: PGI, PathScale or GNU.
The aprun utility executes across one compute node with four threads. The environment variable OMP_NUM_THREADS also needs to be set to the number of threads. As with the previous programming models, aprun only works in a Lustre-mounted file system. Users should ensure that all files needed by the OpenMP job are located in $WORKDIR. Additional information on the aprun utility can be found in the online manpages.
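One way to keep the thread count and the aprun depth (-d) consistent is to derive both from a single value. The sketch below uses Bourne shell syntax and only prints the launch line instead of running it; the variable names threads and launch_cmd are illustrative. In a csh batch script, like the samples in this guide, the equivalent setting would be "setenv OMP_NUM_THREADS 4".

```shell
#!/bin/sh
# Keep OMP_NUM_THREADS and the aprun depth (-d) in step for a single-PE job.
threads=4
OMP_NUM_THREADS=$threads
export OMP_NUM_THREADS

# Print the launch line rather than running it, so this can be tried anywhere.
launch_cmd="aprun -n 1 -d $OMP_NUM_THREADS openmpprog.exe"
echo "$launch_cmd"
```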
LibSci is Cray's optimized numerical routines library. The library is loaded by default and contains BLAS (levels 1, 2, and 3), LAPACK, SCALAPACK and BLACS routines.
This library, along with the MPI library and the ACML library (if the ACML module is loaded), is added to the linker options by default when using the Cray compiler wrappers (i.e. ftn, cc, CC).
ACML contains BLAS (levels 1, 2, and 3), LAPACK and FFT routines. This library is not loaded by default; it can be loaded by issuing the following module command:
einstein% module load acml
Once this library is loaded to your default programming environment, it will take precedence over the LibSci routines.
The Portable Batch System (PBS) is the default batch queue system on Einstein. It works much the same as other batch queue packages like LoadLeveler, NQS or LSF, but uses different keywords to specify options to the scheduler. PBS keeps track of all queues and user jobs and starts, holds, and ends batch jobs on the Cray.
There are nine queues available on Einstein: standard, high, challenge, urgent, bigmem, transfer, analysis, debug and background. The limits for these queues are listed in the table below.
| Queue Name | Max Nodes / Cores | CPU Time Limit | Comments |
|---|---|---|---|
| standard | 256 / 2048 | 168 hrs. | Non-Challenge Jobs |
| high | 512 / 4096 | 168 hrs. | High Priority Jobs |
| challenge | 512 / 4096 | 168 hrs. | Challenge Jobs |
| urgent | TBD / TBD | TBD | Restricted Access |
| bigmem | 7 / 56 | 24 hrs. | Large Memory Jobs |
| transfer | 1 / 1 | 12 hrs. | Transfer Jobs |
| analysis | 1 / 1 | 8 hrs. | Serial Jobs |
| debug | 64 / 512 | 30 mins. | Debug Jobs |
| background | 64 / 512 | 4 hrs. | Negative Allocation |
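The limits in the table can be encoded in a small helper that checks a request before it is submitted. This is only a sketch: the function name fits_queue is hypothetical, the numbers are copied from the table above (and would need updating if the site changes them), and the urgent queue is treated as unknown because its limits are listed as TBD.

```shell
#!/bin/sh
# fits_queue QUEUE CORES MINUTES
# Exit status: 0 if the request is within the table limits, 1 if it exceeds
# them, 2 if the queue is unknown or its limits are not published (urgent).
fits_queue() {
    case "$1" in
        standard)        max_cores=2048; max_min=$((168 * 60)) ;;
        high|challenge)  max_cores=4096; max_min=$((168 * 60)) ;;
        bigmem)          max_cores=56;   max_min=$((24 * 60)) ;;
        transfer)        max_cores=1;    max_min=$((12 * 60)) ;;
        analysis)        max_cores=1;    max_min=$((8 * 60)) ;;
        debug)           max_cores=512;  max_min=30 ;;
        background)      max_cores=512;  max_min=$((4 * 60)) ;;
        *)               return 2 ;;
    esac
    [ "$2" -le "$max_cores" ] && [ "$3" -le "$max_min" ]
}

if fits_queue standard 256 720; then
    echo "256 cores for 12 hours fits the standard queue"
fi
```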
A brief summary of standard PBS commands is given in the following table. There are also manpages for each command on Einstein:
| Command | Description |
|---|---|
| qsub script | Submit a job_script (Once job is submitted, PBS assigns it a unique jobID number) |
| qstat -a | Check all jobs under PBS |
| qstat jobID | Check one job |
| qstat -u login | Check one user's jobs |
| qstat -f jobID | Obtain detailed status on a job |
| qdel jobID | Cancel a running/queued Job |
| qstat -Q | List all PBS Queues |
The "qsub" command is used to submit jobs to the queues. Much like using LSF, one can submit jobs directly from the command line with qsub. However, it is easier to create a scriptfile that contains all the necessary PBS options and user commands needed to run a job, and then qsub the script. PBS allows you to embed options into scripts. The string "#PBS" is used to start each valid PBS option so it will be correctly parsed by the qsub command.
einstein% qsub serialjob.pbs
PBS submit script located at "serialjob.pbs".
Filtering user12 submission...
***
Job script serialjob.pbs parsed as:
#PBS -N serialjob
#PBS -o serialjob.out
#PBS -e serialjob.err
#PBS -A NAVOSLMA
#PBS -l walltime=0:30:00
#PBS -q debug
#PBS -l mppwidth=1
echo hello
***
Job will submit via project NAVOSLMA
Walltime directive is: #PBS -l walltime=0:30:00
Queue is "standard".
Submitting "serial.pbs" ...
35095.sdb
user12 currently has 1 SUBMITTED jobs - of 40 permitted.
(running plus queued.)
user12 currently has 0 RUNNING jobs - of 10 permitted.
You currently have the following jobs:
sdb:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
35095.sdb       user12   standard serial.        --   1   1     -- 00:30 Q    --
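Since the scheduler options are just "#PBS" comment lines at the top of the script, a job script can be generated from a template. The sketch below writes such a script and lists the lines qsub would parse as options; the function name write_pbs_script, the project name Your_Project_Name, and the executable name are placeholders, and the directives are the ones used in the examples in this guide.

```shell
#!/bin/sh
# write_pbs_script FILE JOBNAME PROJECT WALLTIME QUEUE WIDTH
# Writes a csh job script whose "#PBS" lines carry the scheduler options.
write_pbs_script() {
    cat > "$1" <<EOF
#!/bin/csh
#PBS -N $2
#PBS -o $2.out
#PBS -e $2.err
#PBS -A $3
#PBS -l walltime=$4
#PBS -q $5
#PBS -l mppwidth=$6
cd \$WORKDIR
aprun -n $6 ./serial_fort.exe
EOF
}

script=$(mktemp)
write_pbs_script "$script" serialjob Your_Project_Name 1:00:00 standard 1
grep '^#PBS' "$script"     # the lines qsub will parse as options
rm -f "$script"
```

The generated file would then be submitted with qsub in the usual way.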
The "qstat -a" command is used to view information about queued and running jobs. By default, qstat outputs the following information for all running (R) and waiting (Q) jobs:
einstein% qstat -a
sdb:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
35095.sdb       user11   standard tjob1.         --   1   1     -- 48:00 H    --
35100.sdb       user11   standard tjob2.      25951   1   1     -- 48:00 R 32:48
35102.sdb       user12   transfer tran1.        705   1   1     -- 10:00 R 00:58
35103.sdb       user12   transfer tran2.       1039   1   1     -- 10:00 R 00:03
. Col 1 is the job identifier (Job ID)
. Col 2 lists the owner (Username) of the job
. Col 3 lists the name of the queue (Queue) the job has been submitted to
. Col 4 lists the job name (Jobname)
. Col 5 lists the session ID (SessID)
. Col 6 lists number of nodes (NDS) required
. Col 7 lists number of tasks (TSK) required
. Col 8 lists the required memory (Req'd Memory)
. Col 9 lists the required time (Req'd Time)
. Col 10 lists the job status (S)
. Col 11 lists the elapsed time (Elap Time)
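The column positions listed above make the output easy to filter. The sketch below selects jobs by the status column (column 10) with awk; the function name jobs_in_state is hypothetical, and the sample rows are copied from the listing above rather than taken from a live query.

```shell
#!/bin/sh
# jobs_in_state STATE: read "qstat -a" output on stdin and print the job ID and
# job name of every job whose status column (column 10) equals STATE.
jobs_in_state() {
    awk -v want="$1" '$1 ~ /^[0-9]+\./ && $10 == want { print $1, $4 }'
}

# Sample rows copied from the listing above (not a live query).
printf '%s\n' \
    '35095.sdb user11 standard tjob1. -- 1 1 -- 48:00 H --' \
    '35100.sdb user11 standard tjob2. 25951 1 1 -- 48:00 R 32:48' \
    '35102.sdb user12 transfer tran1. 705 1 1 -- 10:00 R 00:58' \
    '35103.sdb user12 transfer tran2. 1039 1 1 -- 10:00 R 00:03' |
    jobs_in_state R
```

Header and separator lines are skipped because their first field does not look like a job ID.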
To view all jobs for a user on Einstein, use "qstat -u login", where "login" is the user's login name:
einstein% qstat -u user11
sdb:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
35095.sdb       user11   standard tjob1          --   1   1     -- 48:00 H    --
35100.sdb       user11   standard tjob2       25951   1   1     -- 48:00 R 32:48
To remove a job that is queued to run or already running, use the "qdel" command with the JobID of the job to be removed. Users can only kill their own batch jobs:
einstein% qdel 227053
Job <227053> is being terminated
The "qstat -q" command lists the various queues on Einstein and basic information about them.
einstein% qstat -q
server: sdb
Queue            Memory CPU Time  Walltime Node   Run   Que   Lm State
---------------- ------ --------  -------- ---- ----- ----- ---- -----
challenge            --       -- 168:00:00   --     0     0   -- E R
standard             --       -- 168:00:00   --     4     0   -- E R
transfer             --       --  12:00:00   --     0     0   16 E R
bigmem               --       --  24:00:00   --     0     0   -- E R
urgent               --       --  24:00:00   --     0     0    1 D S
high                 --       -- 168:00:00   --     0     0   -- E R
debug                --       --  00:30:00   --     0     0    1 E R
background           --       --  04:00:00   --     0     0    1 E R
                                                ----- -----
                                                    4     0
Once queued, jobs will run as soon as possible, given the current queuing algorithm. The more precise you can be about your job's resource requirements (especially the walltime limit), the quicker your job may execute.
A good starting template for a PBS script to run a serial job for 1 hour is shown below:
#!/bin/csh
#PBS -N serialjob
#PBS -o serialjob.out
#PBS -e serialjob.err
#PBS -A NAVOSLMA
#PBS -l walltime="1:00:00"
#PBS -q standard
#PBS -l mppwidth=1
#
## Run serial fortran executable on 1 processor
cd $WORKDIR
aprun ./serial_fort.exe
##End of Sample PBS Script
The first few lines tell PBS to save the standard output and error output to the given files and give the job a name. Skipping ahead, we estimate the run-time to be about 1 hour and know that this is acceptable for the standard batch queue. We need 1 CPU in total, so we request 1 CPU.
Here is an example of a PBS script to run a parallel job:
#!/bin/csh
#PBS -N run_par
#PBS -o run_par.out
#PBS -e run_par.error
#PBS -A Your_Project_Name
#PBS -l walltime="12:00:00"
#PBS -l mppwidth="256"
#PBS -l mppnppn="4"
#PBS -q standard
# ----------------------------------
#
# Set appropriate data and output directories variable names
setenv DATADIR $WORKDIR/DATA
setenv OUTPUTDIR $WORKDIR/OUTPUT
# Create the directories
#
mkdir -p $DATADIR
mkdir -p $OUTPUTDIR
#
# Copy executable and any required input files to your work directory
# located under the /scr Lustre filesystem.
cp $HOME/my_MPI_code $WORKDIR
cp $HOME/{namelist,myinput} $DATADIR
cd $WORKDIR
#
# Run the MPI job with the Cray "aprun" command:
#
aprun -N 4 -n 256 ./my_MPI_code
#
# Archive any output from the job to your home directory on Einstein
# or to your home directory on the archive server, Newton.
#
/usr/bin/rcp $OUTPUTDIR/* newton:$ARCHIVE_HOME/
/usr/bin/rcp restart newton:$ARCHIVE_HOME/restart
# ----- End of sample PBS script -----
The first few lines tell PBS to save the standard output and error output to the given files and give the job a name. Skipping ahead, we estimate the run-time to be about 12 hours and know that this is acceptable for the standard batch queue. The next two lines set the total number of cores (mppwidth) and the number of PEs per node (mppnppn). This job requests 256 cores with 4 PEs per node, so it runs on 64 nodes (256 / 4). The default number of PEs per node is 8.
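The node count follows directly from the two values: it is mppwidth divided by mppnppn, rounded up when the division is not exact. A minimal sketch of that arithmetic, with a hypothetical helper name nodes_needed:

```shell
#!/bin/sh
# nodes_needed MPPWIDTH MPPNPPN: nodes required for MPPWIDTH cores (PEs) with
# MPPNPPN PEs per node, rounding up when the division is not exact.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

echo "mppwidth=256 mppnppn=4 -> $(nodes_needed 256 4) nodes"
echo "mppwidth=256 mppnppn=8 -> $(nodes_needed 256 8) nodes"
```

With 4 PEs per node the sample job occupies 64 nodes; with the default of 8 PEs per node the same 256 cores would fit on 32 nodes.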
Cray Documents - http://docs.cray.com/