
Introduction to NCI

National Computational Infrastructure

Download training materials here:


http://nci.org.au/services-support/training/


Outline
Introduction
Accounting
Connecting
UNIX
Job Scheduling
Filesystems
Troubleshooting


What is the NCI?

- Peak Facility (Raijin), Cloud Service and Data Management
- Specialised Support:
  - Climate system science
  - Astronomy
  - Earth Observation
  - Geophysics
  - Cloud Computing


Allocation Schemes

- National Computational Merit Allocation Scheme (NCMAS)
  - NCMAS includes NCI (raijin), iVEC (magnus, epic, fornax), VLSCI (avoca), SF in Bioinformatics (barrine) and SF in Imaging and Visualisation (MASSIVE).
- Partner allocations
  - Major Partners: e.g. CSIRO, INTERSECT, GA, QCIF, BoM
  - University Partners: e.g. ANU, Monash, UNSW, UQ, USyd, Uni Adelaide, Deakin
- Flagship Projects
  - e.g. Astronomy/Astrophysics, CoE in Climate Systems Science, CoE Optics
- Startup allocation
- Director's share


Distributions of Allocations: 2014

Approximate distribution of allocations across all compute systems for 2014:

- NCMAS 15%
- CSIRO 21.4%
- BOM 18.9%
- ANU 17.7%
- Flagships 5.0% (including CoECSS, TERN, Astro, CoE Optics)
- INTERSECT 3.8%
- GA 3.4%
- Monash, UNSW, UQ, USyd, Uni Adelaide: 1.7% each
- Director's share, QCIF, Deakin, MSI: 6.3% in total


NCI HPC System

Integrated Infrastructure and Services:

- RAIJIN Fujitsu Primergy information
- Lustre filesystems - raijin (/home and /short) and global (/g/data)
- Cloud - OpenStack cloud (hosting services, specialised virtual labs, new services, special interactive use)
- High-end visualisation services and support (Vizlab)
- Software Packages


Getting Information

- URL: http://nci.org.au/
- Detailed usage information
- Raijin Quick Reference Guide
- Detailed software information
- Raijin FAQs
- /g/data FAQs
- Message of the Day (/etc/motd)
- Emergency and Downtime Notices
- NCI help email: help@nci.org.au


New Petascale System

Fujitsu Primergy - raijin

- 3592 nodes, each with 2x Intel Sandy Bridge E5-2670 (8 core, 2.6GHz)
- 57472 cores
- Total memory 158TB
- Lustre filesystems: /short, /home, /g/data
- $PBS_JOBFS local to each node
- Infiniband network
- See the system being installed.


Cloud

NCI's Cloud services focus on:

- Computation using the cloud
- Data services using the cloud
- Complementary services to NCI's HPC that are best provided through the cloud

NCI offers a NeCTAR node (National eResearch Collaboration Tools and Resources):

- Designed to optimise computation and floating point (Intel CPUs)
- Designed for high speed data transfer (56 Gigabit network between nodes)
- Designed for high speed IO (all SSD disk storage in the cloud)

NCI can offer a high speed interconnect between the NCI Lustre based filesystems and NCI Cloud services.

Data Storage

- Global Lustre filesystem /g/data/ - stores persistent data, mounted on raijin and cloud nodes.
- Mass Data Storage - HSM storage with dual copies across two NCI data centres. Effective storage for managing data that can be staged in/out as part of batch processing.
- RDSI national data collections - to be stored across the NCI data resources listed above.



How to Apply for a New Project (for CIs)

- Project leaders (Chief Investigators) fill out on-line forms with the required details and are given a project ID.
- Application process:
  - Partner (anytime)
  - Merit scheme (once a year, deadline Nov)
  - Start-up (anytime, max 5000 SU per year)
  - Commercial (anytime)


How to Apply for a New Account (for Users)

- Register as a New User: register first. The registration ID is a number such as 12345; it is not a user ID.
- Connect to Project: a connection form should be submitted. Accounts are set up when a CI approves a connection request.
- New users will receive an email with account details.
- NCI usernames are of the form abc123 - abc for your initials and 123 for affiliation.
- Passwords are sent by SMS to the mobile number provided when you registered.
- Passwords can be given over the phone if necessary, but not by email.
- Use the passwd command to change your password when you first log in.
- An automated on-line tool for users to set passwords is being developed, expected availability mid 2015.

Project accounting

- All use on the compute systems is accounted against projects. Each project has a single grant of time per 3-month quarter.
- If your username is connected to more than one project you will have to select which project to run jobs under.
- A project may span several stakeholders (e.g. BoM and CSIRO).
- To change or set the default project, edit the .rashrc file in your home directory and change the PROJECT variable as desired. A typical .rashrc file looks like:

setenv PROJECT c25
setenv SHELL /bin/bash

Log in again after editing .rashrc to see the changes.


Default Project

The following displays the usage of the project in the current quarter against each of the stakeholders funding the project:

nci_account

By adding -v you can see who is using the compute time:

nci_account -v

You can also use -P for another project and -p for a different quarter, e.g.:

nci_account -P c25 -p 2014.q2 -v

- Further information will be presented under nci_account - most notably storage usage.
- If you have a project that is externally funded and requires more resources than provided, please contact us. It is possible to set up special funding and track it under nci_account.


Establish Connection

- Connection under Unix/Mac:
  - For ssh - ssh (terminal)
  - For scp/sftp - scp/sftp (terminal)
  - For X11 - ssh -X; make sure to install XQuartz for OSX 10.8 or above (terminal)
- Connection under Windows:
  - For ssh - putty, mobaxterm
  - For scp/sftp - putty, Filezilla, winscp
  - For X11 - Cygwin, XMing, mobaxterm, Virtual Network Computing

Caution!
Be sure to logout of xterm sessions, and quit the Window Manager before leaving the system.
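
For example, copying a file from your local machine to raijin with scp might look like this (the username and destination path are placeholders):

scp mydata.tar abc123@raijin.nci.org.au:/short/c25/abc123/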


Connecting to raijin

The hostname of the Fujitsu Primergy cluster is

raijin.nci.org.au

and it can be accessed using the secure shell (ssh) command, for example:

ssh -X abc123@raijin.nci.org.au

Your ssh connection will be to one of six possible login nodes, raijin1 to raijin6. (If ssh to raijin fails, try specifying one of the nodes directly, e.g. raijin3.nci.org.au.)
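
As a convenience, you can add a host alias to ~/.ssh/config on your own machine; a sketch, assuming your username is abc123:

Host raijin
    HostName raijin.nci.org.au
    User abc123
    ForwardX11 yes

After this, ssh raijin connects with X11 forwarding enabled.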


Secure use of ssh

Passphrase-less ssh keys allow ssh to log in without a password.

Caution!
Day-to-day use is strongly discouraged: this considerably weakens both NCI and home institution system security. (Instead consider a key with a passphrase plus ssh-agent on your workstation; see the sketch below.)

Passphrase-less keys can be useful to support copyq batch jobs:
- Generate a new key specifically for such transfers
- Use rrsync to restrict what it can do

More information: Using ssh keys
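
A sketch of the recommended workstation setup, with a passphrase-protected key loaded into an agent (the key filename is just an example):

ssh-keygen -t rsa -f ~/.ssh/id_rsa_nci   # choose a strong passphrase when prompted
eval $(ssh-agent -s)                     # start an agent for this session
ssh-add ~/.ssh/id_rsa_nci                # unlock the key once; later logins need no typing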




UNIX environment

The working environment under UNIX is controlled by shells (command-line interpreters). The shell interprets and executes user commands.

- The default is the bash shell (tcsh is also popular; you may use ksh)
- The shell can be changed by modifying .rashrc
- Shell commands can be grouped together into scripts
- Unix Quick Reference Guide

Note
Unix is case sensitive!!


UNIX environment

The shell provides environment variables that can be accessed across all the processes initiated from the original shell, e.g. the login environment.

                                   csh/tcsh   sh/bash/ksh
exec on login and compute nodes    .cshrc     .bashrc
exec on login nodes only           .login     .profile
modules                            .login     .profile

tcsh syntax:
setenv VARIABLE value

bash syntax:
export VARIABLE=value

For an explanation of environment variables see Canonical user environment variables.

Environment Modules

Modules provide a great way to easily customise your shell environment for different software packages. The module command syntax is the same no matter which command shell you are using.

Various modules are loaded into your environment at login to provide a workable environment.

module list         # To see the modules loaded
module avail        # To see the list of software for which environments have been set up via modules
module show name    # To see the list of commands that are carried out in the module
module load name    # To load the environment settings required by a software package
module unload name  # To remove extras added to the environment for a previously loaded software package. This is extremely useful in situations where different package settings clash.

Environment Modules

Note
To automate environment customisation at login, module load commands can be added to the .login (tcsh) or .profile (bash) files. Be aware, however, that different applications can have incompatible environment requirements, so loading multiple application modules in your dot files may lead to problems. We recommend that modules are loaded in scripts as needed at runtime (see the sketch below), and likewise discourage the use of module commands in shell configuration (dot) files.

More advanced information on modules can be found in the Modules User Guide.
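
For example, a job script might load what it needs at runtime rather than relying on dot files; a sketch (the module version shown is only illustrative):

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l ncpus=16
#PBS -l mem=32GB
#PBS -l wd

module load openmpi/1.6.3   # loaded here, not in .profile/.login
mpirun ./my_program.exe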


Editors

Several editors are available:
- vi
- emacs
- nano

If you are not familiar with any of these you will find that nano has a simple interface. Just type nano.

Caution!
Use dos2unix if your input/job script files were edited on a Windows machine.
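
For example, assuming a job script named runjob was edited under Windows:

dos2unix runjob   # strips the carriage returns in place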


Exercise 1: Getting started

Logging on to raijin - use the course account:

ssh -X aaa777@raijin.nci.org.au

Remember to read the Message of the Day (MOTD) as you log in.

Commands to try:

hostname        # to see the node you are logged into
nci_account     # to see the current state of the project
module list     # to check which modules are loaded on login
module avail    # to see which software packages are installed and accessible in this way
module show pbs # to see what environments are set by a module

Note
In .cshrc (tcsh) or .bashrc (bash), the intel-fc, intel-cc and openmpi modules are loaded by default.


Batch Queueing System

- Most jobs require greater resources than are available to interactive processes and must be scheduled by the batch job system (an interactive mode is available).
- The queueing system:
  - distributes work evenly over the system
  - ensures that jobs cannot impact each other (e.g. exhaust memory or other resources)
  - provides equitable access to the system
- Raijin uses a customised version of PBSPro.
- nf_limits displays the limits that are set for your projects.
- Default queue limit

Note
Job charging is based on wall clock time used, number of cpus requested, and queue choice.

Queue Limit

[The queue limits table is not reproduced here; run nf_limits on raijin for current values.]


Batch queue structure

- normal
  - Default queue, designed for production use
  - Charging rate of 1 SU per processor-hour (walltime) on raijin
  - Requests for ncpus > a node (16 cores) need to be in multiples of 16.
  - If your grant is exhausted -> lower priority (bonus).
- express
  - High priority for testing, debugging etc. (see the example below)
  - Charging rate of 3 SUs per processor-hour (walltime)
  - Smaller limits to discourage production use (ncpus limited to 128, memory per core is 32GB; check nf_limits for project-specific detail)
- copyq
  - Used for file manipulation - e.g. copying files to MDSS

Using the Queueing System

- Read the How to Use PBS guide
- Use nf_limits to see your user/project queue limits.
- Request resources for your job (using qsub):
  - walltime
  - memory (32GB, 64GB, 128GB per node)
  - disk (jobfs)
  - number of cpus
- PBSPro will then:
  - schedule the job when the resources become available
  - prevent other jobs from infringing on the allocated resources
  - display progress of the jobs (qstat, nqstat or nqstat_anu)
  - terminate the job when it exceeds its requested resources
  - return stdout and stderr in batch output files

Job Script Example

Example

#!/bin/bash
#PBS -l walltime=20:00:00
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l ncpus=16
#PBS -l software=xxx
#PBS -l wd

my_program.exe

(software=xxx is for licensed software; wd starts the batch job in the working directory from which it was submitted.)
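
To submit, assuming the script above is saved as runjob:

qsub runjob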


Job Scheduling

- Job priority is based on the resources requested, currently running jobs under the user/project, and grant allocation.
- Jobs start when sufficient resources are available. (Use qstat -s jobid to see a comment on why a job is not running.)

Tips
- Near the end or at the beginning of a quarter is a busy period.
- For higher priority:
  - shorter walltime request
  - smaller memory request
  - larger number of cpus requested (to some extent)



Long-running jobs

- Checkpoint/restart functionality is recommended for workflows that need to run longer than the queue limits allow. Long run times expose users to system and/or numerical instabilities.
- Example scripts for self-submitting jobs can be found in the FAQs (a minimal sketch follows).

Caution!
Checkpoint/restart is not a filesystem or PBSPro capability - it must be implemented by the user or software vendor.
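
A minimal self-submitting sketch, assuming the script is saved as runjob and the program writes its own checkpoint files plus a (hypothetical) continue.flag marker while work remains; the FAQ scripts are more robust:

#!/bin/bash
#PBS -l walltime=48:00:00
#PBS -l ncpus=16
#PBS -l wd

./my_program.exe              # checkpoints its own state
if [ -f continue.flag ]; then # the program signals that more work remains
    qsub runjob               # resubmit this script for the next chunk
fi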


stdout and stderr files

PBSPro returns the standard output and standard error from each job in .o***** and .e***** files, respectively.

Example script.o123456

============================================================
Resource Usage on 2013-07-20 12:48:04.355160:
JobId:              123456.r-man2
Project:            c25
Exit Status:        0 (Linux Signal 0)
Service Units:      0.01
NCPUs Requested:    1
NCPUs Used:         1
CPU Time Used:      00:00:43
Memory Requested:   50mb
Memory Used:        13mb
Vmem Used:          52mb
Walltime requested: 00:10:00
Walltime Used:      00:00:49
jobfs request:      100mb
jobfs used:         1mb
============================================================

stdout and stderr files

- The .o***** file contains the output arising from the script (if not redirected in the script) and additional information from PBS.
- The .e***** file contains any error output arising from the script (if not redirected in the script) and additional information from PBS. For a successful job it should be empty.
- Common errors to look for in the .e***** file:
  - Command not found. (check module list, path)
  - =>> PBS: job terminated: walltime 172818sec exceeded limit 172800sec (increase the runtime request)
  - =>> PBS: job terminated: per node mem 2227620kb exceeded limit 2097152kb (increase the memory per node request)
  - Segmentation fault. (check your program)

Monitoring the progress of jobs

Useful commands:

qstat        # show the status of the PBS queues
nqstat       # enhanced display of the status of the PBS queues
nqstat_anu   # enhanced display of the status of the PBS queues
qstat -s     # display additional comment on the status of the job
qps jobid    # show the processes of a running job
qls jobid    # list the files in a job's jobfs directory
qcat jobid   # show a running job's stdout, stderr or script
qcp jobid    # copy a file from a running job's jobfs directory
qdel jobid   # kill a running job

Caution!
Please use nqstat_anu -a | grep $USER to see the cpu% of your jobs. An efficient parallel job should be close to 100%.


Exercise 2: Submitting jobs to the batch queue

cd /short/$PROJECT/$USER/
tar xvf /short/c25/intro_exercises.tar
cd INTRO_COURSE
cat runjob
qsub runjob
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...

runjob
- This job searches for the first n prime numbers. Feel free to change the number n, or the PBS resources, to see how the outcome behaves.

View the output in the file runjob.o**** and any error messages in runjob.e**** after the job completes.

Interactive jobs

Users may see the following message when running an interactive process on a login node:

RSS exceeded.user=abc123, pid=12345, cmd=exe, rss=4028904, rlim=2097152 Killed

Each interactive process you run on the login nodes has a time limit (30 mins) and a memory limit (2GB) imposed on it. If you want to run a longer or more memory-intensive interactive job, submit an interactive batch job.

- The -I option for qsub will result in an interactive shell being started on the compute nodes once your job starts.
- A submission script cannot be used in this mode; you must provide all qsub options on the command line.
- To use X windows in an interactive batch job, include the -X option when submitting your job; this will automatically export the DISPLAY environment variable.

Exercise 3: Interactive Batch Jobs

Sometimes the resource requirements (mem, walltime etc.) are larger than allowed on the login nodes. You can run an interactive batch job as follows:

qsub -I -l walltime=00:10:00,mem=500Mb -P c25 -q express -X
qsub: waiting for job 215984.r-man2 to start
qsub: job 215984.r-man2 ready
[aaa777@r73 ]$ xeyes &
[aaa777@r73 ]$ module list
Currently Loaded Modulefiles:
  1) pbs   2) dot   3) intel-cc/12.1.9.293   4) intel-fc/12.1.9.293   5) openmpi/1.6.3
[aaa777@r73 ]$ cd /short/$PROJECT/$USER/INTRO_COURSE
[aaa777@r73 ]$ ./matrix.exe (use Ctrl+C to quit)
[aaa777@r73 ]$ logout
qsub: job 215984.r-man2 completed


Filesystems

Things to consider:
- Transferring large data files to and from raijin: scp, rsync, filezilla
- Use designated data mover nodes (r-dm.nci.org.au), not interactive login nodes (see the example below).
- How much data do you really need to keep?
- Do you need metadata or a self-describing file format?
- Decide on a structure for archived data before you start.
- Stage archived data from tape (offline) to disk before starting jobs.
- Archive results automatically at the end of batch jobs.
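
For example, a large transfer through the data mover nodes might look like this (username and paths are placeholders):

rsync -avP mydata/ abc123@r-dm.nci.org.au:/short/c25/abc123/mydata/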


RAIJIN Filesystems Overview

The Filesystems section of the user guide has this table in greater detail:

Filesystem    Purpose                                Quota                     Backup                                Availability                           Time limit
/home         Irreproducible data e.g. source code   2GB (user)                Yes                                   raijin                                 None
/short        Input/output data files                72GB (project)            No                                    raijin                                 365 days
/g/data/      Processing large data                  project dependent         No                                    Global                                 No
$PBS_JOBFS    IO intensive data                      100MB per node default    No                                    Local to node                          Duration of job
MDSS          Archiving large data files             20GB                      2 copies in two different locations   External access using mdss commands   No

Note
These limits can be changed on request.

Monitoring disk usage

- lquota gives lustre filesystem usage (/home, /short, /g/data).
- nci_account gives other filesystem usage (/short, /g/data, mdss).
- short_files_report, gdata1_files_report and gdata2_files_report give a breakdown:
  - -G <project> lists files owned by group <project>.
  - -P <project> lists files in /short/<project>.
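
For example, combining the commands and flags above for the course project:

lquota                      # per-filesystem quota and usage
short_files_report -P c25   # who owns what under /short/c25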

Caution!
/short and /g/data are not backed up, so it is the user's responsibility to make sure that important files are archived to the MDSS or off-site.


Input/Output Warning

- Lots of small IO to /short (or /home) can be very slow and can severely impact other jobs on the system.
- Avoid dribbly IO, e.g. writing 2 numbers from your inner loop. Writing to /short every second is far too often!
- Avoid frequent opening and closing of files (or other file operations).
- Use /jobfs instead of /short for jobs that do lots of file manipulation.
- To achieve good IO performance, try to read or write binary files in large chunks (of around 1MB or greater).


Exercise 4: Writing to /short

- Use the lquota and du commands to find how much disk space you have available in your home, short and gdata directories.
- Use short_files_report or gdata1_files_report to see who uses most of the quota.
- Look at your project's /short area. Anyone from your project can create their own directories and files here. There will be a directory of your own under your project area.
- Note the different group ownership in the DATA directory:

ls -l /short/c25/DATA


Exercise 4: Writing to /short (cont)

Change the permissions on your files and directories to allow/disallow others in your group to access them.

man chmod
chmod g+r filename   # allow group read to filename
chmod g-r filename   # disallow group read to filename
chmod g+w filename   # allow group write to filename
chmod g+x filename   # allow group execute to filename

Verify with your neighbour that your file permissions are as expected.

Note
- To be able to go into a directory requires execute permission (chmod -R +X folder).
- You may not want to share files by making your /home directory world readable. For members of the same project you can use /short/$PROJECT. Talk to us about alternatives if you need to share source code, data files etc.

ACL - Access Control Lists

ACLs are an addition to the standard Unix file permissions (r, w, x, -) for User, Group, and Other. They give users and administrators flexible, fine-grained control over who can read, write, and execute files.

Caution!
We strongly recommend that you consult with NCI before using ACLs.
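
A small sketch with the standard Linux ACL tools (the username xyz987 is hypothetical):

getfacl filename                 # show the current ACL
setfacl -m u:xyz987:r filename   # grant user xyz987 read access
setfacl -x u:xyz987 filename     # remove that entry again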


Using the MDSS

The Mass Data Store was migrated to a new SGI Hierarchical Storage Management System in January 2012.

- MDSS is used for long term storage of large datasets.
- If you have numerous small files to archive, bundle them into a tarfile FIRST.
- Watch our tape robot at work.
- Every project has a directory on the MDSS.
- All members of the project group have read and write access to the top project directory.
- mdss dmls -l shows what is online (on disk cache) and what is on tape.


Using the MDSS

- The mdss command can be used to get and put data between the login and copyq nodes of raijin and the MDSS, and also to list files and directories on the MDSS.
- netcp and netmv can be used from within batch jobs to:
  - generate a batch script for copying/moving files to the MDSS
  - submit the generated batch script to the special copyq, which runs the copy/move job on an interactive node.
- netcp and netmv can also be used interactively to save you the work of creating tarfiles and generating mdss commands:
  - -t create a tarfile to transfer
  - -z/-Z gzip/compress the file to be transferred

Caution!
Always use -l other=mdss when using mdss commands in copyq. This ensures jobs only run when the mdss system is available.
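
A minimal copyq sketch following that advice (the tarfile name and MDSS path are placeholders):

#!/bin/bash
#PBS -q copyq
#PBS -l walltime=00:30:00
#PBS -l other=mdss
#PBS -l wd

mdss put results.tar $USER/results.tar   # archive a tarfile to your MDSS directory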

Exercise 5: Using the MDSS


To see these commands in action, do:
cd /short/$PROJECT/$USER
mdss get Data/data.tar
ls -l
tar xvf data.tar
ls
rm data.tar
mdss mkdir $USER
netmv -t $USER.tar DATA $USER
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...
less DATA.o*
mdss ls $USER
mdss rm $USER/$USER.tar


Using /jobfs

- Only available through the queueing system:
  - Request like -l jobfs=1GB
  - Access via the $PBS_JOBFS environment variable
- All files are deleted at the end of the job. Copy what you need to /short or another global filesystem in the job script (see the sketch below).
- Requests larger than 396GB will be automatically redirected to /short (but the files will still be deleted at the end of the job).
- You cannot use the mdss or netcp commands for files on /jobfs.

Exercise 6: Managing Files between /short, /jobfs and MDSS

Submit a batch job with a /jobfs request, where the job:
- Copies an input file from /short to /jobfs
- Runs a code to use the input file and generate some output
- Saves the output data back to the /short area
- Uses the netcp command to archive the data to the MDSS


Exercise 6: Managing Files between /short, /jobfs and MDSS


Read the runjobfs script then submit it to the queueing system,
monitor the job with qstat, and examine the job output files:
cd /short/$PROJECT/$USER/INTRO_COURSE
qsub runjobfs
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...
cat runjobfs.e*
cat runjobfs.o*

Check out the output file that this job created on /short and the copy on
the MDSS
cd /short/$PROJECT/$USER
ls -ltr
less save_data.o*
mdss ls $USER
mdss rm -r $USER


Troubleshooting

- .e and .o stderr and stdout files - check your input!
- PBS emails, MOTD and Notices and News
- Read the FAQs:
  - Why are my jobs not running?
  - Why does my job run fine on my local machine, but not work on raijin?
  - My PBS job script generates the error message "module: command not found". What's wrong?
  - How do I access files on NCI systems using a graphical user interface?
  - How do I transfer files between massdata and my local machine?
- Read the /g/data FAQs


Issues with Running Jobs

- CPU over/under subscription
  - Due to an inconsistent ncpus=X request vs mpirun -np Y, where Y != X (see the example below).
  - OMP_NUM_THREADS != $PBS_NCPUS
  - Use mpirun --bind-to-socket -npernode 2 <exe> -T 8 <args>
  - Use mpirun --bind-to-none program.exe for ncpus < 16 jobs
  - Software-specific keywords:
    - %nproc in Gaussian.
    - NPAR in VASP. The recommended value should be somewhere between SQRT(ncpus) ... ncpus/2 and be a factor of 16.
- Unbalanced %cpu usage
  - 0% cpu usage (sleep, hung, or dead job). If it's a file manipulation job, use copyq instead.
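
For example, the PBS request and the MPI launch line should agree; a schematic fragment:

#PBS -l ncpus=32

mpirun -np 32 program.exe   # -np matches the ncpus request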