Queuing Systems
When a job is submitted, it is placed in a queue. Different queues are available for different purposes. Users must select the queue from the list below that is appropriate for their computational needs.
Queue | Nodes | x86 processors | Node names | Walltime (HH:MM:SS) | Max jobs per user |
serial | 1 | 1 | gpu1 | 24:00:00 | 2 |
main | 10 | 160 | node1 to node10 | 48:00:00 | 2 |
gpu | 2 | 32 (plus 2688 CUDA cores) | gpu1, gpu2 | 72:00:00 | 2 |
<advisor> | 2 | 32 | node9, node10 | 72:00:00 | unlimited |
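These limits can also be checked on the cluster itself with sinfo; a minimal sketch (the format string is illustrative, and output depends on the SLURM version):
# List each partition with its node count, time limit, and member nodes
sinfo -o "%P %D %l %N"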
Node Configuration
Based on the queues described above, the node configuration can be summarized as follows:
Node name | Node type | Queue assignment | Queue priority |
node1 to node8 | Compute | main | main |
gpu1 | GPU | serial, gpu | serial |
gpu2 | GPU | main, gpu | gpu |
node9 and node10 | Compute | main, <advisor> | <advisor> |
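To confirm which partitions a particular node serves, query its node record (a sketch; gpu1 is taken from the table above):
# Print the record for gpu1; the Partitions= field lists its queue assignment
scontrol show node gpu1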
Scheduler Details: We are using SLURM version 14.03.7. Commonly used script directives and options are summarized below:
Directive / Option | Description |
: (colon) | Indicates a commented-out line that the scheduler ignores. |
#SBATCH | Indicates a special line that the scheduler interprets. |
srun ./hello_parallel | Executes MPI programs; the command uses directions from SLURM to place your job on the scheduled nodes. |
--job-name=hello_serial | Sets the job name shown in the "Name" column of squeue's output. The name has no significance to the scheduler but makes the display easier to read. |
--output=slurm.out, --error=slurm.err | Tell SLURM where to send your job's output stream and error stream, respectively. To discard either stream, set the file name to /dev/null. |
--partition=batch | Sets the partition in which your job will run. |
--qos=normal | Sets the QOS under which your job will run. |
--nodes=4 | Requests four nodes. |
--ntasks-per-node=8 | Requests eight tasks per node. The number of tasks may not exceed the number of processor cores on the node. |
--time=1-12:30:00 | Sets the maximum time SLURM allows your job to run before it is automatically killed. The example requests 1 day, 12 hours, 30 minutes, and 0 seconds. Other formats such as "HH:MM:SS" (for jobs under a day) are also accepted. If the requested time exceeds the limit of the chosen partition/QOS, the scheduler will not run your job. |
--mem-per-cpu=MB | Specifies a memory limit for each process of your job. The default is 2944 MB. |
--mem=MB | Specifies a memory limit for each node of your job. By default, a per-core limit applies instead. |
--exclusive | Requests exclusive access to your job's nodes; the opposite of --share. |
--share | Allows your job to share nodes with other jobs; the opposite of --exclusive. |
--constraint=feature_name | Tells the scheduler that nodes assigned to this job must have the feature "feature_name". |
--gres=resource_name | Tells the scheduler that nodes assigned to this job will use the generic resource "resource_name". |
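Putting these directives together, a minimal sketch of a complete job script (the partition, limits, and program name hello_parallel are illustrative; the queue-specific scripts below show this cluster's actual launcher):
#!/bin/bash
#SBATCH --job-name=hello_parallel
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=main
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --time=1-12:30:00
: #SBATCH --exclusive (the leading colon comments this directive out)
srun ./hello_parallel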
Sample scripts for submitting jobs to the various queues:
Serial
#!/bin/bash
#SBATCH --job-name=<myjob>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=serial
#SBATCH -v
cd ~/<your-path>
# Build a machinefile listing the hosts allocated to this job
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
# Launch the binary on 1 process, matching the --ntasks-per-node request above
<your path to binary> -batch -np 1 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~/<your-path>/input-file
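To submit this script and check on it afterwards (the file name serial_job.sh is illustrative):
sbatch serial_job.sh     # prints the job ID on submission
squeue -u $USER          # list only your own jobs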
Main
#!/bin/bash
#SBATCH --job-name=<myjob>
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=16
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=main
#SBATCH -v
cd ~/<your-path>
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
<your path to binary> -batch -np 96 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~/<your-path>/input-file
GPU
#!/bin/bash
#SBATCH --job-name=<myjob>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --gres=gpu:1
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=gpu
#SBATCH -v
cd ~/<your-path>
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
<your path to binary> -batch -np 32 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~/<your-path>/input-file
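Note that --gres=gpu:1 requests one GPU on each allocated node. To claim both GPUs of a node instead, raise the count (a sketch, assuming the generic resource is named gpu on this cluster):
#SBATCH --gres=gpu:2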
Advisor
#!/bin/bash
#SBATCH --job-name=<myjob>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --error=myjob.%J.err
#SBATCH --output=myjob.%J.out
#SBATCH --partition=<advisor>
#SBATCH -v
cd ~/<your-path>
MACHINEFILE=machinefile
scontrol show hostname $SLURM_JOB_NODELIST > $MACHINEFILE
<your path to binary> -batch -np 32 -machinefile $MACHINEFILE -rsh /usr/bin/ssh ~/<your-path>/input-file
Useful Commands
- For submitting a job:
sbatch submit_script.sh
- For checking queue status:
squeue -l
- For checking node status:
sinfo
- For cancelling a job:
scancel <job-id>
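A few more commands that are often handy (standard SLURM client tools; <job-id> is a placeholder):
scontrol show job <job-id>   # detailed record of a queued or running job
sinfo -N -l                  # long, per-node status listing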
Usage Guidelines
- Users must submit jobs only through the scheduler.
- Users must not run any job on the master node.
- Users are not allowed to run jobs by logging in directly to a compute node.