Jon Dehdari








Using SGE / GridEngine

Please read all of the following before using a GridEngine cluster.

Current Status

You can get an overview of the cluster by typing qhost, and you can also see the status of the GPUs with qhost -F gpu.
You can view current jobs in the queue by typing qstat, or include GPU availability with qstat -F gpu.
If you're going to use GPUs a lot, you might as well alias the long versions of the above commands (which include -F gpu) to the short versions in your .bashrc :

echo "alias qstat='qstat -F gpu'" >> ~/.bashrc
echo "alias qhost='qhost -F gpu'" >> ~/.bashrc
source ~/.bashrc

Submitting Jobs

The normal way to submit jobs to the cluster is using the qsub command. For example, qsub myscript.sh. The many options to the qsub command are described in the manpage, man qsub.
Any command-line argument for qsub can alternatively appear inside the shell script itself, on lines beginning with #$ .

For example, either:

qsub -cwd -e /dev/null myscript.sh

Or:

qsub myscript.sh

with myscript.sh including the lines:

#$ -cwd
#$ -e /dev/null
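As a quick end-to-end check, the above can be sketched as follows. The script name hello.sh is just an example, and the qsub call is guarded so the snippet is harmless on a machine without GridEngine:

```shell
# Create a minimal job script with embedded qsub options.
cat > hello.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N hello
echo "Hello from $(hostname)"
EOF

# Submit it if we're on a submit host. SGE writes stdout to
# hello.o<jobid> and stderr to hello.e<jobid> in the working directory.
if command -v qsub >/dev/null; then
  qsub hello.sh
else
  echo "qsub not found; run this on a GridEngine submit host."
fi
```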

Using GPUs

GridEngine only knows about GPU utilization from what is requested through GridEngine. So if people run GPU jobs on their own (outside of GridEngine), the system is not aware of them.
To request GPUs with qsub, add the following flag on the command line:

-l gpu=1

Or within your qsub script:

#$ -l gpu=1

This requests one GPU. Note that GridEngine does not enable your program to use a GPU; it just keeps track of how many GPUs are in use across the cluster.

Don't forget to export all relevant environment variables for CUDA, like PATH and LD_LIBRARY_PATH.
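For example, a GPU job script might set these variables explicitly rather than relying on -V. The path /usr/local/cuda below is a common default but an assumption here; check where CUDA is actually installed on your cluster:

```shell
#!/bin/bash
#$ -cwd
#$ -l gpu=1

# Hypothetical CUDA install prefix -- adjust for your cluster.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "Using CUDA from: $CUDA_HOME"
```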

Network-wide Disk Space

Since you don’t know which server will compute your job, you need to use a disk mount point that is accessible to all servers.

NFS

If your local systems administrator configured the cluster to use NFS or something similar, then ensure that your computing jobs output to an NFS-mounted directory. You can see where your remote mount points are by typing df -h | grep : .

SSHFS

If NFS is not set up, you can use user-level SSHFS instead. The following steps only need to be done once.
  1. First, set up SSH keys. On a computer from which you plan to submit jobs, type:

     ssh-keygen -b 8192 -t rsa

     Use an empty passphrase, and accept the other defaults.

  2. For every remote computer, type:

     ssh-copy-id  yourusername@123.456.789.012

     Now you don't need to type a password to log in to these servers.

  3. Decide on a hard drive of a remote server that has lots of space, like /hd4 on server 123.456.789.012. You can use df -h to find out disk usage on a given computer. Then create a directory there and symlink it from your home directory:

     mkdir -p /hd4/myusername/sge/
     cd
     ln -s /hd4/myusername/sge/ sge

     Remember that files on these servers are not backed up at all.

  4. Create a shell script like the following as mount-sge.sh, modifying it as necessary. It's better to use IP addresses than hostnames:

     user=myusername
     hosts='123.456.789.011 123.456.789.013 123.456.789.014'
     mount_src="${user}@123.456.789.012:/hd4/${user}/sge/"
     # Quoted so the tilde expands on each remote host, not locally:
     mount_tgt="~/sge/"

     for host in $hosts; do
       # Ensure the mount point exists on the remote host first:
       ssh ${user}@$host mkdir -p $mount_tgt
       ssh ${user}@$host sshfs -o allow_root $mount_src $mount_tgt  &&  \
         echo "Mounted $mount_tgt on $host"  ||  \
         echo "Mountpoint $mount_tgt on $host is probably already mounted."
     done

     Don't forget to set executable permissions: chmod u+x mount-sge.sh . Running this script will mount all the necessary mount points, so that the output of all jobs can get sent to a common directory.
     If a server gets rebooted, just rerun this script.

  5. In your qsub script, add the following line to set your working directory to ~/sge:

     #$ -wd $HOME/sge/

     You can create subdirectories in this path, and modify the above line accordingly.
     You can alternatively just set the output & error paths to ~/sge:

     #$ -o $HOME/sge/
     #$ -e $HOME/sge/

     If you don't want any error output, send it to /dev/null:

     #$ -e /dev/null
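The mount script has a natural counterpart for unmounting (for example, before taking the storage server down). The following is a sketch under the same hypothetical user and hosts; fusermount -u is the standard command for unmounting a user-level SSHFS mount:

```shell
#!/bin/bash
## Companion sketch to mount-sge.sh: unmount ~/sge/ on every compute host.
## Hypothetical user name and host IPs -- adjust as in mount-sge.sh.
user=myusername
hosts='123.456.789.011 123.456.789.013 123.456.789.014'

unmount_all() {
  for host in $hosts; do
    # The quoted tilde expands on the remote host, not locally.
    ssh ${user}@$host fusermount -u '~/sge/' \
      && echo "Unmounted ~/sge/ on $host" \
      || echo "~/sge/ on $host was probably not mounted."
  done
}

unmount_all
```

As with mount-sge.sh, rerunning it is harmless: already-unmounted hosts just print a notice.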

Example Script

Below is an example script. For arguments that you always want (like email notifications), you can put those in ~/.sge_request, omitting the #$ .
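For instance, a hypothetical ~/.sge_request that always requests the current working directory and email notification could contain:

```
-cwd
-m ea
-M foo@example.com
```

Options given in the script or on the qsub command line override these defaults.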

#!/bin/bash

## Inherit all environment variables
#$ -V

## Start in current working directory
#$ -cwd

## Stdout to the following dir
#$ -o $HOME/sge/

## Stderr to the following dir
#$ -e $HOME/sge/

## Specify job name
#$ -N test-3hr

## When will an email be sent.
## 'e'=end of job
## 'a'=if job is aborted
#$ -m ea

## Where to email info
#$ -M foo@example.com

## Which resources to use
#$ -l gpu=1

## RAM and swap limits in kilobytes. -v is a bashism
ulimit -m 8000000
ulimit -v 10000000

echo "Hello"
echo "The date is $(date)"
echo "hostname is $(hostname)"
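The two ulimit calls above are what actually enforce the memory cap in this script; requesting a GPU or other resources does not by itself limit RAM. A quick local sanity check of the same limits, run in a subshell so your login shell is unaffected:

```shell
# Apply the example script's limits in a subshell and read them back.
(
  ulimit -m 8000000    # max resident set size, in kilobytes (~8 GB)
  ulimit -v 10000000   # max virtual memory, in kilobytes (~10 GB)
  echo "resident limit: $(ulimit -m) kB"
  echo "virtual limit:  $(ulimit -v) kB"
)
```

A process in that subshell which tried to allocate more than the virtual-memory limit would have its allocation fail rather than exhausting the node.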

Further Reading