FAQ
Help with the HPC
If you need help with the HPC or with setting up software/analyses, the first thing to do is ask within our lab.
In most cases, someone else has already encountered the issue and solved it. Other times, it is something we can help with.
If nobody in the lab can help, you can ask Shanon Loveridge on the Baker-HPC Slack channel.
Warning
Our lab is charged more than $100 for every question you ask Shanon, even if it is something simple (and the price is likely to go up!). Do not ask Shanon simple questions!
How much does the HPC cost?
The HPC operates on a cost-recovery basis: it needs to recoup the cost of its infrastructure and personnel. This means that running jobs on the HPC is not free.
We pay per CPU-hour and per memory-hour. Each hour costs about 2.2 cents per 2 CPU cores and 2.2 cents per 16 GB of memory. So a job that uses 2 CPU cores and 16 GB of memory for one hour costs 4.4 cents.
If your job is set up well, it shouldn't matter whether you run it on 2 cores for 24 hours or on 24 cores for 2 hours: both use 48 core-hours, so the amount of work (and the CPU cost) is the same. However, if you start an interactive job and leave it sitting idle, it is wasting money.
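The pricing above can be turned into a quick estimate. The hypothetical shell function below (using awk for the decimal arithmetic) assumes exactly the quoted rates of 2.2 cents per 2 cores per hour and 2.2 cents per 16 GB per hour; actual charges may be rounded or billed differently.

```shell
# Hypothetical helper: estimated cost in cents for a job, assuming the rates above
# (2.2 cents per 2 CPU cores per hour, 2.2 cents per 16 GB of memory per hour).
# Arguments: number of cores, memory in GB, run time in hours.
hpc_cost_cents() {
  local cores="$1" mem_gb="$2" hours="$3"
  awk -v c="$cores" -v m="$mem_gb" -v h="$hours" \
    'BEGIN { printf "%.2f\n", (c / 2 * 2.2 + m / 16 * 2.2) * h }'
}

hpc_cost_cents 2 16 1    # 4.40  (the example above)
hpc_cost_cents 2 0 24    # 52.80 (CPU part only: 2 cores for 24 hours)
hpc_cost_cents 24 0 2    # 52.80 (CPU part only: 24 cores for 2 hours)
```

The last two lines show why the core/time split doesn't change the CPU cost (48 core-hours either way). Memory is also charged per hour, so the same holds for the total only if memory is requested in proportion to the cores.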
How to set up SSH keys
Logging into the HPC requires a password, but not all HPCs/servers allow password logins (some require SSH keys). So it is a good idea to set up SSH keys and get used to using them.
If you have already created an SSH key, don't create another one, as the new key will overwrite the existing one.
On both Windows and Mac, create an SSH key with the ssh-keygen command (if it asks for a passphrase, just press Enter for no passphrase).
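The key type needs to be RSA to get the file names id_rsa and id_rsa.pub used in this guide; recent OpenSSH versions default to a different type (Ed25519), so add -t rsa. On Windows, run ssh-keygen -t rsa in PowerShell and press Enter at each prompt. On a Mac, the sketch below does the same without prompts and only creates a key if ~/.ssh/id_rsa does not already exist, so an existing key is never overwritten. It assumes the standard OpenSSH ssh-keygen; -N "" gives the empty passphrase described above.

```shell
# Create an RSA key pair only if one does not already exist,
# because generating a new key would overwrite the old one.
key="$HOME/.ssh/id_rsa"
if [ -f "$key" ]; then
  echo "An SSH key already exists: $key"
else
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"             # the .ssh folder should only be accessible to you
  # -t rsa: RSA key type, -N "": empty passphrase, -f: where to save the private key
  ssh-keygen -t rsa -N "" -f "$key"
fi
# You should see id_rsa (private key) and id_rsa.pub (public key)
ls -l "$key" "$key.pub"
```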
This creates the key files in the hidden .ssh folder in your home directory: ~/.ssh on Mac and C:\Users\username\.ssh on Windows.
There should be two files: id_rsa (your private key) and id_rsa.pub (your public key).
Next, copy your id_rsa.pub file to your account on the HPC (you will need to enter your password for this).
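On a Mac, the usual tool for this step is ssh-copy-id, e.g. ssh-copy-id username@hpc-address. Windows PowerShell does not include ssh-copy-id; a common equivalent there is type $env:USERPROFILE\.ssh\id_rsa.pub | ssh username@hpc-address "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys". The HPC login address is not given here, so username@hpc-address is a placeholder. The sketch below is a hypothetical bash helper that does the same thing by hand: it appends your public key to ~/.ssh/authorized_keys on the server.

```shell
# Hypothetical helper: append your public key to ~/.ssh/authorized_keys on a server
# (essentially what ssh-copy-id does). You will be asked for your password once.
copy_pubkey_to() {
  local target="$1"                      # username@host (placeholder address)
  local pubkey="$HOME/.ssh/id_rsa.pub"
  if [ -z "$target" ]; then
    echo "usage: copy_pubkey_to username@host" >&2
    return 1
  fi
  if [ ! -f "$pubkey" ]; then
    echo "No public key found at $pubkey; create one with ssh-keygen first" >&2
    return 1
  fi
  # Create ~/.ssh on the server if needed, append the key, and keep permissions strict
  ssh "$target" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' < "$pubkey"
}

# Usage (placeholder address): copy_pubkey_to username@hpc-address
```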
After that has finished, you have set up your SSH keys. You now won't need to enter your password to log into the HPC (or any other server you do this with).
Don't share your id_rsa file
Do not give your id_rsa to anyone. It will allow them to log in as you!
What are partitions and which one should I use?
Partitions group nodes and set limits on job parameters (such as maximum run time), so that nodes can be dedicated to particular types of jobs. Use sinfo to see the partitions and node allocations. We have the following partitions:
- interactive: For interactive and short-term jobs only. Jobs can only run for 4 hours maximum. Node 1 is dedicated to interactive jobs.
- standard: For jobs that take up to 24 hours to run. Most nodes are available for standard jobs.
- long: No time limit on job run time. Has lowest priority in the queue. Most nodes are available for long jobs.
- epigenetics: We can't use this partition.
- imaging: We can't use this partition.
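The time limits above can be summarised as a small decision rule. The function below is a hypothetical sketch of that rule (whole hours only), not an official tool; the epigenetics and imaging partitions are left out because we can't use them.

```shell
# Hypothetical helper: suggest a partition from the expected run time in whole hours,
# using the limits above (interactive: up to 4 h, standard: up to 24 h, long: no limit).
# Pass "interactive" as the second argument for interactive sessions.
pick_partition() {
  local hours="$1" mode="${2:-batch}"
  if [ "$mode" = "interactive" ] && [ "$hours" -le 4 ]; then
    echo interactive
  elif [ "$hours" -le 24 ]; then
    echo standard
  else
    echo long
  fi
}

pick_partition 2 interactive   # interactive
pick_partition 12              # standard
pick_partition 72              # long
```

The chosen partition then goes in your job script as #SBATCH --partition=standard (or is passed to sbatch/srun with -p).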
Using RStudio on the HPC
It is possible to use RStudio on the HPC. This starts a new job that runs RStudio Server, which you access through a web browser (you must be onsite or connected through the VPN).
You can run this command to start the RStudio job:
This will create a file in your home directory: ~/rstudio.job.[JOBID]
Open it and follow the instructions to access the interface.
You can request additional CPU, memory or time using the standard SBATCH options.
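For example, the standard Slurm directives below request 4 CPU cores, 32 GB of memory and 4 hours of run time (the values are only illustrative). The same options can be given on the sbatch command line instead, e.g. sbatch --cpus-per-task=4 --mem=32G --time=04:00:00 followed by the job script. At the rates given in "How much does the HPC cost?", this request costs about 35 cents if it runs for the full 4 hours.

```shell
#SBATCH --cpus-per-task=4      # number of CPU cores
#SBATCH --mem=32G              # memory (per node)
#SBATCH --time=04:00:00        # maximum run time (hh:mm:ss)
```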
As an alternative, we have R functions to do this in the BakerMetabolomics package. Run them on a laptop/desktop connected to the Baker network (not on the HPC itself); they will start the job on the HPC for you. You will need to have set up SSH keys first (see How to set up SSH keys).
You first start an RStudio job, then get its details, and lastly clean it up:
Make sure you stop the job afterwards!
We pay for HPC use (per hour of CPU and memory), so don't leave RStudio/interactive jobs running and doing nothing. Make sure to close your jobs after you have finished!
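To check for forgotten jobs and stop them from a terminal on the HPC, the standard Slurm commands squeue and scancel can be wrapped in two small helpers, sketched below. The function names are hypothetical; add them to your ~/.bashrc if you find them useful. Cancelling a job ends it immediately, so save your work in RStudio first.

```shell
# Hypothetical helpers around standard Slurm commands.
# List your own jobs: job ID, partition, name, state and elapsed time.
my_jobs() {
  squeue -u "$USER" -o "%.10i %.12P %.20j %.8T %.10M"
}

# Cancel all of your jobs in the interactive partition.
cancel_my_interactive_jobs() {
  scancel -u "$USER" -p interactive
}

# To stop a single job, use its JOBID from my_jobs, e.g.: scancel 123456
```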
How do I use the GPUs?
Our HPC has several nodes with GPUs. You can see the nodes with GPUs with the following command:
This outputs the nodes, the GPU type:count, and the number of nodes available.
For instance, gpu:2g.12gb:4 means:
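- gpu: a GPU resource
- 2g.12gb: the GPU type, here an NVIDIA multi-instance GPU (MIG) profile: 2g means each instance has 2 compute slices of the GPU, and 12gb means each instance has 12 GB of VRAM
- 4: the number of these GPU instances on each listed node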
The above are 4x NVIDIA A30 GPUs. The other GPU is an NVIDIA T4.
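Reading these GRES strings can be automated. The hypothetical function below splits a string such as gpu:2g.12gb:4 into its parts and, for MIG profiles of the form <N>g.<M>gb, spells out the compute slices and memory. It assumes the name:type:count layout described above (or name:count when no type is given).

```shell
# Hypothetical helper: split a Slurm GRES string (e.g. gpu:2g.12gb:4) into its parts.
parse_gres() {
  # Drop any trailing "(...)" detail that newer Slurm versions may append, e.g. "(S:0)"
  local gres="${1%%[(]*}"
  local name type count
  IFS=':' read -r name type count <<< "$gres"
  if [ -z "$count" ]; then
    # Only two fields (e.g. gpu:1): the second one is the count, with no type
    count="$type"
    type=""
  fi
  echo "resource=$name type=${type:-none} count=$count"
  # MIG profiles look like <N>g.<M>gb: N compute slices and M GB of memory per instance
  if [[ "$type" =~ ^([0-9]+)g\.([0-9]+)gb$ ]]; then
    echo "MIG profile: ${BASH_REMATCH[1]} compute slices, ${BASH_REMATCH[2]} GB memory per instance"
  fi
}

parse_gres "gpu:2g.12gb:4"
# resource=gpu type=2g.12gb count=4
# MIG profile: 2 compute slices, 12 GB memory per instance
```

On the HPC, it could be run over real output, e.g. for each line printed by sinfo -h -o "%G".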
Using the GPUs
To request one of the GPUs with your job, you must add this line to your SBATCH script:
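A minimal GPU test job might look like the script below. It uses Slurm's generic --gres=name[:type]:count syntax to ask for one GPU; the exact line expected on our HPC may differ, and the job name, partition and time are illustrative. nvidia-smi -L lists the GPU (or MIG instance) the job was given.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=standard
#SBATCH --gres=gpu:1             # one GPU of any type; a type can be named, e.g. gpu:2g.12gb:1
#SBATCH --time=00:10:00

# List the GPU (or MIG instance) allocated to this job
nvidia-smi -L
```

Submit it with sbatch gpu_test.sh (a hypothetical file name).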
Multi-GPU workflows are difficult
The A30 GPUs are split into instances, meaning multiple people can use the same GPU at the same time. This complicates multi-GPU use, because NVIDIA does not support multi-GPU workloads on multi-instance (MIG) GPUs. The only solution is to use GPUs across multiple nodes, which adds latency and so is not worthwhile for our work.
Simple/Common Linux Commands
The HPC can only be accessed through the command line, so it is good to be familiar with Linux commands. Some simple commands are listed below to help those less familiar.
- cd – just like in Windows, you can use cd to move between directories. Use the Tab key to autocomplete directory/file names; pressing it twice lists the possible options. Example: cd ~ will return you to your home directory. cd /projects/Metabolomics will take you to the Metabolomics projects directory.
- ls – this will list files/folders in the current directory.
- pwd – print the current directory location.
- nano – start a simple text editor for creating/copying code or scripts.
- cat – useful for viewing the contents of a file (printing it to the screen) or merging files. Example: cat my_text_file.txt will print the contents of the file to the screen. cat file1.txt file2.txt file3.txt > file_merged.txt will concatenate the three files into a single file.
- head, tail – useful for viewing the first/last few lines of a file.
- mkdir – create a directory. Example: mkdir my_new_dir this will create a new directory called my_new_dir in the current working folder.
- cp – copy files. Requires the file name and the destination as arguments (add the -r option to copy folders). Example: cp this_file /projects/Metabolomics/to_here will copy the file ‘this_file’ in the current directory to /projects/Metabolomics/, naming the copy ‘to_here’ (if to_here is an existing folder, the file is copied into it instead). cp -r ./. ~/my_folder/ will copy everything in the current directory (including subfolders) to the folder ‘my_folder’ in your home directory.
- rm – remove files. Specify the file you want to remove as an argument. To delete directories with files inside, specify the -r option. Example: rm -r ~/my_accidental_folder/ this will delete the folder and all files inside.
- R – start an interactive R session. This allows you to create/troubleshoot R code. You can also run R scripts non-interactively with the Rscript command. Don’t run R on the head node!
- CTRL-C – this can be used to interrupt/kill the current running process.
- CTRL-Z – this can be used to suspend the current process. You can view suspended jobs using the ‘jobs’ command, then resume one by running ‘fg X’, where X is the job’s number (shown in brackets; usually 1 if you don’t have anything else running).
- exit – this command will end an interactive session. Running it on the head node will log you out of the HPC.
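The short session below tries several of these commands inside a temporary folder created with mktemp -d, so nothing in your real folders is touched. It is a practice sketch; the file and folder names are made up.

```shell
# Work in a throwaway temporary folder so nothing real is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"
pwd                                        # print the current directory

mkdir my_new_dir                           # create a directory
printf 'line 1\nline 2\nline 3\n' > file1.txt
printf 'line 4\n' > file2.txt

cat file1.txt                              # print a file to the screen
cat file1.txt file2.txt > file_merged.txt  # merge two files into a new one
head -n 2 file_merged.txt                  # first 2 lines: line 1, line 2
tail -n 1 file_merged.txt                  # last line: line 4

cp file_merged.txt my_new_dir/copy.txt     # copy a file under a new name
ls my_new_dir                              # list the folder's contents: copy.txt

rm -r my_new_dir                           # delete the folder and everything in it
cd ~                                       # back to your home directory
# When finished, remove the practice folder too: rm -r "$demo_dir"
```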
What are modules in HPC/SLURM?
Modules are a way to dynamically manage your software environment on HPC systems. The module command lets you load, unload, and switch between different software packages and versions without changing your shell configuration files.
To see which modules are currently loaded in your session, use:
module list
This will display all modules you have loaded.
To see all modules that are available to load, use:
module avail
This will show a list of all software packages and versions you can load on the HPC system.
For example, to load Python 3.8, you might use:
module load python/3.8
This makes the specified version of Python available in your session. Modules are especially useful in SLURM job scripts to ensure your jobs use the correct software versions.
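In a SLURM job script, the module commands normally go after the #SBATCH lines, so the job loads the software it needs every time it runs. A sketch, assuming a module called python/3.8 exists (check module avail) and a hypothetical script my_script.py; the resource values are illustrative.

```shell
#!/bin/bash
#SBATCH --job-name=python-example
#SBATCH --partition=standard
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G
#SBATCH --time=01:00:00

module purge               # start from a clean module environment
module load python/3.8     # version from the example above; check module avail first
module list                # record the loaded modules in the job's output

python my_script.py        # hypothetical script name
```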
Baker HPC documentation
There is some documentation on the Baker HPC here: http://hpc-docs.bhri.internal/.