|
|
|
The COVID-19 pandemic has placed all aspects of global society in uncharted territory, and many parts of day-to-day life have been put on hold. The latest measures implemented by governments place further restrictions on how we live and make normality seem ever more distant. Science is at the forefront of understanding the disease and how to counter its effects, and HPC is one of the most powerful tools we have in that fight, giving us a detailed insight into the building blocks of the disease.
|
|
|
|
|
|
|
|
As a high-performance systems integrator, OCF helps facilitate vital research at many organisations across the UK through hardware and software solutions. There is an opportunity for existing OCF customers, and anyone with an x86 Slurm cluster, to get involved. We are encouraging sites to donate any spare capacity in their existing solutions to the COVID-19 research effort through Folding@Home. Spare capacity is used only when your own users are not consuming all resources, so donating clock cycles need not impact any work your solution currently performs. GPU capacity is the most sought after at this time, but all donated resources help.
|
|
|
|
|
|
|
|
For more information, see Folding@Home: <https://foldingathome.org/covid19/>
|
|
|
|
|
|
|
|
How can I contribute?
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
If you have an x86 solution running Slurm, it can be configured with low-priority partitions so that spare capacity is used without impacting any existing workflows. A single partition is needed for CPU-only solutions and another for GPU-based jobs. To run GPU jobs, a CUDA installation must be present, either installed directly on each system running the client or available as a loadable module.
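Before adding nodes to a GPU partition it can be worth confirming that each one can actually see a CUDA toolkit. A minimal pre-flight sketch, assuming CUDA is either installed directly (`nvcc` on the PATH) or provided via environment modules; adjust module names to match your site:

```shell
#!/bin/bash
# Pre-flight CUDA check for a prospective GPU folding node.
# Assumption: CUDA is installed directly or via environment modules -
# the module name "cuda" is illustrative.
check_cuda() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA: nvcc found on PATH"
  elif command -v module >/dev/null 2>&1 && module avail cuda 2>&1 | grep -qi cuda; then
    echo "CUDA: available as an environment module"
  else
    echo "CUDA: not found - this node can only run CPU slots"
  fi
}

check_cuda
```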
|
|
|
|
|
|
|
|
**Please note, the following instructions are given to help get your cluster folding as soon as possible. All software is publicly available and OCF owns no intellectual property associated with the mentioned software. All modifications made to production clusters are made at your own risk, and OCF accepts no responsibility for misconfiguration.**
|
|
|
|
|
|
|
|
Install and configure Folding@Home
|
|
|
|
----------------------------------
|
|
|
|
|
|
|
|
Folding@Home can be installed using a single RPM.
|
|
|
|
|
|
|
|
`rpm -ivh https://download.foldingathome.org/releases/public/release/fahclient/centos-6.7-64bit/v7.5/fahclient-7.5.1-1.x86_64.rpm`
|
|
|
|
|
|
|
|
Files will be installed to "/var/lib/fahclient", and the FAHCoreWrapper and FAHClient binaries to "/usr/bin". These files can be copied to a shared location or a home directory for runtime execution, avoiding the need to install the RPM on every node.
|
|
|
|
|
|
|
|
For Shared Installations:
|
|
|
|
|
|
|
|
|
|
|
|
```
mkdir -p </shared/dir/>
mv /var/lib/fahclient </shared/dir/>
mv /usr/bin/FAH* </shared/dir/>
```
|
|
|
|
|
|
|
|
|
|
|
|
Folding@Home must now be configured for the different workload types. The configuration is minimal: it only requires one or more config.xml files to be added to the directory containing FAHClient. See the appendices for example configuration files.
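The per-core configuration files in the appendices differ only in the number of `<slot>` entries, so they can be generated rather than written by hand. A sketch that mirrors the appendix layout (set your own user name and passkey in place of the defaults):

```shell
#!/bin/bash
# Generate config-X.xml for each supported core count.
# Matches the appendix layout: one CPU slot per core.
for X in 1 2 4 8 16; do
  {
    echo '<config>'
    echo '  <user value="anonymous"/> <!-- Enter your user name here -->'
    echo '  <team value="248010"/> <!-- Team OCF! -->'
    echo '  <passkey value=""/> <!-- 32 hexadecimal characters if provided -->'
    echo '  <!-- Folding Slots -->'
    for i in $(seq 0 $((X - 1))); do
      echo "  <slot id='$i' type='CPU'/>"
    done
    echo '</config>'
  } > "config-$X.xml"
done
```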
|
|
|
|
|
|
|
|
General Slurm configuration
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
Slurm must be configured with pre-emption and priority. The following settings are recommended in slurm.conf:
|
|
|
|
|
|
|
|
|
|
|
|
```
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
# For GPU jobs
GresTypes=gpu
```
|
|
|
|
|
|
|
|
|
|
|
|
If any of the above settings have changed, restart slurmctld for them to take effect.
|
|
|
|
|
|
|
|
The following partition configuration covers both CPU and GPU job types. If you do not currently use priority tiers, add "PriorityTier=2" to your existing partitions so that user-submitted jobs take priority over folding jobs. After applying the partition configuration below, run "scontrol reconfigure" if you have a shared slurm.conf; otherwise, copy the configuration file to all of your nodes first and then run "scontrol reconfigure" to activate the new configuration.
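For sites without a shared slurm.conf, the copy-and-reconfigure step can be scripted. A sketch, assuming passwordless root SSH to the nodes; the hostnames are illustrative, and DRY_RUN=1 prints each command instead of executing it so you can check the result first:

```shell
#!/bin/bash
# Push slurm.conf to every node, then tell slurmctld to re-read it.
# DRY_RUN=1 echoes each command instead of running it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

distribute_conf() {
  local conf="$1"
  shift
  for host in "$@"; do
    run scp "$conf" "root@$host:/etc/slurm/slurm.conf"
  done
  run scontrol reconfigure
}

# Example (dry run):
DRY_RUN=1 distribute_conf /etc/slurm/slurm.conf node01 node02
```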
|
|
|
|
|
|
|
|
CPU Partition Configuration
|
|
|
|
---------------------------
|
|
|
|
```
# Partitions
PartitionName=spot-folding State=UP Nodes=node[01-11] PriorityTier=1 PreemptMode=suspend DefMemPerCPU=4096 Default=NO
```
|
|
|
|
|
|
|
|
|
|
|
|
GPU Partition Configuration
|
|
|
|
---------------------------
|
|
|
|
|
|
|
|
```
# Partitions
PartitionName=spot-folding-gpu State=UP Nodes=gpu[01-06] PriorityTier=1 PreemptMode=suspend DefMemPerCPU=4096 Default=NO
```

Slurm Batch Scripts
-------------------

After installing and reconfiguring Slurm, you should now be ready to submit jobs. Sample CPU and GPU batch scripts are available in the appendices. Remember to adjust the partition name to match your own naming, and to set the following parameters in CPU jobs, where X is 1, 2, 4, 8 or 16:

```
#SBATCH --ntasks-per-node=X
--config ./config-X.xml
```
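Rather than editing the batch script by hand for each core count, a small wrapper can set both values at submission time. A sketch, assuming one script per core count named run<X>core.sh (only run1core.sh appears in the appendices, so the other names are hypothetical); DRY_RUN=1 prints the sbatch command instead of submitting it:

```shell
#!/bin/bash
# Submit a folding job sized to X cores (X in 1, 2, 4, 8 or 16).
# DRY_RUN=1 prints the sbatch command instead of submitting.
submit_folding() {
  local X="$1"
  case "$X" in
    1|2|4|8|16) ;;
    *) echo "X must be 1, 2, 4, 8 or 16" >&2; return 1 ;;
  esac
  local cmd="sbatch --partition=spot-folding --nodes=1 --ntasks-per-node=$X run${X}core.sh"
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
}

# Example (dry run):
DRY_RUN=1 submit_folding 4
```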
|
|
|
|
|
|
|
|
Kubernetes
|
|
|
|
----------
|
|
|
|
|
|
|
|
Not only can you do all of this within Slurm on bare-metal clusters: we also have a working implementation on Kubernetes with GPU support, which is currently being run on a DGX-1 POD and is proven to work.
|
|
|
|
|
|
|
|
To run it, simply clone the OCF GitLab repository at <https://gitlab.ocf.co.uk/open/ocf-folding> and follow the README, which should give you all you need.
|
|
|
|
|
|
|
|
Appendices
|
|
|
|
----------
|
|
|
|
|
|
|
|
**Single core configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-1.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
</config>
```
|
|
|
|
|
|
|
|
|
|
|
|
**2 core configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-2.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='CPU'/>
</config>
```
|
|
|
|
|
|
|
|
**4 core configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-4.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='CPU'/>
<slot id='2' type='CPU'/>
<slot id='3' type='CPU'/>
</config>
```
|
|
|
|
|
|
|
|
**8 core configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-8.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='CPU'/>
<slot id='2' type='CPU'/>
<slot id='3' type='CPU'/>
<slot id='4' type='CPU'/>
<slot id='5' type='CPU'/>
<slot id='6' type='CPU'/>
<slot id='7' type='CPU'/>
</config>
```
|
|
|
|
|
|
|
|
**16 core configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-16.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='CPU'/>
<slot id='2' type='CPU'/>
<slot id='3' type='CPU'/>
<slot id='4' type='CPU'/>
<slot id='5' type='CPU'/>
<slot id='6' type='CPU'/>
<slot id='7' type='CPU'/>
<slot id='8' type='CPU'/>
<slot id='9' type='CPU'/>
<slot id='10' type='CPU'/>
<slot id='11' type='CPU'/>
<slot id='12' type='CPU'/>
<slot id='13' type='CPU'/>
<slot id='14' type='CPU'/>
<slot id='15' type='CPU'/>
</config>
```
|
|
|
|
|
|
|
|
**Single GPU configuration**
|
|
|
|
```
[folding@ocf ~/fah]# cat config-gpu.xml
<config>
<user value="anonymous"/> <!-- Enter your user name here -->
<team value="248010"/> <!-- Team OCF! -->
<passkey value=""/> <!-- 32 hexadecimal characters if provided -->
<!-- Folding Slots -->
<slot id='0' type='GPU'/>
</config>
```
|
|
|
|
|
|
|
|
Slurm Batch Scripts
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
**CPU** **Job**
|
|
|
|
|
|
|
|
|
|
|
|
```
[folding@ocf ~/fah]# cat run1core.sh
#!/bin/bash
# OCF - www.ocf.co.uk
# set the number of nodes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# set name of job
#SBATCH --job-name=folding-at-home-1core
#SBATCH --partition=spot-folding
cd ~/fah
mkdir -p ./output ./$SLURM_JOBID  # ensure the log and work directories exist
./FAHClient --config ./config-1.xml --checkpoint 10 --chdir ./$SLURM_JOBID --http-addresses 0:$(shuf -i 10000-11000 -n1) --command-port $(shuf -i 11000-12000 -n1) --log ~/fah/output/$SLURM_JOBID.log --smp --exit-when-done
rm -rf ./$SLURM_JOBID
```
|
|
|
|
|
|
|
|
**GPU Job**
|
|
|
|
```
[folding@ocf ~/fah]# cat runGPU.sh
#!/bin/bash
# OCF - www.ocf.co.uk
# set the number of nodes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
# set name of job
#SBATCH --job-name=folding-at-home-GPU
#SBATCH --partition=spot-folding-gpu
module load cuda/10.1
cd ~/fah
mkdir -p ./output ./$SLURM_JOBID  # ensure the log and work directories exist
./FAHClient --config ./config-gpu.xml --checkpoint 10 --chdir ./$SLURM_JOBID --http-addresses 0:$(shuf -i 10000-11000 -n1) --command-port $(shuf -i 11000-12000 -n1) --log ~/fah/output/$SLURM_JOBID.log --smp --gpu --exit-when-done
rm -rf ./$SLURM_JOBID
```
|
|
|