Home

The COVID-19 pandemic is placing all aspects of global society in uncharted territory, and many parts of day-to-day life have been put on hold. The latest measures being implemented by governments place further restrictions on how we can live our lives and make normality seem ever more distant. Science is at the forefront of understanding the disease and how we can counter its effects, and HPC is one of the most powerful tools we have in that fight, giving us a detailed insight into the building blocks of disease.

As a high-performance systems integrator, OCF help facilitate vital research at many organisations across the UK through hardware and software solutions. There is an opportunity for existing OCF customers, and anyone with an x86 Slurm cluster, to get involved. We are encouraging sites to donate any spare capacity in existing solutions to the COVID-19 research effort through Folding@Home. Spare capacity can be used whenever your users are not consuming all resources, so donating clock cycles need not impact any work your solution is currently running. GPU capacity is the most sought after at this time, but all donated resources help.

See Folding@Home for more - https://foldingathome.org/covid19/

How can I contribute?

If you have an x86 solution running Slurm, it can be configured with low-priority partitions so that existing workflows are not impacted while spare capacity is still put to use. There is a single partition to set up for CPU-only solutions and another for GPU-based jobs. To run GPU jobs, a CUDA installation must be present, either installed directly on each system running the client or available as a loadable module.
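If CUDA is provided through an environment modules system rather than a node-local install, it can be loaded in the GPU batch script before FAHClient starts. A minimal sketch, assuming a module simply named "cuda" (the exact module name and version are site-specific):

# Load the site CUDA module (name is an assumption; adjust to your environment)
module load cuda

# Optional sanity check that the GPUs are visible on the node
nvidia-smi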

Please note, the following instructions are provided to help get your cluster folding as soon as possible. All software mentioned is freely available and OCF own no intellectual property associated with it. Any modifications made to production clusters are done at your own risk and OCF accept no responsibility for misconfiguration.

Install and configure Folding@Home

Folding@Home can be installed using a single RPM.

rpm -ivh https://download.foldingathome.org/releases/public/release/fahclient/centos-6.7-64bit/v7.5/fahclient-7.5.1-1.x86_64.rpm

Files will be installed to "/var/lib/fahclient", and the FAHCoreWrapper and FAHClient binaries will be installed to "/usr/bin". These files can be copied to a shared location or a home directory for runtime execution, which avoids having to install the RPM on every node.

For Shared Installations (a worked example follows these steps):

1. mkdir -p </shared/dir/bin>

2. mv /var/lib/fahclient </shared/dir/bin>

3. mv /usr/bin/FAH* </shared/dir/bin>
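As a worked example, assuming a hypothetical shared path of /shared/apps/fah, the steps above become:

mkdir -p /shared/apps/fah/bin
mv /var/lib/fahclient /shared/apps/fah/bin
mv /usr/bin/FAH* /shared/apps/fah/bin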

Folding@Home must now be configured for the different workload types. The configuration is minimal and only requires one or more config.xml files to be added to the directory containing FAHClient. See the appendices for configuration files.
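Once the configuration files from the appendices have been added alongside the client, the directory might look like this (paths follow the hypothetical shared layout above; add only the config files you intend to use):

/shared/apps/fah/bin/FAHClient
/shared/apps/fah/bin/FAHCoreWrapper
/shared/apps/fah/bin/config-1.xml
/shared/apps/fah/bin/config-4.xml
/shared/apps/fah/bin/config-gpu.xml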

General Slurm configuration

Slurm must be configured with pre-emption and priority. The following settings are recommended in slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=30
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# For GPU jobs
GresTypes=gpu

If any of the above settings have changed, restart slurmctld for them to take effect.
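To confirm the running controller has picked up the changes, the active values can be queried with standard Slurm commands (output format may vary by version):

scontrol show config | grep -E 'SelectType|Preempt|GresTypes'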

The following partition configuration covers CPU and GPU job types. If you do not currently use priority tiers, please add "PriorityTier=2" to your existing partitions so that your users' submitted jobs have higher priority than the folding jobs. After applying the partition configuration below, run "scontrol reconfigure" if you have a shared slurm.conf, or copy your configuration file to all nodes and then run "scontrol reconfigure" to activate the new configuration.
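As an illustration, an existing production partition might be given the higher tier like this (the partition name and node list here are placeholders, not part of the original configuration):

PartitionName=production State=UP Nodes=node[01-11] PriorityTier=2 Default=YES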

CPU Partition Configuration

# Partitions

PartitionName=spot-folding State=UP Nodes=node[01-11] PriorityTier=1 PreemptMode=suspend DefMemPerCPU=4096 Default=NO

GPU Partition Configuration

# Partitions

PartitionName=spot-folding-gpu State=UP Nodes=gpu[01-06] PriorityTier=1 PreemptMode=suspend DefMemPerCPU=4096 Default=NO

Slurm Batch Scripts

After installing and reconfiguring Slurm you should now be ready to submit jobs. Sample CPU and GPU batch scripts are available in the appendices.

Please remember to adjust the partition name in the scripts to match the name you chose, and to set the following parameters on CPU jobs, where X is 1, 2, 4, 8 or 16:

#SBATCH --ntasks-per-node=X

--config ./config-X.xml
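For example, a 4-core CPU job would use:

#SBATCH --ntasks-per-node=4

--config ./config-4.xml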

Kubernetes

Not only can you run these jobs under Slurm on bare-metal clusters; we also have a working implementation on k8s with GPU support. This is currently being run on a DGX-1 POD and is proven to work.

To run it, clone the OCF GitLab repository at https://gitlab.ocf.co.uk/open/ocf-folding and follow the README, which should give you everything you need.
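For example (assuming git access to the repository):

git clone https://gitlab.ocf.co.uk/open/ocf-folding.git
cd ocf-folding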

Appendices

Single core configuration

[folding@ocf ~/fah]# cat config-1.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
</config>

2 core configuration

[folding@ocf ~/fah]# cat config-2.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='CPU'/>
</config>

4 core configuration

[folding@ocf ~/fah]# cat config-4.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='CPU'/>
  <slot id='2' type='CPU'/>
  <slot id='3' type='CPU'/>
</config>

8 core configuration

[folding@ocf ~/fah]# cat config-8.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='CPU'/>
  <slot id='2' type='CPU'/>
  <slot id='3' type='CPU'/>
  <slot id='4' type='CPU'/>
  <slot id='5' type='CPU'/>
  <slot id='6' type='CPU'/>
  <slot id='7' type='CPU'/>
</config>

16 core configuration

[folding@ocf ~/fah]# cat config-16.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='CPU'/>
  <slot id='2' type='CPU'/>
  <slot id='3' type='CPU'/>
  <slot id='4' type='CPU'/>
  <slot id='5' type='CPU'/>
  <slot id='6' type='CPU'/>
  <slot id='7' type='CPU'/>
  <slot id='8' type='CPU'/>
  <slot id='9' type='CPU'/>
  <slot id='10' type='CPU'/>
  <slot id='11' type='CPU'/>
  <slot id='12' type='CPU'/>
  <slot id='13' type='CPU'/>
  <slot id='14' type='CPU'/>
  <slot id='15' type='CPU'/>
</config>

Single GPU configuration

[folding@ocf ~/fah]# cat config-gpu.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='GPU'/>
</config>

Multi GPU configuration

[folding@ocf ~/fah]# cat config-gpu.xml

<config>
  <user value="anonymous"/>   <!-- Enter your user name here -->
  <team value="248010"/>      <!-- Team OCF! -->
  <passkey value=""/>         <!-- 32 hexadecimal characters if provided -->
  <!-- Folding Slots -->
  <slot id='0' type='GPU'/>
  <slot id='1' type='GPU'/>
</config>

Slurm Batch Scripts

CPU Job

[folding@ocf ~/fah]# cat run1core.sh

#!/bin/bash
# OCF - www.ocf.co.uk

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<ntasks>
#SBATCH --job-name=CPU_Folding@Home
#SBATCH --partition=<partition_name>
#SBATCH --time=21-0

FAHDIR=<fah_directory>
cd $FAHDIR

# Per-job working and log directories
mkdir -p $FAHDIR/bin/slurm_work/$SLURM_JOBID
mkdir -p $FAHDIR/output

# Run the client with the CPU config matching the requested core count,
# and exit once its work units are complete
$FAHDIR/bin/FAHClient --config $FAHDIR/bin/config-<ntasks>.xml \
    --checkpoint 10 \
    --chdir $FAHDIR/bin/slurm_work/$SLURM_JOBID \
    --http-addresses 0:$(shuf -i 10000-11000 -n1) \
    --command-port $(shuf -i 11000-12000 -n1) \
    --log $FAHDIR/output/$SLURM_JOBID.log \
    --smp --exit-when-done

rm -rf $FAHDIR/bin/slurm_work/$SLURM_JOBID

GPU Job

[folding@ocf ~/fah]# cat runGPU.sh

#!/bin/bash
# OCF - www.ocf.co.uk

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
#SBATCH --job-name=GPU_Folding@Home
#SBATCH --partition=<partition_name>
#SBATCH --time=21-0

FAHDIR=<fah_directory>
cd $FAHDIR

# Per-job working and log directories
mkdir -p $FAHDIR/bin/slurm_work/$SLURM_JOBID
mkdir -p $FAHDIR/output

# Run the client with the GPU config; adjust --gres above to match
# the number of GPU slots defined in config-gpu.xml
$FAHDIR/bin/FAHClient --config $FAHDIR/bin/config-gpu.xml \
    --checkpoint 10 \
    --chdir $FAHDIR/bin/slurm_work/$SLURM_JOBID \
    --http-addresses 0:$(shuf -i 10000-11000 -n1) \
    --command-port $(shuf -i 11000-12000 -n1) \
    --log $FAHDIR/output/$SLURM_JOBID.log \
    --smp --gpu --exit-when-done

rm -rf $FAHDIR/bin/slurm_work/$SLURM_JOBID
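With the scripts in place, jobs can be submitted and monitored in the usual way, for example (using the script and partition names from above):

sbatch run1core.sh
sbatch runGPU.sh
squeue -p spot-folding,spot-folding-gpu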