Logging in to the Analysis Facility

First you will need to sign up on the Analysis Facility website: https://af.uchicago.edu

Please use your institutional or CERN identity when signing up, as this will make the approval process smoother.

Once you have created your account and been approved, you will need to upload an SSH Public Key.

If you are not sure if you have generated an SSH Public Key before, try the following command (Mac/Linux) on your laptop:

cat ~/.ssh/id_rsa.pub

If the file exists, copy its contents into your profile on the AF website. Only upload the public part of the key (the file ending in .pub); never upload the private key.

If you do not have a public key, you can generate one via the following command (Mac/Linux):

ssh-keygen -t rsa

Upload the resulting public key (ending in .pub) to your profile.

Once you have uploaded a key, it will take a little time to process your profile and add your account to the system. After 10-15 minutes, you should be able to log in via SSH:

ssh login.af.uchicago.edu

If it does not work, please double check that you have been approved, have uploaded a public key, and have waited at least 15 minutes. If you still have an issue, feel free to reach out to us for help.
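
If the username on your laptop differs from your AF username, you will need to pass it to SSH explicitly. An entry in ~/.ssh/config saves typing; the username and key path below are only placeholders for your own:

Host af
    HostName login.af.uchicago.edu
    User your-af-username
    IdentityFile ~/.ssh/id_rsa

After that, ssh af is enough to log in.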

Submitting to the Analysis Facility

The UChicago Analysis Facility uses HTCondor for batch workloads. You will need to define an executable script and a "submit" file that describes your job. A simple job that loads the ATLAS environment looks something like this:

Job script, called myjob.sh:

#!/bin/bash
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
export ALRB_localConfigDir=$HOME/localConfig
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
# at this point, you can lsetup root, rucio, athena, etc..
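# Your analysis commands go here; as a placeholder, this job just reports
# where it ran so that something shows up in the output file.
echo "Running on $(hostname) in $(pwd)"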

Submit file:

Universe = vanilla

Output = myjob.$(Cluster).$(Process).out
Error = myjob.$(Cluster).$(Process).err
Log = myjob.log

Executable = myjob.sh

request_memory = 1GB
request_cpus = 1

Queue 1

The condor_submit command is used to queue jobs:

$ condor_submit myjob.sub
Submitting job(s).
1 job(s) submitted to cluster 17.

And the condor_q command is used to view the queue:

[lincolnb@login01 ~]$ condor_q


-- Schedd: head01.af.uchicago.edu : <192.170.240.14:9618?... @ 07/22/21 11:28:26
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
lincolnb ID: 17       7/22 11:27      _      1      _      1 17.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for lincolnb: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
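
A few other stock HTCondor commands are handy while jobs are in flight (none of these are AF-specific):

condor_q -nobatch     # one line per job instead of one per batch
condor_history        # jobs that have already left the queue
condor_rm 17          # remove a job or an entire cluster (here cluster 17)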

Configuring your jobs to use an X509 Proxy Certificate

If you need to use an X509 proxy, e.g. to access ATLAS data, you will want to copy your X509 certificate to the Analysis Facility.

Store your certificate in $HOME/.globus and create an ATLAS VOMS proxy in the usual way:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
export ALRB_localConfigDir=$HOME/localConfig
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
lsetup emi
voms-proxy-init -voms atlas -out $HOME/x509proxy
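
You can check that the proxy was written to the expected location and is still valid with voms-proxy-info, which is set up by the same lsetup emi step:

voms-proxy-info -file $HOME/x509proxy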

You will want to generate the proxy on, or copy it to, the shared $HOME filesystem so that the HTCondor scheduler can find and read it. With the following additions to your submit file, HTCondor will configure the job environment automatically for X509-authenticated data access:

use_x509userproxy = true
x509userproxy = /home/YOURUSERNAME/x509proxy

E.g., in the job above for the user lincolnb:

Universe = vanilla

Output = myjob.$(Cluster).$(Process).out
Error = myjob.$(Cluster).$(Process).err
Log = myjob.log

Executable = myjob.sh

use_x509userproxy = true
x509userproxy = /home/lincolnb/x509proxy

request_memory = 1GB
request_cpus = 1

Queue 1

Using Analysis Facility Filesystems

The UChicago Analysis Facility has three filesystems that you should be aware of when running workloads. The table below describes their differences:

Filesystem   Quota    Path          Backed up?   Notes
$HOME        100 GB   /home/$USER   Yes          Solid-state filesystem, shared to all worker nodes
$WORK        10 TB    /work/$USER   No           CephFS filesystem, shared to all worker nodes
$SCRATCH     n/a      /scratch      No           Ephemeral storage for workloads, local to worker nodes

When submitting jobs, you should use the local scratch filesystem whenever possible. This helps you be a "good neighbor" to other users and reduces stress on the shared filesystems, where heavy load can cause slowness, downtime, etc. By default, jobs start in the $SCRATCH directory on the worker nodes. Output data must be staged back to a shared filesystem or it will be lost!

In the following example, data is read from Rucio, we pretend to process it, and then a small output file is copied back to the $HOME filesystem. It assumes your X509 proxy is valid and in your home directory.

#!/bin/bash
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
export ALRB_localConfigDir=$HOME/localConfig
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
lsetup rucio
rucio --verbose download --rse MWT2_DATADISK data16_13TeV:AOD.11071822._001488.pool.root.1

# You can run things like asetup as well
asetup AnalysisBase,21.2.81

# This is where you would do your data analysis via AnalysisBase, etc. We will
# just pretend to do that, and truncate the file to simulate generating an
# output file. This is definitely not what you want to do in a real analysis!
cd data16_13TeV
truncate --size 10MB AOD.11071822._001488.pool.root.1
cp AOD.11071822._001488.pool.root.1 $HOME/myjob.output
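# For larger outputs, prefer the shared /work area over $HOME (see the
# filesystem table above), for example:
# cp AOD.11071822._001488.pool.root.1 /work/$USER/myjob.output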

It gets submitted in the usual way:

Universe = vanilla

Output = myjob.$(Cluster).$(Process).out
Error = myjob.$(Cluster).$(Process).err
Log = myjob.log

Executable = myjob.sh

use_x509userproxy = true
x509userproxy = /home/lincolnb/x509proxy

request_memory = 1GB
request_cpus = 1

Queue 1

$ condor_submit myjob.sub
Submitting job(s).
1 job(s) submitted to cluster 17.

Using Docker / Singularity containers (Advanced)

Some users may want to bring their own container-based workloads to the Analysis Facility. We support both Docker-based and Singularity-based jobs. Additionally, the CVMFS repository unpacked.cern.ch is mounted on all nodes.

If, for whatever reason, you wanted to run a Debian Linux-based container on the Analysis Facility, it would be as simple as the following submit file:

universe                = docker
docker_image            = debian
executable              = /bin/cat
arguments               = /etc/hosts
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = out.$(Process)
error                   = err.$(Process)
log                     = log.$(Process)
request_memory          = 1000M
queue 1

Similarly, if you would like to run a Singularity container, such as the ones provided in the unpacked.cern.ch CVMFS repo, you can submit a normal vanilla universe job with a job executable that looks something like the following:

#!/bin/bash
singularity run -B /cvmfs -B /home /cvmfs/unpacked.cern.ch/registry.hub.docker.com/atlas/rucio-clients:default rucio --version

Replace the rucio-clients:default container and the rucio --version command with your preferred software.
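
A matching submit file follows the same vanilla-universe pattern used earlier in this guide; the script name below is just an example:

Universe = vanilla

Output = singjob.$(Cluster).$(Process).out
Error = singjob.$(Cluster).$(Process).err
Log = singjob.log

Executable = singjob.sh

request_memory = 1GB
request_cpus = 1

Queue 1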
