Useful scripts¶
Analysis Scripting and Condor¶
Typical data analyses involve running large numbers of scripts in sequence, often with large groups of jobs that can all be run in parallel. Doing this on an interactive node will quickly use up all your available CPU cores, so you will need to migrate your workflow to a "batch" environment. Condor is a standard system for "High Throughput Computing" in which thousands of CPUs are organized into a cluster.
Getting started with Condor is rather intimidating (see the HTCondor quick start documentation), with several layers of abstraction beyond the usual interactive command line workflow. To submit a single job to Condor you must create a Job Description Language file (.jdl) which specifies the type of computer you want, what script to run, and with what arguments. To do anything useful, that script will need to download the actual code you want to run, set up the environment, build the command you actually wanted to run in the first place, and make sure all input and output files end up where they are needed. Submitting a single Condor job is pointless; in the real world you will want to submit dozens or hundreds of jobs, where some jobs need the output of others. Chaining together groups of Condor jobs can be accomplished with Condor DAGMan (Directed Acyclic Graph Manager).
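For orientation, a bare-bones submission with no helpers looks roughly like this: a minimal submit file (all file and script names here are placeholders) handed to condor_submit:
cat runStep.jdl
universe = vanilla
Executable = runStep.sh
Arguments = signalA.txt
Output = job.stdout
Error = job.stderr
Log = job.log
Queue 1
condor_submit runStep.jdl
Everything below automates generating files like this, plus the DAGMan bookkeeping, for you.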
Python Framework¶
A good way to manage all this complexity is by having a top-level python script that can run your commands locally or submit them to Condor if desired. We have developed python functions and classes to make this as easy as possible. You can get started with analysis scripting using the provided analysis_outline.py as a template.
Clone the nTupleAnalysis repository into your local CMSSW/src working area:
git clone git@github.com:patrickbryant/nTupleAnalysis.git
Copy the analysis_outline.py into your analysis scripts directory and edit it to script your analysis.
cp nTupleAnalysis/analysis_outline.py <yourAnalysis/scripts/analysis.py>
The idea is that you can run
python yourAnalysis/scripts/analysis.py --step1
to see all of the commands needed to run step1 of your analysis. To run them on Condor, simply use
python yourAnalysis/scripts/analysis.py --step1 --condor
If you run with the -e flag it will actually execute the corresponding commands instead of just printing them out. It will automatically build a tarball of your CMSSW area if one doesn't already exist and copy it to EOS before submitting any Condor jobs. Make sure you have a valid grid certificate and have initialized your proxy:
voms-proxy-init -voms cms -valid 192:00
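To give a feel for the pattern, here is a minimal sketch of what such a top-level script might look like. It is not the actual analysis_outline.py, just an illustration of the --step1/--condor/-e dispatch using only the Python standard library:
import argparse, subprocess

parser = argparse.ArgumentParser()
parser.add_argument('--step1',  action='store_true', help='run the first analysis step')
parser.add_argument('--condor', action='store_true', help='submit to Condor instead of running locally')
parser.add_argument('-e',       dest='execute', action='store_true', help='actually run the commands instead of just printing them')
args = parser.parse_args()

def run(cmd):
    print(cmd)                # always show the command
    if args.execute:          # only execute it if -e was given
        subprocess.run(cmd, shell=True, check=True)

if args.step1:
    cmds = ["python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt",
            "python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalB.txt"]
    if args.condor:
        pass  # wrap each cmd in a jdl and add it to a dag instead (see the DAG section below)
    else:
        for cmd in cmds:
            run(cmd)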
DAG¶
Running
python nTupleAnalysis/analysis_outline.py --step1 --condor
automatically produces an example analysis.dag file and associated job description files (described in the next section):
cat analysis.dag
JOB A0 0121815414247.jdl
JOB A1 268401412459.jdl
JOB B0 538549115165.jdl
PARENT A0 A1 CHILD B0
The first three lines declare the three jobs we are going to submit to Condor, and the last line says that job B0 should run only after jobs A0 and A1 are complete.
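A generated .dag file can also be submitted by hand with HTCondor's standard DAGMan tool:
condor_submit_dag analysis.dag
but in practice the DAG.submit() call described below takes care of submission for you.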
Using the python dag class is very simple. In your analysis.py script just create a DAG object
DAG = dag(fileName="analysis.dag") # gets a random file name unless you specify one
and add jobs to it by passing in a JDL object with the addJob member function
DAG.addJob(JDL)
You can add a generation to the DAG simply by calling the addGeneration member function
DAG.addGeneration()
and any jobs added after this call will be executed only after the jobs from the previous generation have finished. Finally, you can create the .dag file and submit it to Condor simply by calling the submit member function:
DAG.submit()
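Putting these pieces together, here is a sketch of how the three-job example analysis.dag above could be built. The jdl class is described in the next section, and the import line is a guess; adjust it to wherever commandLineHelpers.py lives in your checkout:
from commandLineHelpers import dag, jdl  # import path is a guess; adjust to your setup

DAG = dag(fileName="analysis.dag")

# generation A: two jobs (A0, A1) that can run in parallel
DAG.addJob(jdl("python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt"))
DAG.addJob(jdl("python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalB.txt"))

# generation B: one job (B0) that runs only after A0 and A1 finish
DAG.addGeneration()
DAG.addJob(jdl("hadd -f root://cmseos.fnal.gov//store/user/bryantp/condor/signal/hists.root"
               " root://cmseos.fnal.gov//store/user/bryantp/condor/signalA/hists.root"
               " root://cmseos.fnal.gov//store/user/bryantp/condor/signalB/hists.root"))

DAG.submit()  # writes analysis.dag and the .jdl files, then submits to Condor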
JDLs and condor.sh¶
The key functionality of the JDL is to specify what command to run on the Condor node. The jdl class from nTupleAnalysis/python/commandLineHelpers.py makes use of nTupleAnalysis/scripts/condor.sh to submit arbitrary commands utilizing your CMSSW and analysis code. Unless otherwise specified, the python jdl class assigns a random file name:
class jdl:
    def __init__(self, cmd=None, CMSSW=DEFAULTCMSSW, EOSOUTDIR="None", TARBALL=DEFAULTTARBALL, fileName=None, logPath="./", logName="condor_$(Cluster)_$(Process)"):
        self.fileName = fileName if fileName else str(np.random.uniform())[2:]+".jdl"
        ...
We can easily create .jdl files using the python jdl class:
cmd = "python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt"
JDL = jdl(cmd)
JDL.make() # DAG.submit() will automatically make all the associated JDLs if they were not already made with this line
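The other constructor arguments can be overridden the same way. For example (the EOS directory here is a placeholder), to have the job's output copied to a specific EOS directory and to give the file a deterministic name instead of a random one:
JDL = jdl(cmd,
          EOSOUTDIR="root://cmseos.fnal.gov//store/user/<username>/condor/signalA/",  # placeholder path
          fileName="signalA.jdl")
JDL.make()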
The DEFAULTCMSSW and DEFAULTTARBALL jdl arguments are set in commandLineHelpers.py as follows:
def getCMSSW():
    return os.getenv('CMSSW_VERSION')

def getUSER():
    return os.getenv('USER')

DEFAULTCMSSW = getCMSSW()
USER = getUSER()
DEFAULTTARBALL = "root://cmseos.fnal.gov//store/user/"+USER+"/condor/"+DEFAULTCMSSW+".tgz"
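For concreteness, in the CMSSW_11_1_0_pre5 area of user bryantp used throughout the examples on this page, these evaluate to:
DEFAULTCMSSW = "CMSSW_11_1_0_pre5"
DEFAULTTARBALL = "root://cmseos.fnal.gov//store/user/bryantp/condor/CMSSW_11_1_0_pre5.tgz"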
Let's look at one of the example .jdl files from the autogenerated analysis.dag:
cat 0121815414247.jdl
universe = vanilla
use_x509userproxy = true
Executable = nTupleAnalysis/scripts/condor.sh
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Output = ./condor_$(Cluster)_$(Process).stdout
Error = ./condor_$(Cluster)_$(Process).stderr
Log = ./condor_$(Cluster)_$(Process).log
Arguments = CMSSW_11_1_0_pre5 None root://cmseos.fnal.gov//store/user/bryantp/condor/CMSSW_11_1_0_pre5.tgz python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt
+DesiredOS="SL7"
Queue 1
The Executable line specifies the script to be run, while the Arguments line specifies the command line arguments that will be used. This .jdl will execute the following command on a Condor node:
nTupleAnalysis/scripts/condor.sh CMSSW_11_1_0_pre5 None root://cmseos.fnal.gov//store/user/bryantp/condor/CMSSW_11_1_0_pre5.tgz python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt
The condor.sh script uses the first argument (CMSSW_11_1_0_pre5) to specify the tarball name and the third argument (root://cmseos.fnal.gov//store/user/bryantp/condor/CMSSW_11_1_0_pre5.tgz) to know where to get the tarball on EOS. The second argument defaults to None in the jdl python class and is used to specify an optional EOSOUTDIR to which output from your command can be copied. The clever part of condor.sh is that, after setting up the CMSSW environment from your tarball, it executes all the remaining arguments as a single command string, so you can run anything you want. In this example it will set up your CMSSW area and try to execute
python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt
which you can see is the first command from analysis step1 in the template:
python nTupleAnalysis/analysis_outline.py --step1
# 0
python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalA.txt
# 1
python yourAnalysis/scripts/yourConfigScript.py -i yourAnalysis/fileLists/signalB.txt
hadd -f root://cmseos.fnal.gov//store/user/bryantp/condor/signal/hists.root root://cmseos.fnal.gov//store/user/bryantp/condor/signalA/hists.root root://cmseos.fnal.gov//store/user/bryantp/condor/signalB/hists.root