en:proof_on_demand_pod [2016/12/16 10:16] (current)
 +Last modified: Aug 16, 2016 by Calvat\\
 +\\
 +
 +====== Proof On Demand (PoD) ======
 +
 +\\
 +\\
 +**//WARNING: PoD is not working correctly at CC-IN2P3. The reason is still unknown and the PoD developer is not working on it.//**
 +=====  Basics of PoD at CC-IN2P3 ​ =====
 +
 +PoD allows you to run a PROOF session on the batch farm rather than on dedicated machines.\\
 +The user submits an array job to Grid Engine; each subjob spawns a "PROOF worker", and then the PROOF session can start.\\
 +It may look difficult, but a user can launch a PROOF session from scratch in only a few steps.\\
 +The project home page is [[http://pod.gsi.de|here]].
 +=====  How to use PoD at CC-IN2P3 ​ =====
 +
 +
 +====  Configuration ​ ====
 +
 +
 +===  Setup the environment ​ ===
 +
 +Set a temporary directory (not on AFS):
 +<code>
 +export TMPDIR=/scratch/$USER
 +</code>
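The TMPDIR setup above can be wrapped in a small guard. This is only a minimal sketch: the function name `setup_tmpdir` is hypothetical, and the check merely rejects paths that start with `/afs/`, so it will not detect a directory that reaches AFS through a symbolic link.

```shell
# Hypothetical helper (not part of PoD): create a scratch directory and
# export it as TMPDIR, refusing any path that starts with /afs/.
setup_tmpdir() {
    dir="$1"
    case "$dir" in
        /afs/*)
            echo "error: $dir is on AFS, choose a local directory" >&2
            return 1
            ;;
    esac
    # Create the directory if it does not exist yet, then export TMPDIR.
    mkdir -p "$dir" && export TMPDIR="$dir"
}
```

On an interactive machine it would be called as `setup_tmpdir /scratch/$USER`.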
 +You probably already know how to use ROOT on the interactive machines of CC-IN2P3, and you may already have your own way to do it.\\
 +If not, you can use the preinstalled ROOT binaries:
 +<code>
 +# connect to an interactive machine
 +ssh cca.in2p3.fr
 +# source the ROOT and xrootd environments
 +source /usr/local/root/new/bin/thisroot.sh
 +source /usr/local/root/new/bin/setxrd.sh /usr/local/products/xrootd/root/<version>
 +# source the PoD environment
 +source /usr/local/root/PoD/<version>/PoD_env.sh
 +</code>
 +The first time you run it, the last command above creates a preferences directory ($HOME/.PoD/) for you.
 +===  Edit your preferences ​ ===
 +
 +Edit the file $HOME/.PoD/PoD.cfg and apply the following modifications.
 +
 +  * **IMPORTANT**: at the beginning of this file, on the line right below the "[server]" section header, set
 +<code>
 +work_dir=$TMPDIR
 +</code>
 +  * **IMPORTANT**: make sure that your environment contains a TMPDIR variable pointing to a non-AFS directory.
 +  * **OPTIONAL**: you can change your default Grid Engine settings as follows:
 +<code>
 +cp $POD_LOCATION/etc/Job.ge.option $HOME/.PoD/etc/
 +</code>
 +Then edit the file $HOME/.PoD/PoD.cfg, look for the line right below the "[ge_plugin]" section header, and replace it with:
 +<code>
 +options_file=$HOME/.PoD/etc/Job.ge.option
 +</code>
 +
 +You may then edit this new file ($HOME/.PoD/etc/Job.ge.option) and set the values you want (project name, batch queue, time limit...).
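For those who prefer not to edit PoD.cfg by hand, the two changes above could be scripted. The sketch below is not part of PoD: `set_pod_option` is a hypothetical helper, and it assumes that the key already exists as a `key=value` line inside the named section and that the new value contains no `|` or `&` character.

```shell
# Hypothetical helper (not part of PoD): set key=value inside one [section]
# of an INI-style file. Only an existing "key=..." line is rewritten;
# nothing is added if the key is missing from that section.
set_pod_option() {
    cfg="$1"; section="$2"; key="$3"; value="$4"
    # Restrict the substitution to the lines between "[section]" and the
    # next section header, write to a temporary file, then replace the original.
    sed "/^\[$section\]/,/^\[/ s|^$key=.*|$key=$value|" "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}
```

Assuming the file layout described above, the calls would be `set_pod_option "$HOME/.PoD/PoD.cfg" server work_dir '$TMPDIR'` and `set_pod_option "$HOME/.PoD/PoD.cfg" ge_plugin options_file '$HOME/.PoD/etc/Job.ge.option'`. The single quotes keep the variable names literal in the file, as in the manual edits.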
 +====  How to launch a PROOF session on PoD  ====
 +
 +
 +===  Submit the array job  ===
 +
 +You must start the PoD server on your machine before you can submit your job to the Grid Engine farm.
 +<code>
 +pod-server start
 +</code>
 +NB: if this command does not work, there is probably something wrong with your configuration. Check it.\\
 +\\
 +You should then be able to submit the array job to Grid Engine.\\
 +Here is an example of how to spawn 4 PROOF workers:
 +<code>
 +pod-submit -r ge -n 4
 +</code>
 +One way to check the status of the job is the Grid Engine command "qstat".
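Instead of checking qstat by hand, the wait for workers could be automated. The following is a sketch under assumptions: `wait_for_workers` and the `WORKER_COUNT_CMD` variable are hypothetical names, and the default command `pod-info -n` is assumed to print the number of available PROOF workers on your PoD installation.

```shell
# Hypothetical helper: poll until at least $1 workers are reported.
#   $2 = maximum number of attempts (default 30)
#   $3 = delay in seconds between attempts (default 10)
# WORKER_COUNT_CMD overrides the command that prints the worker count;
# the default "pod-info -n" is an assumption about the local PoD setup.
wait_for_workers() {
    wanted="$1"; tries="${2:-30}"; delay="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        count=$(${WORKER_COUNT_CMD:-pod-info -n} 2>/dev/null)
        # Treat empty or non-numeric output as "no workers yet".
        case "$count" in
            ''|*[!0-9]*) count=0 ;;
        esac
        [ "$count" -ge "$wanted" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_workers 2 && root -l -b` would start ROOT only once two workers are reported, and give up after about five minutes with the default settings.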
 +===  Simple PROOF test  ===
 +
 +When your subjobs are running (at least one of them), you can test the PROOF connection with :
 +<code>
 +root -l -b
 +root [0] TProofBench pb("pod://","workers=2")
 +Starting master: opening connection ...
 +Starting master: OK
 +Opening connections to workers: OK (2 workers)
 +Setting up worker servers: OK (2 workers)
 +PROOF set to parallel mode (2 workers)
 + Run description: PROOF at , 2 workers
 +Info in <TProofBench::SetOutFile>: using default output file: 'proofbench-ccage017.in2p3.fr-2w-20131128-1814.root'
 +root [1] pb.RunCPU()
 +...
 +root [2] .q
 +</code>
 +
 +===  Test file access on local storages ​ ===
 +
 +Here is a simple example to test the file access and the PROOF processing of those files : 
 +====  Good practice ​ ====
 +
 +When your PROOF session finishes, your job array is still running.\\
 +If you are no longer using the workers, please stop the PoD server with
 +<code>
 +pod-server stop
 +</code>
 +When you have finished your work and before you disconnect from the interactive machine, please remove the temporary scratch directory that you used for your session. When you reconnect to cca, you will likely be directed to another physical machine with a different scratch directory, so the old one would be left behind.
 +<code>
 +rm -r "$TMPDIR"
 +</code>
 +
 +====  Troubleshooting ​ ====
 +
 +  * pod-server does not start: check the work_dir setting of the "[server]" section in $HOME/.PoD/PoD.cfg
 +  * jobs take a long time to start: try a shorter queue
 +  * PROOF connection fails: check with qstat that your jobs are running, or check that there is no broken link in your $HOME/.proof
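The last check can be done with a standard find command. This is a minimal sketch: `find_broken_links` is a hypothetical helper name, and it simply lists the symbolic links whose target no longer exists.

```shell
# Hypothetical helper: print every symbolic link under the given directory
# (default: $HOME/.proof) whose target does not exist.
find_broken_links() {
    # "test -e" follows the link, so it fails for a dangling one.
    find "${1:-$HOME/.proof}" -type l ! -exec test -e {} \; -print
}
```

Running `find_broken_links` with no argument inspects $HOME/.proof; an empty output means that no broken link was found.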
 +
 +
  