sam_jim_installation [2016/12/16 10:15] (current version)
 +Modified by Kurca, 08 Jun 2009\\
 +\\
 +
 +====== SAM+JIM Installation ======
 +
 +\\
 +\\
 +
 +====== Chapter 1: ups/upd Installation ======
 +
 +http:​%%//​%%www.fnal.gov/​docs/​products/​ups/​ReferenceManual/​\\
 +\\
 +The links on those pages do not work, so it is unclear how to install ups/upd from them.
 +=====  Copy it from the existing installation on ccd0 (ccsvli02)  =====
 +
 +/​d0products/​products/​prd\\
 +/​d0products/​products/​db
 +<​code>​
 +sam@ccd01:​tcsh[226] pwd
 +/​d0products/​products
 +sam@ccd01:​tcsh[227] ll
 +total 12
 +drwxr-xr-x ​   5 sam      d0           4096 Sep 22 16:55 db
 +drwxr-xr-x ​   2 sam      d0           4096 Sep 22 14:56 etc
 +drwxr-xr-x ​   5 sam      d0           4096 Sep 22 14:58 prd
 +
 +sam@ccd01:​tcsh[230] cd db
 +
 +sam@ccd01:​tcsh[232] ls -al
 +total 20
 +drwxr-xr-x ​   5 sam      d0           4096 Sep 22 16:55 .
 +drwxr-xr-x ​   5 sam      d0           4096 Sep 22 14:56 ..
 +drwxr-xr-x ​   5 sam      d0           4096 Sep 22 16:55 .upsfiles
 +drwxr-xr-x ​   2 sam      d0           4096 Sep 22 14:56 upd
 +drwxr-xr-x ​   2 sam      d0           4096 Sep 22 14:55 ups
 +
 +</​code>​
 +The .upsfiles directory is important for configuration and setup.\\
 +The setup command is defined here.
 +====== Chapter 2: Globus error 76 ======
 +
 +<​code>​
 +Status ​  Held ("​Globus error 76: cannot access cache files in
 +  ~/​.globus/​.gass_cache,​ check permissions,​ quota, and disk space"​)
 +</​code>​
 +1. I suspect that Globus is trying to run your jobs as user sam and\\
 +is therefore trying to access ~sam/.globus/.gass_cache\\
 +\\
 +Can you please clean up the /​etc/​grid-security/​grid-mapfile and map the\\
 +users to samgrid only and try again.\\
 +\\
 +... done, but this did not solve the problem\\
 +\\
 +In the xinetd.d/​globus_gatekeeper I have defined\\
 +env += GLOBUS_GASS_CACHE_DEFAULT=/​samgrid/​gass-cache\\
 +The directory exists and is writable by user samgrid.\\
 +Despite this, globus somehow does not pick it up.\\
 +Where should it be defined? How can I make globus use it?\\
 +> > > > >\\
 +> > I wasn't aware of this configuration you have put in place. Have you\\
 +> > tried restarting globus after these changes were configured?\\
 +> >\\
 +> > Yes, I have completely stopped sam_bootstrap,​ server_run and\\
 +restarted it.....\\
 +> > >\\
 +> > Neither server_run nor sam_bootstrap control globus. Globus is\\
 +> > controlled by xinetd. You need to restart xinetd as root to propagate\\
 +> > the changes to globus. Can you please try this?\\
 +>\\
 +> Done, restarted everything several times, but the result is the same.\\
 +> I don't know what else to try.\\
 +\\
 +> Next iteration:\\
 +> I have removed the mappings to user sam from /etc/grid-security/grid-mapfile, as you suggested.\\
 +> The globus log file confirms that I was previously mapped to sam and am now mapped to samgrid.\\
 +> However, this did not change anything; the problem remained the same.\\
 +> After further digging and comparing with the old, working ccd0 installation,\\
 +> I found one interesting thing in\\
 +> $GLOBUS_LOCATION/etc/globus-job-manager.conf\\
 +>\\
 +> On the older ccd0 there are entries like:\\
 +> ...\\
 +> -scratch-dir-base /samgrid/globus_scratch\\
 +> -cache-location /samgrid/gass-cache\\
 +... and yes, these entries make the difference.
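The check that finally resolved the error can be sketched in a few lines of Python. This is only an illustration, not part of SAM or Globus: the function names are made up for this page, and the sketch assumes the job manager configuration is a whitespace-separated list of options, as in the excerpt above. It should be run as the account the jobs are mapped to (samgrid here), because the write test applies to the calling user.

```python
import os
import shlex

# The two options that were present on the old ccd0 installation.
OPTIONS = ("-scratch-dir-base", "-cache-location")


def read_job_manager_dirs(conf_text):
    """Return {option: directory} for the options listed in OPTIONS."""
    tokens = shlex.split(conf_text)
    found = {}
    for i, token in enumerate(tokens[:-1]):
        if token in OPTIONS:
            found[token] = tokens[i + 1]
    return found


def missing_or_unwritable(conf_text):
    """List settings that are absent or point to an unusable directory."""
    found = read_job_manager_dirs(conf_text)
    problems = []
    for option in OPTIONS:
        path = found.get(option)
        if path is None:
            problems.append(option + " is not set")
        elif not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(option + " points to an unusable directory: " + path)
    return problems
```

A typical use would be to pass the contents of $GLOBUS_LOCATION/etc/globus-job-manager.conf to `missing_or_unwritable` and print the result; an empty list means both directories are configured and writable for the calling user.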
 +====== Chapter 3: sam_gridftp ======
 +
 +<​code>​
 +sam@ccd01:​tcsh[215] setup sam_gsi_config_util -q vdt
 +sam@ccd01:​tcsh[216] sam_gsi_get_gridmap ​
 +Running sam_gsi_get_gridmap script
 +append_cert_from_default true
 +overwrite_gridmap false
 +product_name gridftp
 +local_user
 +sam_gsi_get_gridmap:​ INFO: Using gridftp to transfer data
 +sam_gsi_get_gridmap:​ INFO: trying to download from the central gridftpd the updated grid-mapfile
 +sam_gsi_get_gridmap:​ Executing: gridftp d0rsam01.fnal.gov:/​sam/​cache1/​sam_gsi_config/​grid-security/​grid-mapfile /​tmp/​sam_gsi_get_gridmap_tmpdir.01061109100618219/​grid-mapfile
 +Executing as sam 2543
 +Using gridftp to transfer file.
 +gridftp: Local server subject is: /​DC=org/​DC=doegrids/​OU=Services/​CN=sam/​ccd01.in2p3.fr
 +gridftp: WARNING: cannot read domain name vs
 +gridftp: ​         gridftpd certificate subject map file.
 +gridftp: ​         Trying skeptically the default subject.
 +gridftp: Resolved remote sam server subject as: /​DC=org/​DC=doegrids/​OU=Services/​CN=sam/​d0rsam01.fnal.gov
 +gridftp: /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux/​globus/​bin/​globus-url-copy -no-third-party-transfers -parallel 1 -s "/​DC=org/​DC=doegrids/​OU=Services/​CN=sam/​d0rsam01.fnal.gov"​ -nodcau
 +gsiftp://​d0rsam01.fnal.gov:​4567/​sam/​cache1/​sam_gsi_config/​grid-security/​grid-mapfile file://​localhost/​tmp/​sam_gsi_get_gridmap_tmpdir.01061109100618219/​grid-mapfile
 +error: the server sent an error response: 425 425 Can't open data connection. data_connect_failed() failed: a system call failed (No route to host).
 +
 +sam_gsi_get_gridmap:​ WARNING: cannot get up-to-date gridmap-file from the central gridftpd.
 +sam_gsi_get_gridmap: ​         Will use default gridmap-file.
 +sam_gsi_get_gridmap:​ INFO: local grid-mapfile already present at /​d0products/​products/​gsi/​gridftp/​grid-mapfile.gridftp
 +sam_gsi_get_gridmap: ​      Will add the missing subjects from the official grid-mapfile
 +sort: open failed: /​d0products/​products/​gsi/​gridftp/​grid-mapfile.gridftp:​ Permission denied
 +diff: /​tmp/​sam_gsi_get_gridmap_tmpdir.01061109100618219/​local-grid-mapfile:​ No such file or directory
 +sam_gsi_get_gridmap:​ INFO: local grid-mapfile was up-to-date
 +
 +sam_gsi_get_gridmap:​ IMPORTANT: Please check your grid-mapfile at
 +sam_gsi_get_gridmap: ​           /​d0products/​products/​gsi/​gridftp/​grid-mapfile.gridftp :
 +sam_gsi_get_gridmap: ​           this file contains the list of subjects authorized
 +sam_gsi_get_gridmap: ​           to use your local resources.
 +
 +sam_gsi_get_gridmap:​ Cleaning up...
 +</​code>​
 +vi /​d0products/​products/​db/​sam_gridftp/​sam_gridftp.config\\
 +\\
 +Edit TCPPortRange there (it may have been set during setup; this is unclear):\\
 +#Data port range. Useful when running gridftp behind a firewall.\\
 +#The port range is comma separated e.g. 50001,​50100\\
 +#This option initilizes the GLOBUS_TCP_PORT_RANGE environment variable\\
 +#The default does not impose any restriction on the ports for the data channel.\\
 +#​TCPPortRange default\\
 +TCPPortRange 60001,​60200 ​
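To double-check which data port range is active after editing the file, a small parser along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, lines starting with # are treated as comments, and if several active TCPPortRange lines exist the last one is assumed to win, which has not been verified against sam_gridftp itself.

```python
def parse_tcp_port_range(config_text):
    """Return (low, high) for the active TCPPortRange line, or None for 'default'."""
    result = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] != "TCPPortRange" or len(fields) != 2:
            continue
        if fields[1] == "default":
            result = None  # no restriction on the data channel ports
            continue
        low, high = (int(part) for part in fields[1].split(","))
        if not 0 < low <= high <= 65535:
            raise ValueError("invalid port range: " + fields[1])
        result = (low, high)
    return result


def globus_env_value(port_range):
    """Format a (low, high) pair as the comma-separated value described in the file."""
    return "%d,%d" % port_range
```

The resulting pair can then be compared with the range opened on the site firewall.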
 +====== Chapter 4: sam_cp and sam_client ======
 +
 +=====  CC-IN2P3 specific rfcp  =====
 +
 +<​code>​
 +sam@ccd01:​tcsh[221] pwd
 +/​d0products/​products/​no_ups/​bin
 +
 +sam@ccd01:​tcsh[220] less  Readme
 +Do setup sam_cp -q vdt
 + and in $SAM_CP_DIR/​lib/​SamCpClasses.py we have to 
 +redefine the command of rfcp to use COS correctly
 +Check if no difference exists between hpss.py and myhpss.py
 +
 +class SamRfio(SamRfioBase):​
 +    def __init__(self,​ args={}):
 +        SamCp.__init__(self,​ args)
 +#        self.command = "​rfcp"​
 +# New command where the cos are defined
 +        self.command = "/​d0products/​products/​no_ups/​bin/​rfcp.py"​
 +
 +</​code>​
 +This requires the directory
 +<​code>​
 +/​d0products/​products/​no_ups/​
 +</​code>​
 +=====  sam_cp_config.py ​ =====
 +
 +<​code>​
 +> Looking at WN when the job is starting, I see that sam_cp_config.py there
 +    > is the one coming from the current sam_client. And doesn'​t correspond
 +    > to the $SAM_CP_CONFIG_FILE which is located in
 +    > $PRODUCTS/​sam_cp_config/​Config/​
 +    >
 +    >     For our setup it doesn'​t help if I have jim_gridftp defined in the
 +    > sam_client sam_cp_config.py. We need there '​rfio'​.
 +</​code>​
 +$SAM_CP_CONFIG_FILE is usually taken from the older, existing configuration\\
 +of sam_cp. ​
 +====== Chapter 5: sam_fcp ======
 +
 +Tailoring of sam_fcp:\\
 +fcp_queue name= = "fssBuffer"\\
 +... and not the default? It is unclear where this value is documented.
 +====== Chapter 6: samgrid_batch_adapter ======
 +
 +=====  Installation ​ =====
 +
 +0. setup samgrid_batch_adapter\\
 +\\
 +1. Copy $SAMGRID_BATCH_ADAPTER_DIR/​etc/​handlers/​sam_bqs_handler.sh to\\
 +$SAMGRID_BATCH_ADAPTER_CONFIG_DIR. Go through the script and make sure\\
 +that all the paths/​commands in this file are correct and are valid for\\
 +your installation.\\
 +\\
 +2. Copy $SAMGRID_BATCH_ADAPTER_CONFIG_DIR/ccin2p3_analysis%%__%%config%%__%%.py\\
 +to $SAMGRID_BATCH_ADAPTER_CONFIG_DIR/ccin2p3_grid2%%__%%config%%__%%.py\\
 +Edit ccin2p3_grid2%%__%%config%%__%%.py: replace the old station name with the new one,\\
 +correct the path to sam_bqs_handler, and update anything else that still refers to the old setup.\\
 +\\
 +3. Add an entry for your V7 station to the file\\
 +$SAMGRID_BATCH_ADAPTER_CONFIG_DIR/samgrid_batch_adapter_config.py in the\\
 +same way as the other entries.\\
 +\\
 +4. Run the following commands and check that everything looks OK:\\
 +\\
 +setup sam\\
 +sambatch display station config --station=ccin2p3-grid2\\
 +\\
 +Now try to submit the job.
 +<​code>​
 +ccd01:​db/​samgrid_batch_adapter/​v7_0_2> ​  ​sambatch display station config ​       ​
 +  Exception Class: BatchConfig.UnknownStationNameError
 +  Error Message: ​  ​Unknown station name: ccin2p3-grid2.
 +
 +</​code>​
 +To avoid this problem, run:
 +<​code>​
 +ccd01:​db/​samgrid_batch_adapter/​v7_0_2> ​  ​sambatch add station config
 +Reusing old configuration for station ccin2p3-grid2.
 +Added configuration for station ccin2p3-grid2.
 +
 +</​code>​
 +Now you can do:
 +<​code>​
 +ccd01:​db/​samgrid_batch_adapter/​v7_0_2> ​  ​sambatch display station config
 +Station: ccin2p3-grid2 ​
 +  Default Adapter: grid
 +  Available Adapters: ['​grid'​]
 +    Adapter: grid (greed)
 +      Default Queue: A
 +      Available Queues: ['​A'​]
 +        Project Queue: A (A)
 +      Available Commands: ['job lookup command',​ 'job submit command',​ 'job kill command'​]
 +        Command: $SAMGRID_BATCH_ADAPTER_CONFIG_DIR/​sam_bqs_handler.sh job_lookup --project=%__USER_PROJECT__ --local-job-id=%__BATCH_JOB_ID__
 +          Type: job lookup command
 +          Known Outcomes:
 +            Exit Status: 0
 +            Outcome Description:​ Success
 +            Exit Status: 0
 +            Expected Output: JobId=%__BATCH_JOB_ID__ Status=%__BATCH_JOB_STATUS__
 +            Exit Status: 1
 +            Outcome Description:​ Failure
 +        Command: $SAMGRID_BATCH_ADAPTER_CONFIG_DIR/​sam_bqs_handler.sh job_submit --project=%__USER_PROJECT__ --executable=%__USER_SCRIPT__ --arguments=%__USER_SCRIPT_ARGS__ ​ --stdout=%__USER_JOB_OUTPUT__ --stderr=%__USER_JOB_ERROR__
 +          Type: job submit command
 +          Known Outcomes:
 +            Exit Status: 0
 +            Outcome Description:​ Success
 +            Exit Status: 0
 +            Expected Output: job %__BATCH_JOB_ID__ submitted to
 +            Exit Status: 1
 +            Outcome Description:​ Failure
 +        Command: $SAMGRID_BATCH_ADAPTER_CONFIG_DIR/​sam_bqs_handler.sh job_kill --project=%__USER_PROJECT__ --local-job-id=%__BATCH_JOB_ID__
 +          Type: job kill command
 +          Known Outcomes:
 +            Exit Status: 0
 +            Outcome Description:​ Success
 +            Exit Status: 1
 +            Outcome Description:​ Failure
 +ccd01:​db/​samgrid_batch_adapter/​v7_0_2> ​    
 +
 +</​code>​
 +In fact, the problem was caused by sam_bqs_handler.sh missing from the directory\\
 +${SAM_BATCH_ADAPTER_CONFIG_DIR}\\
 +In previous versions it was located in ${SAM_BATCH_ADAPTER_HANDLER_DIR}.\\
 +\\
 +> > When the user configures samgrid_batch_adapter, he/she is expected to\\
 +> > create/​verify the existence of the handler file. Even in old\\
 +> > configuration,​ CONFIG_DIR is the default dir we use in samgrid. You are\\
 +> > not forced to use the defaults. These files are purely execution sites'​\\
 +> > local configuration and you can put it anywhere on the machine. Just\\
 +> > change the location appropriately in the batchadapter configuration and\\
 +> > make sure user samgrid can read and execute this file.
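The advice quoted above (the handler must exist and be readable and executable by user samgrid) can be checked before submitting a job. The following is a minimal sketch; the function names are invented for this page, and it assumes the handler is named sam_bqs_handler.sh and lives in $SAMGRID_BATCH_ADAPTER_CONFIG_DIR, as in the station configuration shown earlier. Run it as samgrid, since the permission tests apply to the calling user.

```python
import os


def handler_problems(handler_path):
    """Return a list of reasons why the handler cannot be used (empty if fine)."""
    if not os.path.isfile(handler_path):
        return ["missing: " + handler_path]
    problems = []
    if not os.access(handler_path, os.R_OK):
        problems.append("not readable: " + handler_path)
    if not os.access(handler_path, os.X_OK):
        problems.append("not executable: " + handler_path)
    return problems


def default_handler_path(environ=os.environ):
    """Build the expected handler location from the environment."""
    config_dir = environ.get("SAMGRID_BATCH_ADAPTER_CONFIG_DIR", "")
    return os.path.join(config_dir, "sam_bqs_handler.sh")
```

If the handler is kept elsewhere, as the reply above allows, pass that path to `handler_problems` directly instead of using `default_handler_path`.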
 +<​code>​
 +Date: Thu, 09 Nov 2006 11:57:24 -0600
 +From: Parag Mhashilkar <​parag@fnal.gov>​
 +To: Tibor Kurca <​kurca@in2p3.fr>​
 +Cc: garzogli@fnal.gov
 +Subject: Re: [SAM-IT/​2363] ​ Globus error 76 (#​3/​comment)
 +Parts/​attachments:​
 +   1 Shown  ~118 lines  Text                                                  ​
 +   ​2 ​        3.8 KB     ​Application ​                                          
 +----------------------------------------
 +
 +The reason for preferring the handler in CONFIG_DIR over
 +${SAM_BATCH_ADAPTER_HANDLER_DIR} -
 +
 +Contents of ${SAM_BATCH_ADAPTER_HANDLER_DIR} change with the new
 +samgrid_batch_adapter installation. However,
 +${SAM_BATCH_ADAPTER_CONFIG_DIR} is synchronized with the earlier
 +samgrid_batch_adapter installation. The rationale is that CONFIG_DIR has
 +site specific configurations. There may be some other site using bqs and
 +their way to access bqs is different than yours.
 +${SAM_BATCH_ADAPTER_HANDLER_DIR} is supposed to contain only sample
 +handlers and provide as a starting point for the users. We cannot
 +guarantee that sample files will remain same in future releases of
 +samgrid_batch_adapter. If they change, your installation will be broken
 +and we do not want that. I hope I have cleared any confusion you are
 +having.
 +
 +</​code>​
 +
 +====== Chapter 7: vdt ======
 +
 +<​code>​
 +[root@ccd01 db]$ ups InstallAsRoot vdt
 +installAsRoot.sh: trying to configure the system to run globus gatekeeper via xinetd
 +/​d0products/​products/​prd/​vdt/​v1_3_2_3/​Linux/​ups/​installAsRoot.sh:​ line 11: /​d0products/​products/​prd/​vdt/​v1_3_2_3/​Linux//​vdt/​setup/​configure_globus.sh:​ No such file or directory
 +installAsRoot.sh:​ FAILED!
 +[root@ccd01 db]$ 
 +
 +</​code>​
 +<​code>​
 +[root@ccd01 ~]$ source /​d0products/​products/​etc/​setups.csh
 +[root@ccd01 ~]$ setup vdt -z /​d0products/​products/​db
 +ERROR: Error occurred for product '​UPSACT:​ ' '​vdt'​ '​v1_1_14_13'​ '​Linux'​.
 +INFORMATIONAL:​ No '​PROD_DIR'​ keyword for product '​vdt'​
 +ERROR: Error occurred for product '​UPSACT:​ ' '​vdt'​ '​v1_1_14_13'​ '​Linux'​.
 +ERROR: Error when writing to temp file while processing dodefaults action
 +[root@ccd01 ~]$ ups InstallAsRoot vdt
 +INFORMATIONAL:​ There is no ACTION=InstallAsRoot section in this table file.
 +[root@ccd01 ~]$ 
 +</​code>​
 +<​code>​
 +sam@ccd01:​tcsh[238] ups list -aK+ vdt
 +sam@ccd01:​tcsh[239] setup upd
 +sam@ccd01:​tcsh[240] upd install vdt -G-c
 +informational:​ tcl v8_3_1 already exists on local node, skipping.
 +informational:​ tk v8_3_1 already exists on local node, skipping.
 +informational:​ blt v2_4u already exists on local node, skipping.
 +informational:​ python v2_1 already exists on local node, skipping.
 +informational:​ pacman v2_116 already exists on local node, skipping.
 +informational:​ perl v5_8 already exists on local node, skipping.
 +Creating version link in /​d0products/​products/​db/​vdt/​Symlinks for vdt v1_1_14_13.
 +Creating current link in /​d0products/​products/​db/​vdt/​Symlinks for vdt v1_1_14_13.
 +informational:​ installed vdt v1_1_14_13.
 +WARNING: product tcl not in local dependency list
 +WARNING: product tk not in local dependency list
 +WARNING: product blt not in local dependency list
 +Warning: For product "​python"​local node flavor Linux+2 does not match distribution node flavor Linux+2.4 ​
 +Execute the following to resolve chains: ​
 +ups declare -f NULL -q ""​ -g current pacman v2_116 -z /​d0products/​products/​db
 +upd install succeeded.
 +sam@ccd01:​tcsh[241] ups list -aK+ vdt
 +"​vdt"​ "​v1_1_14_13"​ "​Linux"​ ""​ "​current" ​
 +sam@ccd01:​tcsh[242] ups tailor vdt
 +You can install VDT in the ups product area, or configure this product to use a VDT that is already installed on your system. Do you want to install VDT in the ups product area [default: yes]? 
 +You answered yes
 +After the installation you must read the product licences at /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux. If you do not agree with the licences, will you delete VDT from your system [default: yes]? 
 +You answered yes
 +Do you want to start a new Pacman installation here: [/​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux]?​ (y or n): Do you want to trust the unregistered cache [http://​www.cs.wisc.edu/​vdt/​vdt_1114_cache]?​ (y or n): Done.
 +Fetching binaries for Condor Globus...
 +Can't find package [Condor] in any of these caches:
 +     ​http://​www.cs.wisc.edu/​vdt/​vdt_1114_cache
 +     ​http://​physics.bu.edu/​~youssef/​pacman/​sample_cache/​
 +Pacman is exiting. Saving work so far...
 +sam@ccd01:​tcsh[243] ups tailor vdt
 +You can install VDT in the ups product area, or configure this product to use a VDT that is already installed on your system. Do you want to install VDT in the ups product area [default: yes]? 
 +You answered yes
 +After the installation you must read the product licences at /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux. If you do not agree with the licences, will you delete VDT from your system [default: yes]? 
 +You answered yes
 +Done.
 +Fetching binaries for Condor Globus...
 +Can't find package [Condor] in any of these caches:
 +     ​http://​www.cs.wisc.edu/​vdt/​vdt_1114_cache
 +     ​http://​physics.bu.edu/​~youssef/​pacman/​sample_cache/​
 +Pacman is exiting. Saving work so far...
 +sam@ccd01:​tcsh[244] ups declare -f NULL -q ""​ -g current pacman v2_116 -z /​d0products/​products/​db
 +sam@ccd01:​tcsh[245] ups tailor vdt
 +You can install VDT in the ups product area, or configure this product to use a VDT that is already installed on your system. Do you want to install VDT in the ups product area [default: yes]? yes
 +You answered yes
 +After the installation you must read the product licences at /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux. If you do not agree with the licences, will you delete VDT from your system [default: yes]? 
 +You answered yes
 +After the installation you must read the product licences at /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux. If you do not agree with the licences, will you delete VDT from your system [default: yes]? 
 +You answered yes
 +Checking for updates...
 +Done.
 +Fetching binaries for Condor Globus...
 +Can't find package [Globus] in any of these caches:
 +     ​http://​www.cs.wisc.edu/​vdt/​vdt_1114_cache
 +     ​http://​physics.bu.edu/​~youssef/​pacman/​sample_cache/​
 +Pacman is exiting. Saving work so far...
 +sam@ccd01:​tcsh[249] ups tailor vdt
 +You can install VDT in the ups product area, or configure this product to use a VDT that is already installed on your system. Do you want to install VDT in the ups product area [default: yes]? 
 +You answered yes
 +After the installation you must read the product licences at /​d0products/​products/​prd/​vdt/​v1_1_14_13/​Linux. If you do not agree with the licences, will you delete VDT from your system [default: yes]? 
 +You answered yes
 +Checking for updates...
 +Done.
 +Fetching binaries for Condor Globus...
 +Can't find package [Globus] in any of these caches:
 +     ​http://​www.cs.wisc.edu/​vdt/​vdt_1114_cache
 +     ​http://​physics.bu.edu/​~youssef/​pacman/​sample_cache/​
 +Pacman is exiting. Saving work so far...
 +sam@ccd01:​tcsh[250] ​
 +
 +</​code>​
 +Condor, Globus and other packages were not installed and are missing from\\
 +/d0products/products/prd/vdt/v1_1_14_13/Linux/\\
 +\\
 +They were copied by hand from the existing installation:\\
 +scp -r ccd0.in2p3.fr:/d0products/products/prd/vdt/v1_1_14_13/Linux/ .
 +====== Chapter 8: jim_client-only site ======
 +
 +To install a jim_client-only site for submitting v7 jobs, you need:
 +<​code>​
 +"​jim_client"​ "​v2_2_12"​ "​NULL"​ ""​ "​current"​
 +"​python"​ "​v2_4_2_sam"​ "​Linux+2"​ ""​ "​current"​
 +"​sam"​ "​v7_5_3_jim7"​ "​NULL"​ ""​ "​current"​
 +"​samgrid_util"​ "​v3_0_0"​ "​NULL"​ ""​ "​current"​
 +"​jim_config"​ "​v2_0_0"​ "​NULL"​ ""​ "​current"​
 +"​xml_common_lib"​ "​v1_0"​ "​NULL"​ ""​ "​current"​
 +"​pyxml"​ "​v0_8_4"​ "​Linux+2.4"​ ""​ "​current"​
 +"​xmldb_client"​ "​v3_0_2"​ "​NULL"​ ""​ "​current"​
 +"​xml_meta_configurator"​ "​v2_0_1"​ "​NULL"​ ""​ "​current"​
 +"​samgrid_logger_client"​ "​v0_4"​ "​NULL"​ ""​ "​current"​
 +"​sam_gsi_config"​ "​v2_2_26"​ "​NULL"​ "​vdt"​ "​current"​
 +
 +and any dependencies they might bring in. You do not need to have any 
 +sam/jim servers running.
 +
 +</​code>​
 +If you want to install a v7 native samgrid exec site then you need\\
 +all the packages you listed and must run the sam/jim servers.\\
 +You only need jim_gridftp and sam_fcp if you are installing the v7 native exec\\
 +site. To tailor them and get the info into the local db you must have the\\
 +xmldb_server running while tailoring. ​
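Whether a node already has the products listed for a jim_client-only site can be checked by parsing the quoted five-column output that `ups list -aK+` prints elsewhere on this page. The sketch below is illustrative only: the function names are hypothetical, the product list is copied from the list above, and the columns are assumed to be product, version, flavor, qualifiers and chain.

```python
import shlex

# Products named in the jim_client-only list above.
REQUIRED = (
    "jim_client", "python", "sam", "samgrid_util", "jim_config",
    "xml_common_lib", "pyxml", "xmldb_client", "xml_meta_configurator",
    "samgrid_logger_client", "sam_gsi_config",
)


def parse_ups_list(output):
    """Parse lines such as: "vdt" "v1_1_14_13" "Linux" "" "current"."""
    rows = []
    for line in output.splitlines():
        fields = shlex.split(line)
        if len(fields) == 5:
            rows.append(tuple(fields))
    return rows


def missing_products(output, required=REQUIRED):
    """Return the required products that have no 'current' instance."""
    current = {row[0] for row in parse_ups_list(output) if row[4] == "current"}
    return [name for name in required if name not in current]
```

This only compares product names; it does not check versions or the dependencies the products may bring in.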
 +====== Chapter 9: add disk ======
 +
 +<​code>​
 +samadmin add disk location
 +    --mountPoint=cchpssd0.in2p3.fr:/hpss/in2p3.fr/group/d0
 +    --connect=kinyip/xxx@d0ofprd1 --relativePath=grid2/upload
 +    New locationId = 5542
 +    For henrik.nilsen@physik.uni-freiburg.de perhaps, since the data disk location,
 +    cchpssd0.in2p3.fr:/hpss/in2p3.fr/group/d0, one can just "add disk location";
 +    otherwise, one needs "samadmin add data disk".
 +    And one can check that they are there, you may go to
 +    http://d0db-prd.fnal.gov/sam_data_browsing/ and click "File Location" on the left to look
 +    for it.  There, you may need to edit to set the no. of rows (some thousands) in
 +    order to see everything at once.
 +
 +
 +</​code>​
 +
 +====== Chapter 10: Grid-Mapfiles ======
 +
 +GT4: Security: Pre-WS Authentication & Authorization Admin Guide\\
 +http:​%%//​%%www-unix.globus.org/​toolkit/​docs/​4.0/​security/​prewsaa/​admin-index.html#​id2537164\\
 +\\
 +\\
 +With ccd01.in2p3.fr as the scheduler, the error below is caused by a\\
 +missing entry in\\
 +/d0products/products/gsi/jim_broker_client/grid-mapfile.jim_broker_client\\
 +\\
 +Error message during job submission:
 +<​code>​
 +[kurca@ccd01 ~/samgrid]$ samg submit prodtest.jdf
 +samg: 11/07/06 14:52:47: INFO : Checking Grid credentials...
 +samg: 11/07/06 14:52:47: INFO : Checking Grid credentials ... DONE
 +samg_submit:​ 11/07/06 14:52:47: INFO : Verifying the syntax of JDF
 +samg_submit:​ 11/07/06 14:52:47: INFO : Verifying the syntax of JDF ... DONE
 +***Processing job type dzero_reconstruction
 +recojob: 11/07/06 14:52:47: INFO : Getting latest snapshot id for dataset d0repro_jobfiles_p17.05.01_samgridV7-2
 +recojob: 11/07/06 14:52:49: INFO : Getting latest snapshot id for dataset d0repro_jobfiles_p17.05.01_samgridV7-2 ... DONE
 +recojob: 11/07/06 14:52:49: INFO : Getting latest snapshot id for dataset parag-test-d0reco-dataset-5
 +recojob: 11/07/06 14:52:51: INFO : Getting latest snapshot id for dataset parag-test-d0reco-dataset-5 ... DONE
 +/​d0products/​products/​prd/​jim_client/​v2_2_12/​NULL/​lib/​jim_client/​samgjob.py:​473:​ RuntimeWarning:​ tmpnam is a potential security risk to your program
 +  tokenValue=os.tmpnam()+TEMPORARY_FILE_SUFFIX
 +recojob: 11/07/06 14:52:51: INFO : Querying SAM to get the sam username ...
 +recojob: 11/07/06 14:52:53: INFO : Querying SAM to get the sam username ... DONE
 +jim_client.jim_client_util:​ 11/07/06 14:52:53: INFO : Querying SAM to get dataset definition details ...
 +jim_client.jim_client_util:​ 11/07/06 14:52:56: INFO : Querying SAM to get dataset definition details ... DONE
 +jim_client.jim_client_util:​ 11/07/06 14:52:56: INFO : You have your proxy located at /​tmp/​x509up_u2794
 +recojob: 11/07/06 14:52:56: INFO : You have your proxy located at /​tmp/​x509up_u2794
 +recojob: 11/07/06 14:52:56: INFO : Executing: $JIM_CLIENT_DIR/​bin/​get_myproxy.sh fermigrid4.fnal.gov /​tmp/​x509up_u2794
 +recojob: 11/07/06 14:52:58: INFO : myproxy at server is valid 64
 +
 +recojob: 11/07/06 14:52:58: INFO : Checking the version of the DZero code in the binary dataset d0repro_jobfiles_p17.05.01_samgridV7-2...
 +jim_client.jim_client_util:​ 11/07/06 14:52:58: INFO : Querying SAM to get dataset definition details ...
 +jim_client.jim_client_util:​ 11/07/06 14:52:58: INFO : Querying SAM to get dataset definition details ... DONE
 +recojob: 11/07/06 14:52:58: INFO : Checking the version of the DZero code in the binary dataset d0repro_jobfiles_p17.05.01_samgridV7-2 ... DONE
 +recojob: 11/07/06 14:52:58: INFO : Checking files in input dataset (press ^C to skip) ... 
 +recojob: 11/07/06 14:52:58: INFO : Checking files in input dataset ... DONE
 +samg_submit:​ 11/07/06 14:52:58: INFO : Executing the submission command:
 +Executing submission command:
 +['​condor_submit',​ '​-s',​ '/​tmp/​COND_JDF_0_30265.sub'​]
 +stdout:​Submitting job(s)
 +stderr:
 +ERROR: Failed to connect to local queue manager
 +AUTHENTICATE:​1003:​Failed to authenticate with any method
 +AUTHENTICATE:​1004:​Failed to authenticate using GSI
 +GSI:​5004:​Failed to get authorization from server. ​ Either the server does not trust your certificate,​ or you are not in the server'​s authorization file (grid-mapfile)
 +
 +ERROR: Job submission failed. ​
 +
 +ERROR: Failed to connect to local queue manager
 +AUTHENTICATE:​1003:​Failed to authenticate with any method
 +AUTHENTICATE:​1004:​Failed to authenticate using GSI
 +GSI:​5004:​Failed to get authorization from server. ​ Either the server does not trust your certificate,​ or you are not in the server'​s authorization file (grid-mapfile)
 +
 +
 +Leaving submission file... /​tmp/​COND_JDF_0_30265.sub
 +[kurca@ccd01 ~/​samgrid]$ ​
 +
 +</​code>​
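Because the GSI failure above comes down to a subject missing from a grid-mapfile, it can be useful to test a file for a given certificate subject before submitting. The sketch below assumes the usual grid-mapfile layout (a quoted subject followed by one or more comma-separated local accounts, with # starting a comment); the function names are invented for this page, and the subjects used with it are up to the reader.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each certificate subject to its list of local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)
        if len(fields) < 2:
            continue  # malformed entry: subject without an account
        mapping[fields[0]] = fields[1].split(",")
    return mapping


def is_authorized(text, subject):
    """True if the subject has an entry in the grid-mapfile text."""
    return subject in parse_grid_mapfile(text)
```

Reading grid-mapfile.jim_broker_client and calling `is_authorized` with the subject of the submitting user's certificate would show whether the entry is present.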
 +If the scheduler is set to samgrid.fnal.gov, the job is submitted but then goes to "Held":
 +<​code>​
 +Job Information of project kurca_ccd01_143515_26477  ​
 +
 + 
 +
 + Job Summary ​     ​
 + 
 + ​InternalJobID ​  27156 [ Details ] [  Remote Monitoring ]  ​
 + 
 + ​Submission Time   Nov 07 2006, 14:35:27 GMT  ​
 + 
 + ​Status ​  Held ("​Globus error 76: cannot access cache files in ~/​.globus/​.gass_cache,​ check permissions,​ quota, and disk space"​) (Download Output)  ​
 + 
 + ​Status Entered On   Nov 07 2006, 14:41:31 GMT  ​
 + 
 + ​Output Machine ​  ​samgrid.fnal.gov  ​
 + 
 + ​Output Directory ​  To be announced  ​
 +
 +  ​
 +</​code>​
 +
 +====== Chapter 11: SAM semaphores ======
 +
 +Each "ups start sam_bootstrap" creates 3 semaphores,\\
 +and "ups stop sam_bootstrap" does not remove them.\\
 +Each start/stop cycle therefore leaves 3 additional semaphores behind.\\
 +During installation and debugging this can cause problems once more than\\
 +100 semaphores are active.
 +<​code>​
 +sam@ccd01:​tcsh[292] ipcs -s
 +
 +------ Semaphore Arrays --------
 +key        semid      owner      perms      nsems     
 +0x00000000 4882432 ​   sam       ​755 ​       1         
 +0x00000000 4915201 ​   sam       ​755 ​       1         
 +0x00000000 4947970 ​   sam       ​755 ​       1         
 +0x00000000 4980739 ​   sam       ​755 ​       1         
 +0x00000000 5013508 ​   sam       ​755 ​       1         
 +0x00000000 5046277 ​   sam       ​755 ​       1         
 +
 +sam@ccd01:​tcsh[293] ​
 +
 +ipcrm -s 4882432
 +
 +</​code>​
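Cleaning up one semaphore at a time gets tedious after many restarts. A possible helper, sketched here under the assumption that `ipcs -s` prints the five columns shown above, extracts the semaphore ids owned by one user and builds the matching ipcrm commands. The function names are hypothetical, and the commands are only generated, not executed: removing semaphores that still belong to running SAM servers would break them, so stop the services first.

```python
def semaphore_ids(ipcs_output, owner):
    """Return the semids of all semaphore arrays owned by the given user."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key semid owner perms nsems
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            ids.append(int(fields[1]))
    return ids


def ipcrm_commands(ids):
    """Build one 'ipcrm -s <semid>' command per semaphore id."""
    return ["ipcrm -s %d" % semid for semid in ids]
```

The generated lines can be reviewed and then pasted into the shell of user sam.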
 +
 +====== Chapter 12: Automatic Services Restart ======
 +
 +Automatic restart of the SAM+JIM services after a machine reboot\\
 +is enabled.
 +<​code>​
 +/​etc/​init.d/​sam ​      ..... start_samgrid.sh is called
 +...# chkconfig: 345 99 99
 +   # description:​ start SAM (D0) processes
 +
 +/​d0products/​server_home/​start_samgrid.sh ​
 +log-file ..... /​d0products/​server_home/​start_samgrid.log
 +</​code>​
 +
 +====== Chapter 13: SAMGrid-LCG: Open Ports on SAM station ======
 +
 +fal-pygrid-36.lancs.ac.uk\\
 +ccsvli02.in2p3.fr\\
 +\\
 +ccpn-core.current:​access-list 199 permit tcp host 194.80.35.35 host\\
 +134.158.105.56 range 4501 4505
 +=====  Towards the Forwarding Node  =====
 +
 +SAM : 4501,4505\\
 +gridftp server : 4567\\
 +job manager : 61001,​61200\\
 +condor scheduler : 61501,​61700\\
 +tomcat : 7081
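Once the firewall rules are in place, the ports can be probed from the remote side with a plain TCP connect. The sketch below is an assumption-laden illustration: the dictionary reads each "a,b" pair above as an inclusive range (as in the access-list line), the function names are made up, and a port is reported closed both when the firewall blocks it and when no service is listening, so the services should be running during the test.

```python
import socket

# Port list from above, read as inclusive (low, high) ranges.
FORWARDING_NODE_PORTS = {
    "SAM": (4501, 4505),
    "gridftp server": (4567, 4567),
    "job manager": (61001, 61200),
    "condor scheduler": (61501, 61700),
    "tomcat": (7081, 7081),
}


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host, low, high, timeout=2.0):
    """Return the ports in [low, high] that do not accept a connection."""
    return [p for p in range(low, high + 1) if not port_is_open(host, p, timeout)]
```

Looping over `FORWARDING_NODE_PORTS` and calling `closed_ports` for each range gives a quick overview; the wide job manager and scheduler ranges will mostly show as closed unless jobs are actively using them.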
 +
  
  • sam_jim_installation.txt
  • Last modified: 2016/12/16 10:15
  • (external edit)