Provided by: gridengine-common_8.1.9+dfsg-11build3_all

NAME

       sge_pe - Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel  environments  are  parallel  programming  and  runtime environments supporting the execution of
       shared memory or distributed memory parallelized applications. Parallel environments usually require some
       kind of setup to be operational before starting  parallel  applications.   Examples  of  common  parallel
       environments  are  OpenMP on shared memory multiprocessor systems, and Message Passing Interface (MPI) on
       shared memory or distributed systems.

       sge_pe allows for the definition of interfaces to  arbitrary  parallel  environments.   Once  a  parallel
       environment is defined or modified with the -ap or -mp options to qconf(1), and linked with one or more
       queues via pe_list in queue_conf(5), the environment can be requested for a job via the -pe switch to
       qsub(1), together with a request for a numeric range of parallel processes to be allocated by the job.
       Additional -l options may be used to specify more detailed job requirements.

       Note that Grid Engine allows backslashes (\) to be used to escape newline characters. The backslash and
       the newline are replaced with a space character before any interpretation.
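
       The continuation rule above can be illustrated with a minimal Python sketch. The function name
       join_continued_lines and the sample text are hypothetical; this only mimics the described
       replacement and is not Grid Engine's own code.

```python
def join_continued_lines(text: str) -> str:
    """Replace every backslash-newline pair with a single space,
    as the note above describes, before any further interpretation."""
    return text.replace("\\\n", " ")


# Hypothetical configuration text with one continued line.
raw = "user_lists   admins,\\\ndevelopers\n"
print(join_continued_lines(raw))
```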

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The  name  of  the  parallel  environment  in  the format for pe_name in sge_types(1).  To be used in the
       qsub(1) -pe switch.

   slots
       The total number of slots (normally one per parallel process or thread) allowed to be filled concurrently
       under the parallel environment.  Type is integer, valid values are 0 to 9999999.

   user_lists
   xuser_lists
       A comma-separated list of user access list names (see access_list(5)).

       Each user contained in at  least  one  of  the  user_lists  access  lists  has  access  to  the  parallel
       environment.  If  the  user_lists  parameter  is  set  to  NONE  (the default) any user has access if not
       explicitly excluded via the xuser_lists parameter.

       Each user contained in at least one of the xuser_lists access lists is not allowed to access the parallel
       environment. If the xuser_lists parameter is set to NONE (the default) any user has access.

       If a user is contained in an access list in xuser_lists and also in one in user_lists, the user is
       denied access to the parallel environment.
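
       The access rules above can be summarised in a short Python sketch. The function has_pe_access and
       its arguments are hypothetical names used for illustration; an empty list stands for the value
       NONE.

```python
def has_pe_access(user_acls, user_lists, xuser_lists):
    """Decide PE access from the rules described above.

    user_acls   -- names of the access lists the user belongs to
    user_lists  -- access lists granting access ([] stands for NONE)
    xuser_lists -- access lists denying access  ([] stands for NONE)
    """
    memberships = set(user_acls)
    # Exclusion takes precedence, even if the user is also in user_lists.
    if memberships & set(xuser_lists):
        return False
    # With user_lists set to NONE, anyone not excluded has access.
    if not user_lists:
        return True
    return bool(memberships & set(user_lists))


print(has_pe_access({"dev"}, ["dev"], ["dev"]))  # member of both lists
```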

   start_proc_args
   stop_proc_args
       The command line of the startup or shutdown procedure, respectively (an executable command, plus possible
       arguments), for the parallel environment, or "none" for no procedure (typically for tightly integrated
       PEs).  The command line is started directly, not in a shell.  An optional prefix "user@" specifies the
       username under which the procedure is to be started.  In that case, see the SECURITY section below
       concerning the security issues of running as a privileged user.

       The  startup procedure is invoked by sge_shepherd(8) on the master node of the job prior to executing the
       job script. Its purpose is to set up the parallel environment according to its needs.  The shutdown
       procedure  is  invoked  by  sge_shepherd(8) after the job script has finished. Its purpose is to stop the
       parallel environment and to remove it from  all  participating  systems.   The  standard  output  of  the
       procedure  is  redirected  to  the  file REQUEST.poJID in the job's working directory (see qsub(1)), with
       REQUEST being the name of the job as displayed by  qstat(1),  and  JID  being  the  job's  identification
       number.  Likewise, the standard error output is redirected to REQUEST.peJID.  If the -e or -o options are
       given on job submission, the PE error and standard output are merged into the paths specified.

       The  following  special variables, expanded at runtime, can be used (besides any other strings which have
       to be interpreted by the start and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the layout of the parallel environment
              to be set up by the start-up procedure. Each line of the file refers to a host on which parallel
              processes  are  to be run. The first entry of each line denotes the hostname, the second entry the
              number of parallel processes to be run on the host, the third entry the name of  the  queue.   The
              entries are separated by spaces.  If -binding pe is specified on job submission, the fourth column
              is  the  core  binding specification as colon-separated socket-core pairs, like "0,0:0,1", meaning
              the first core on the first socket and the second core  on  the  first  socket  can  be  used  for
              binding.   Otherwise it will be "UNDEFINED".  With the obsolete queue processors specification the
              fourth entry could be a multi-processor configuration (or "<NULL>").

       $host  The name of the host on which the startup or stop procedures are run.

       $ja_task_id
              The array job task index (0 if not an array job).

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in the queue configuration (see queue_conf(5))  of  the  master
              queue (the queue in which the startup and stop procedures are run).

       $queue The cluster queue of the master queue instance.

       $sge_cell
              The SGE_CELL environment variable (useful for locating files).

       $sge_root
              The SGE_ROOT environment variable (useful for locating files).

       $stdin_path
              The standard input path.

       $stderr_path
              The standard error path.

       $stdout_path
              The standard output path.

       $merge_stderr

       $fs_stdin_host

       $fs_stdin_path

       $fs_stdin_tmp_path

       $fs_stdin_file_staging

       $fs_stdout_host

       $fs_stdout_path

       $fs_stdout_tmp_path

       $fs_stdout_file_staging

       $fs_stderr_host

       $fs_stderr_path

       $fs_stderr_tmp_path

       $fs_stderr_file_staging
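
       As an illustration of how the special variables listed above might be expanded into a command
       line, here is a minimal Python sketch. The function expand_pe_variables, the script path and the
       sample values are hypothetical; the real expansion is performed by Grid Engine itself and may
       differ in detail.

```python
import re


def expand_pe_variables(command_line, values):
    """Replace $name tokens that appear in `values`.

    Tokens with no entry in `values` are left untouched, since other
    strings are to be interpreted by the start and stop procedures.
    """
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    # The whole identifier is matched, so $pe_hostfile is never
    # mistaken for $pe followed by "_hostfile".
    return re.sub(r"\$([A-Za-z_]+)", substitute, command_line)


# Hypothetical start_proc_args value and runtime values.
template = "/opt/pe/startmpi.sh $pe_hostfile $pe_slots $job_id"
runtime = {"pe_hostfile": "/tmp/pe_hostfile", "pe_slots": 8, "job_id": 4711}
print(expand_pe_variables(template, runtime))
```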

       The  start  and  stop commands are run with the same environment setting as that of the job to be started
       afterwards (see qsub(1)).
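
       A start procedure typically has to read the file named by $pe_hostfile, whose layout is described
       above. The following Python sketch parses that layout under the stated assumptions: fields are
       separated by spaces, and a missing fourth field is treated as "UNDEFINED". The names HostEntry,
       parse_pe_hostfile and binding_pairs, and the sample host and queue names, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class HostEntry(NamedTuple):
    host: str      # first entry: hostname
    slots: int     # second entry: number of parallel processes
    queue: str     # third entry: name of the queue
    binding: str   # fourth entry: binding specification or "UNDEFINED"


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Parse the contents of a PE hostfile, one host per line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        binding = fields[3] if len(fields) > 3 else "UNDEFINED"
        entries.append(HostEntry(fields[0], int(fields[1]), fields[2], binding))
    return entries


def binding_pairs(binding: str) -> List[Tuple[int, ...]]:
    """Turn "0,0:0,1" into [(0, 0), (0, 1)] (socket, core pairs).

    Any other value, such as "UNDEFINED" or "<NULL>", yields [].
    """
    try:
        return [tuple(int(n) for n in pair.split(","))
                for pair in binding.split(":")]
    except ValueError:
        return []


sample = "node01 4 all.q 0,0:0,1\nnode02 2 all.q UNDEFINED\n"
for entry in parse_pe_hostfile(sample):
    print(entry.host, entry.slots, binding_pairs(entry.binding))
```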

   allocation_rule
       The allocation rule is interpreted by the scheduler thread and helps  the  scheduler  to  decide  how  to
       distribute  parallel  processes among the available machines. If, for instance, a parallel environment is
       built for shared memory applications only, all parallel  processes  have  to  be  assigned  to  a  single
       machine,  no  matter  how  many  suitable  machines are available.  If, however, the parallel environment
       follows the distributed memory paradigm,  an  even  distribution  of  processes  among  machines  may  be
       favorable, as may packing processes onto the minimum number of machines.

       The current version of the scheduler only understands the following allocation rules:

       int    An  integer,  fixing the number of processes per host. If it is 1, all processes have to reside on
              different hosts. If the special name $pe_slots is used, the full range of processes  as  specified
              with  the  qsub(1) -pe switch has to be allocated on a single host (no matter what value belonging
              to the range is finally chosen for the job to be allocated).

       $fill_up
              Starting from the best suitable host/queue, all available slots are allocated. Further  hosts  and
              queues are "filled up" as long as a job still requires slots for parallel tasks.

       $round_robin
              From  all suitable hosts, a single slot is allocated until all tasks requested by the parallel job
              are dispatched. If more tasks are requested than suitable hosts are found, allocation starts again
              from the first host.  The allocation scheme walks through suitable hosts in a  most-suitable-first
              order.
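
       The difference between the allocation rules can be seen in a simplified Python sketch. It is not
       the scheduler's algorithm: it assumes a single requested slot count instead of a range, a list of
       hosts already sorted most-suitable-first, and (for the integer rule) that the request must be a
       multiple of the per-host number. The function allocate and the host names are hypothetical.

```python
def allocate(rule, requested, free_slots):
    """Distribute `requested` slots over hosts.

    rule       -- "$pe_slots", "$fill_up", "$round_robin" or an integer string
    free_slots -- list of (host, free) pairs, most suitable host first
    Returns {host: granted} or None if the request cannot be satisfied.
    """
    if rule == "$pe_slots":
        # Everything must fit on one host.
        for host, free in free_slots:
            if free >= requested:
                return {host: requested}
        return None

    granted = {host: 0 for host, _ in free_slots}
    remaining = requested
    if rule == "$fill_up":
        # Take all available slots of each host in turn.
        for host, free in free_slots:
            take = min(free, remaining)
            granted[host] = take
            remaining -= take
            if remaining == 0:
                break
    elif rule == "$round_robin":
        # One slot per host per pass, restarting from the first host.
        progress = True
        while remaining > 0 and progress:
            progress = False
            for host, free in free_slots:
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
                    progress = True
    else:
        # Fixed number of processes per host (assumption: exact multiple).
        per_host = int(rule)
        if requested % per_host:
            return None
        for host, free in free_slots:
            if remaining > 0 and free >= per_host:
                granted[host] = per_host
                remaining -= per_host

    if remaining != 0:
        return None
    return {host: n for host, n in granted.items() if n}


hosts = [("node01", 4), ("node02", 4)]
for rule in ("$fill_up", "$round_robin", "2", "$pe_slots"):
    print(rule, allocate(rule, 4, hosts))
```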

   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates whether Grid Engine is the creator
       of  the  slave  tasks  of  a  parallel application via sge_execd(8) and sge_shepherd(8) and thus has full
       control over all processes in a parallel application  ("tight integration").  This enables:

       •      resource limits are enforced for all tasks, even on slave hosts;

       •      resource consumption is properly accounted on all hosts;

       •      proper control of tasks, with no need to write a customized terminate method to ensure that the whole
              job  is  finished  on  qdel  and  that  tasks  are  properly  reaped  in  the case of abnormal job
              termination;

       •      all tasks are started with the appropriate nice value which was  configured  as  priority  in  the
              queue configuration;

       •      propagation  of  the  job environment to slave hosts, e.g. so that they write into the appropriate
              per-job temporary directory specified by TMPDIR, which  is  created  on  each  host  and  properly
              cleaned up.

       To gain control over the slave tasks of a parallel application, a sophisticated PE interface is required,
       which works closely together with Grid Engine facilities, typically interpreting the Grid Engine hostfile
       and  starting  remote  tasks  with qrsh(1) and its -inherit option.  See, for instance, the $SGE_ROOT/mpi
       directory and the howto pages ⟨http://arc.liv.ac.uk/SGE/howto/#Tight%20Integration%20of%20Parallel%20Libraries⟩.

       Please set the control_slaves parameter to false for all other PE interfaces.

   job_is_first_task
       The job_is_first_task parameter can be set to TRUE or FALSE. A value of  TRUE  indicates  that  the  Grid
       Engine  job script already contains one of the tasks of the parallel application (and the number of slots
       reserved for the job is the number of slots requested with the -pe switch).  FALSE indicates that the job
       script (and its child processes) is not part of the parallel program, just being used  to  kick  off  the
       tasks that do the work; then the number of slots reserved for the job in the master queue is increased by
       1, as indicated by qstat/qhost.

       This  should  be  TRUE for the common modern MPI implementations with tight integration.  Consider if the
       allocation rule is $fill_up, and a job is allocated only a single slot on the master host;  then  one  of
       the  MPI  processes  actually runs in that slot, and should be accounted as such, so the job is the first
       task.

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE is
       TRUE) and control_slaves is set to FALSE, the job_is_first_task parameter influences the accounting for
       the job: A value of TRUE means that accounting for CPU and requested memory gets multiplied by the
       number of slots requested with the -pe switch.  FALSE means the accounting information gets multiplied
       by the number of slots + 1.  Otherwise, the only significant effect of the parameter is on the display
       of the job.
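
       The accounting rule above amounts to a small piece of arithmetic, sketched here in Python. The
       function reserved_usage_multiplier and its argument names are hypothetical; the flag
       reserved_usage stands for ACCT_RESERVED_USAGE and/or SHARETREE_RESERVED_USAGE being TRUE.

```python
def reserved_usage_multiplier(requested_slots, job_is_first_task,
                              control_slaves, reserved_usage):
    """Return the factor applied to CPU and requested-memory accounting.

    Returns None when job_is_first_task does not influence accounting,
    i.e. when wallclock accounting is off or control_slaves is TRUE.
    """
    if not reserved_usage or control_slaves:
        return None
    # TRUE: the job script is itself one of the parallel tasks.
    # FALSE: the job script occupies one slot in addition to the tasks.
    return requested_slots if job_is_first_task else requested_slots + 1


# A job submitted with "-pe <name> 8" under a loosely integrated PE.
print(reserved_usage_multiplier(8, True, False, True))
print(reserved_usage_multiplier(8, False, False, True))
```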

   urgency_slots
       For  pending  jobs  with  a slot range PE request with different minimum and maximum, the number of slots
       they will actually use is not determined. This setting specifies the method to be used by Grid Engine  to
       assess the number of slots such jobs might finally get.

       The assumed slot allocation has a meaning when determining the resource-request-based priority
       contribution for numeric resources as described in sge_priority(5) and is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       int    The specified integer number is directly used as prospective slot amount.

       min    The slot range minimum is used as prospective slot amount. If no lower bound is specified with the
              range, 1 is assumed.

       max    The  slot  range  maximum is used as prospective slot amount.  If no upper bound is specified with
              the range, the absolute maximum possible due to the PE's slots setting is assumed.

       avg    The average of all numbers occurring within the job's PE range request is assumed.
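
       The four methods can be expressed as a short Python sketch. It assumes a single contiguous range
       request and reads "all numbers occurring within the range" as every integer from the lower to the
       upper bound; whether Grid Engine rounds the average is not stated here, so the sketch returns it
       unrounded. The function prospective_slots and its arguments are hypothetical names.

```python
def prospective_slots(setting, low, high, pe_slots):
    """Estimate the slot amount assumed for a pending job.

    setting  -- "min", "max", "avg" or an integer string
    low/high -- bounds of the requested slot range (None if omitted)
    pe_slots -- the PE's slots setting, used as the implicit upper bound
    """
    lower = 1 if low is None else low          # no lower bound: 1 is assumed
    upper = pe_slots if high is None else high  # no upper bound: PE maximum
    if setting == "min":
        return lower
    if setting == "max":
        return upper
    if setting == "avg":
        values = range(lower, upper + 1)
        return sum(values) / len(values)
    return int(setting)                         # fixed integer amount


# A job requesting the range 4-8 in a PE with slots set to 64.
for method in ("min", "max", "avg", "16"):
    print(method, prospective_slots(method, 4, 8, 64))
```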

   accounting_summary
       This parameter is only checked if control_slaves (see above) is set to TRUE and thus Grid Engine  is  the
       creator of the slave tasks of a parallel application via sge_execd(8) and sge_shepherd(8).  In this case,
       accounting information is available for every single slave task started by Grid Engine.

       The  accounting_summary  parameter  can  be  set  to TRUE or FALSE. A value of TRUE indicates that only a
       single accounting record is written to the accounting(5) file, containing the accounting summary  of  the
       whole job, including all slave tasks, while a value of FALSE indicates an individual accounting(5) record
       is written for every slave task, as well as for the master task.

       Note:  When  running  tightly  integrated  jobs with SHARETREE_RESERVED_USAGE set, and accounting_summary
       enabled in the parallel environment, reserved usage will only be reported  by  the  master  task  of  the
       parallel  job.   No  per-parallel  task  usage  records  will  be  sent  from execd to qmaster, which can
       significantly reduce load on the qmaster when running large, tightly integrated parallel jobs.   However,
       this removes the only post-hoc information about which hosts a job used.

   qsort_args library qsort-function [arg1 ...]
       Specifies a method for determining which queues/hosts, and in what order, should be used to schedule a
       parallel job.  For details, and the API, consult the header file $SGE_ROOT/include/sge_pqs_api.h.  library is the
       path  to  the  qsort dynamic library, qsort-function is the name of the qsort function implemented by the
       library, and the args are arguments passed to qsort.  Substitutions from the hard requested resource list
       for the job are made for any strings of the form $resource, where  resource  is  the  full  name  of  the
       resource  as  defined  in the complex(5) list.  If resource is not requested in the job, a null string is
       substituted.
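
       To tie the parameters together, the following Python sketch reads a PE definition into a
       dictionary. It assumes one "name value" pair per line, separated by whitespace, which is how
       qconf(1) usually displays a PE; the PE name mpi and all values in SAMPLE_PE are made up for
       illustration, and parse_pe_config is a hypothetical helper.

```python
SAMPLE_PE = """\
pe_name            mpi
slots              64
user_lists         NONE
xuser_lists        NONE
start_proc_args    none
stop_proc_args     none
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
"""


def parse_pe_config(text):
    """Return {parameter: value} for a PE definition.

    Backslash-newline pairs are joined with a space first; the value
    is everything after the parameter name, kept as a string.
    """
    config = {}
    for line in text.replace("\\\n", " ").splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


pe = parse_pe_config(SAMPLE_PE)
print(pe["pe_name"], pe["slots"], pe["allocation_rule"])
```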

RESTRICTIONS

       Note that the functionality of the start and stop procedures  remains  the  full  responsibility  of  the
       administrator  configuring  the  parallel  environment.   Grid  Engine  will  invoke these procedures and
       evaluate their exit status.  A non-zero exit status will put the queue into an error state.  If the start
       procedure has a non-zero exit status, the job will be re-queued.

SECURITY

       If start_proc_args or stop_proc_args is specified with a user@ prefix, the same considerations apply as
       for the prolog and epilog, as described in the SECURITY section of sge_conf(5).

SEE ALSO

       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qrsh(1), qsub(1), access_list(5), sge_conf(5),
       sge_qmaster(8), sge_shepherd(8).

FILES

       $SGE_ROOT/include/sge_pqs_api.h

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.

SGE 8.1.3pre                                       2012-09-11                                          SGE_PE(5)