Provided by: slurm-client_23.11.4-1.2ubuntu5_amd64

NAME

       gres.conf - Slurm configuration file for Generic RESource (GRES) management.

DESCRIPTION

       gres.conf  is  an  ASCII  file  which  describes  the configuration of Generic RESource(s) (GRES) on each
       compute node.  If the GRES information in the slurm.conf file does not fully  describe  those  resources,
       then  a  gres.conf  file  should be included on each compute node. For cloud nodes, a gres.conf file that
       includes all the cloud nodes must be on all cloud nodes and the  controller.  The  file  will  always  be
       located in the same directory as slurm.conf.

       If  the  GRES information in the slurm.conf file fully describes those resources (i.e. no "Cores", "File"
       or "Links" specification is required for that GRES type or that information is  automatically  detected),
       that  information  may  be  omitted from the gres.conf file and only the configuration information in the
       slurm.conf file will be used.  The  gres.conf  file  may  be  omitted  completely  if  the  configuration
       information in the slurm.conf file fully describes all GRES.

       If using the gres.conf file to describe the resources available to nodes, the first parameter on the line
       should be NodeName. If configuring Generic Resources without specifying nodes, the first parameter on the
       line should be Name.
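
       As a minimal sketch of the two line forms described above, the fragment below shows one line of each
       kind. The node names and device paths are illustrative, borrowed from the EXAMPLES section, and the
       two forms are shown together only for comparison.

       ```conf
       # Node-specific form: NodeName is the first parameter on the line
       NodeName=tux[0-3] Name=gpu File=/dev/nvidia[0-3]

       # Form without a node specification: Name is the first parameter on the line
       Name=gpu File=/dev/nvidia[0-3]
       ```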

       Parameter names are case insensitive.  Any text following a "#" in the configuration file is treated as a
       comment  through  the  end  of  that line.  Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the  command  "scontrol  reconfigure"
       unless otherwise noted.
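
       The comment and case rules above can be sketched as follows. The device paths and core numbers are
       assumed values chosen only for illustration.

       ```conf
       # Everything after "#" on a line is treated as a comment
       name=gpu file=/dev/nvidia0 cores=0,1   # parameter names written in lower case
       NAME=gpu FILE=/dev/nvidia1 CORES=2,3   # the same parameter names in upper case
       ```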

       NOTE:  Slurm  support  for  gres/[mps|shard]  requires  the  use of the select/cons_tres plugin. For more
       information on how to configure MPS, see  https://slurm.schedmd.com/gres.html#MPS_Management.   For  more
       information on how to configure Sharding, see https://slurm.schedmd.com/gres.html#Sharding.

       For more information on GRES scheduling in general, see https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The  hardware  detection  mechanisms  to  enable for automatic GRES configuration.  Currently, the
              options are:

              nrt    Automatically detect AWS Trainium/Inferentia devices.

              nvml   Automatically detect NVIDIA GPUs. Requires the NVIDIA Management Library (NVML).

              off    Do not automatically detect any GPUs. Used to override other options.

              oneapi Automatically detect Intel GPUs. Requires the Intel Graphics  Compute  Runtime  for  oneAPI
                     Level Zero and OpenCL Driver (oneapi).

              rsmi   Automatically  detect  AMD  GPUs.  Requires the ROCm System Management Interface (ROCm SMI)
                     Library.

              AutoDetect can be on a line by itself, in which case it  will  globally  apply  to  all  lines  in
              gres.conf  by  default.  In  addition,  AutoDetect  can be combined with NodeName to only apply to
              certain nodes. Node-specific  AutoDetects  will  trump  the  global  AutoDetect.  A  node-specific
              AutoDetect  only  needs  to  be  specified once per node. If specified multiple times for the same
              nodes, they must all be the same value. To unset AutoDetect for a node when a global AutoDetect is
              set, simply set it to "off" in a node-specific  GRES  line.   E.g.:  NodeName=tux3  AutoDetect=off
              Name=gpu File=/dev/nvidia[0-3].  AutoDetect cannot be used with cloud nodes.

              AutoDetect  will  automatically detect files, cores, links, and any other hardware. If a parameter
              such as File, Cores, or Links is specified when AutoDetect is used, then the specified values  are
              used to sanity check the auto detected values. If there is a mismatch, then the  node's  state  is
              set to invalid and the node is drained.

       Count  Number  of  resources  of  this name/type available on this node.  The default value is set to the
              number of File values specified (if any), otherwise the default value is one.  A  suffix  of  "K",
              "M",  "G",  "T"  or  "P"  may  be  used  to multiply the number by 1024, 1048576, 1073741824, etc.
              respectively.  For example: "Count=10G".
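
              As a worked illustration of the suffix arithmetic, the comments below expand two suffixed
              counts. The configuration line itself is taken from the EXAMPLES section; the "2K" value is an
              assumed extra case.

              ```conf
              # Count=2K is 2 x 1024       = 2048
              # Count=4G is 4 x 1073741824 = 4294967296
              NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly
              ```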

       Cores  Optionally specify the core index numbers for the specific cores which can use this resource.  For
              example, it may be strongly preferable to use specific cores with specific GRES devices (e.g. on a
              NUMA architecture).  While Slurm can track and assign resources at the CPU or thread level, the
              scheduling algorithms used to co-allocate GRES devices with CPUs operate at the socket level (or
              NUMA level with numa_node_as_socket) for job allocations.  Therefore it is not possible to
              preferentially assign GRES to different specific CPUs on the same socket (or NUMA node with
              numa_node_as_socket), and this option should generally be used to identify all cores on some
              socket.  However, job step allocations that request a portion of the job's resources with --exact
              and task binding through --gpu-d both look at cores directly, so more specific core
              identification may be useful in those cases.

              Multiple  cores  may be specified using a comma-delimited list or a range may be specified using a
              "-" separator (e.g. "0,1,2,3" or "0-3").  If a job  specifies  --gres-flags=enforce-binding,  then
              only  the  identified cores can be allocated with each generic resource. This will tend to improve
              performance of jobs, but delay the allocation of resources to them.  If Cores is specified and a
              job is not submitted with the --gres-flags=enforce-binding option, the identified cores will be
              preferred for scheduling with each generic resource.

              If  --gres-flags=disable-binding is specified, then any core can be used with the resources, which
              also increases the  speed  of  Slurm's  scheduling  algorithm  but  can  degrade  the  application
              performance.   The --gres-flags=disable-binding option is currently required to use more CPUs than
              are bound to a GRES (e.g. if a GPU is bound to the CPUs on one socket, but resources on more  than
              one socket are required to run the job).  If any core can be effectively used with the resources,
              then do not specify the Cores option; this improves speed in the Slurm scheduling logic.  A
              restart of the slurmctld is needed for changes to the Cores option to take effect.
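
              A minimal sketch of socket-aligned Cores values follows. It assumes a hypothetical node with two
              sockets of eight cores each and one GPU attached to each socket; the core ranges are logical core
              indices and the device paths are illustrative.

              ```conf
              # GPU 0 is preferred (or enforced with --gres-flags=enforce-binding) on the first socket
              Name=gpu File=/dev/nvidia0 Cores=0-7
              # GPU 1 is preferred (or enforced) on the second socket
              Name=gpu File=/dev/nvidia1 Cores=8-15
              ```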

              NOTE:  Since  Slurm  must  be able to perform resource management on heterogeneous clusters having
              various processing unit numbering schemes, a logical core index must be specified instead  of  the
              physical  core  index.   That  logical core index might not correspond to your physical core index
              number.  Core 0 will be the first core on the first socket, while core 1 will be the  second  core
              on  the  first  socket.   This  numbering coincides with the logical core number (Core L#) seen in
              "lstopo -l" command output.

       File   Fully qualified pathname of the device files associated with a resource.  The name can  include  a
              numeric range suffix to be interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of generic resource allocations is to be supported
              (i.e.  prevents users from making use of resources allocated to a different user).  Enforcement of
              the file allocation relies upon Linux Control Groups (cgroups)  and  Slurm's  task/cgroup  plugin,
              which will place the allocated files into the job's cgroup and prevent use of other files.  Please
              see Slurm's Cgroups Guide for more information: https://slurm.schedmd.com/cgroups.html.

              If  File  is  specified then Count must be either set to the number of file names specified or not
              set (the default value is the number of files specified).  The exception to this is  MPS/Sharding.
              For either of these GRES, each GPU would be identified by device file using the File parameter and
              Count  would  specify  the number of entries that would correspond to that GPU. For MPS, typically
              100 or some multiple of 100. For  Sharding  typically  the  maximum  number  of  jobs  that  could
              simultaneously share that GPU.
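
              The Sharding exception can be sketched as below. The value Count=4 is an assumption standing for
              "at most four jobs share each GPU", and the device paths are illustrative. The EXAMPLES section
              shows the equivalent layout for MPS with Count=100.

              ```conf
              # Each GPU is identified by its device file
              Name=gpu File=/dev/nvidia0
              Name=gpu File=/dev/nvidia1
              # For shard, Count is the number of shares per GPU, not the number of files
              Name=shard Count=4 File=/dev/nvidia0
              Name=shard Count=4 File=/dev/nvidia1
              ```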

              If  using  a  card  with  Multi-Instance  GPU  functionality,  use MultipleFiles instead. File and
              MultipleFiles are mutually exclusive.

              NOTE: File is required for all gpu typed GRES.

              NOTE: If you specify the File parameter for a resource on some node, the option must be  specified
              on all nodes and Slurm will track the assignment of each specific resource on each node. Otherwise
              Slurm  will  only  track  a  count of allocated resources rather than the state of each individual
              device file.

              NOTE: Drain a node before changing the count of records with File parameters (e.g. if you want  to
              add  or  remove  GPUs from a node's configuration).  Failure to do so will result in any job using
              those GRES being aborted.

              NOTE: When specifying File, Count is limited in size (currently 1024) for each node.

       Flags  Optional flags that can be specified to change configured behavior of the GRES.

              Allowed values at present are:

              CountOnly           Do not attempt to load a plugin of the GRES type as this  GRES  will  only  be
                                  used to track counts of GRES used. This avoids attempting to load a
                                  non-existent plugin, which can affect filesystems with high-latency metadata
                                  operations for non-existent files.

              explicit            If the flag is set, GRES is not allocated to the job as  part  of  whole  node
                                  allocation (--exclusive or OverSubscribe=EXCLUSIVE set on partition) unless it
                                  was explicitly requested by the job.

              one_sharing         To be used on a shared gres. If using a shared gres (mps) on top of a sharing
                                  gres (gpu), only allow one of the sharing gres to be used by the shared gres.
                                  This is the default for MPS.

                                  NOTE: If a gres has this flag configured it is global, so all other nodes with
                                  that  gres  will  have  this  flag  implied.  This flag is not compatible with
                                  all_sharing for a specific gres.

              all_sharing         To be used on a shared gres. This is the opposite of one_sharing  and  can  be
                                  used  to  allow  all  sharing  gres (gpu) on a node to be used for shared gres
                                  (mps).

                                  NOTE: If a gres has this flag configured it is global, so all other nodes with
                                  that gres will have this flag implied.   This  flag  is  not  compatible  with
                                  one_sharing for a specific gres.

              nvidia_gpu_env      Set  environment  variable  CUDA_VISIBLE_DEVICES for all GPUs on the specified
                                  node(s).

              amd_gpu_env         Set environment variable ROCR_VISIBLE_DEVICES for all GPUs  on  the  specified
                                  node(s).

              intel_gpu_env       Set  environment  variable  ZE_AFFINITY_MASK  for  all  GPUs  on the specified
                                  node(s).

              opencl_env          Set environment variable GPU_DEVICE_ORDINAL for  all  GPUs  on  the  specified
                                  node(s).

              no_gpu_env          Set  no  GPU-specific environment variables. This is mutually exclusive to all
                                  other environment-related flags.

              If no environment-related flags are specified, then  nvidia_gpu_env,  amd_gpu_env,  intel_gpu_env,
              and  opencl_env  will be implicitly set by default.  If AutoDetect is used and environment-related
              flags are not specified, then AutoDetect=nvml will set nvidia_gpu_env,  AutoDetect=rsmi  will  set
              amd_gpu_env,    and    AutoDetect=oneapi    will   set   intel_gpu_env.    Conversely,   specified
              environment-related flags will always override AutoDetect.

              Environment-related flags set on one GRES line will be inherited by the GRES line  directly  below
              it if no environment-related flags are specified on that line and if it is of the same node, name,
              and type. Environment-related flags must be the same for GRES of the same node, name, and type.

              Note that there is a known issue with the AMD ROCm runtime where ROCR_VISIBLE_DEVICES is processed
              first,  and  then  CUDA_VISIBLE_DEVICES  is  processed.  To  avoid  the issues caused by this, set
              Flags=amd_gpu_env for AMD GPUs so only ROCR_VISIBLE_DEVICES is set.
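
              A minimal sketch of the workaround above follows. The node names are borrowed from the EXAMPLES
              section, and the device paths are an assumption about how AMD GPUs appear on a particular system;
              substitute the device files actually present on the node.

              ```conf
              # Set only ROCR_VISIBLE_DEVICES for the AMD GPUs on these nodes
              NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env
              ```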

       Links  A comma-delimited list of numbers identifying the number of connections between  this  device  and
              other devices to allow coscheduling of better connected devices.  This is an ordered list in which
              the  number  of  connections  this  specific  device  has to device number 0 would be in the first
              position, the number of connections it has to device number 1 in the second position, etc.   A  -1
              indicates  the  device  itself  and a 0 indicates no connection.  If specified, then this line can
              only contain a single GRES device (i.e. can only contain a single file via File).

              This is an optional value and is usually automatically determined if  AutoDetect  is  enabled.   A
              typical  use  case  would be to identify GPUs having NVLink connectivity.  Note that for GPUs, the
              minor number assigned by the OS and used in the device file (i.e. the X in  /dev/nvidiaX)  is  not
              necessarily  the same as the device number/index. The device number is created by sorting the GPUs
              by  PCI  bus  ID  and  then  numbering  them   starting   from   the   smallest   bus   ID.    See
              https://slurm.schedmd.com/gres.html#GPU_Management
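
              As a hedged illustration of the ordered list, the fragment below describes a hypothetical
              four-GPU node in which devices 0 and 1 are joined by two links, devices 2 and 3 are joined by two
              links, and the two pairs are not connected. It assumes that the device numbers happen to match
              the minor numbers in the device file names, which is not guaranteed.

              ```conf
              # Position N in each list is the number of links to device N; -1 marks the device itself
              Name=gpu File=/dev/nvidia0 Links=-1,2,0,0
              Name=gpu File=/dev/nvidia1 Links=2,-1,0,0
              Name=gpu File=/dev/nvidia2 Links=0,0,-1,2
              Name=gpu File=/dev/nvidia3 Links=0,0,2,-1
              ```

              Each line names exactly one device file, as required when Links is specified.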

       MultipleFiles
              Fully  qualified  pathname  of  the device files associated with a resource.  Graphics cards using
              Multi-Instance GPU (MIG) technology will present multiple device files that should be managed as a
              single generic resource. The file names can be a comma separated list or it can include a  numeric
              range suffix (e.g. MultipleFiles=/dev/nvidia[0-3]).

              Drain  a  node before changing the count of records with the MultipleFiles parameter, such as when
              adding or removing GPUs from a node's configuration.  Failure to do so  will  result  in  any  job
              using those GRES being aborted.

              When not using GPUs with MIG functionality, use File instead.  MultipleFiles and File are mutually
              exclusive.
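
              A minimal sketch of the comma separated form follows. The capability device paths are
              hypothetical placeholders; the actual files that make up a MIG instance depend on the system and
              must be looked up on the node.

              ```conf
              # One generic resource backed by several device files (paths are hypothetical)
              Name=gpu MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22
              ```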

       Name   Name  of  the  generic  resource.  Any  desired  name may be used.  The name must match a value in
              GresTypes in slurm.conf.   Each  generic  resource  has  an  optional  plugin  which  can  provide
              resource-specific functionality.  Generic resources that currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

              shard  Shards of a gpu

       NodeName
              An  optional  NodeName  specification  can be used to permit one gres.conf file to be used for all
              compute nodes in a cluster by specifying the node(s) that each line should apply to.  The NodeName
              specification can use a Slurm hostlist specification as shown in the example below.

       Type   An optional arbitrary string identifying the type of generic resource.  For example, this might be
              used to identify a specific model of GPU, which users can then specify  in  a  job  request.   For
              changes to the Type option to take effect with a scontrol reconfig, all affected slurmd daemons
              must be responding to the slurmctld.  Otherwise a restart of the slurmctld and slurmd  daemons  is
              required.

              NOTE:  If  using  autodetect  functionality and defining the Type in your gres.conf file, the Type
              specified should match or be a substring of the value that is detected,  using  an  underscore  in
              lieu of any spaces.

EXAMPLES

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100  File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla  File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla  File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3  Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15]  Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING

       Copyright  (C) 2010 The Regents of the University of California.  Produced at Lawrence Livermore National
       Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This   file   is   part   of   Slurm,   a   resource    management    program.     For    details,    see
       <https://slurm.schedmd.com/>.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but  WITHOUT  ANY  WARRANTY;  without  even  the
       implied  warranty  of  MERCHANTABILITY  or  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

SEE ALSO

       slurm.conf(5)

August 2023                                 Slurm Configuration File                                gres.conf(5)