Provided by: slurm-client_23.11.4-1.2ubuntu5_amd64

NAME

       sdiag - Scheduling diagnostic tool for Slurm

SYNOPSIS

       sdiag

DESCRIPTION

       sdiag shows information related to slurmctld execution, covering threads, agents, jobs, and
       scheduling algorithms. The goal is to obtain data about slurmctld behavior that helps in adjusting
       configuration parameters or queue policies. The main motivation is to understand Slurm behavior on
       systems with a high job throughput.

       sdiag has two execution modes. The default mode, --all, shows several counters and statistics
       explained below; the --reset option resets those values.

       Values are reset at midnight UTC time by default.
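
       For example, a typical workflow is to view the current statistics and later reset the counters
       (resetting requires Slurm operator or administrator privileges):

              $ sdiag
              $ sdiag --reset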

       The first block of information is related to global slurmctld execution:

       Server thread count
               The number of currently active slurmctld threads. A high number indicates a heavy load
               from processing events such as job submission, dispatch, and completion. If this value is
               often close to MAX_SERVER_THREADS it could point to a potential bottleneck.

       Agent queue size
               Slurm is designed with scalability in mind, and sending messages to thousands of nodes is
               not a trivial task. The agent mechanism helps control communication between slurmctld and
               the slurmd daemons on a best-effort basis. This value denotes the count of enqueued
               outgoing RPC requests on an internal retry list.

       Agent count
               Number of agent threads. Each agent thread can in turn create a group of up to
               2 + AGENT_THREAD_COUNT active threads at a time.

       Agent thread count
              Total count of active threads created by all the agent threads.

       DBD Agent queue size
               Slurm queues up the messages intended for the SlurmDBD and processes them in a separate
               thread. If the SlurmDBD or the database is down, this number will increase.

               The maximum queue size is configured in slurm.conf with MaxDBDMsgs. If this number grows
               beyond half of the maximum queue size, slurmdbd and the database should be investigated
               immediately.
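
               For example, the queue limit might be raised in slurm.conf (the value below is purely
               illustrative; pick one appropriate for your site):

                      MaxDBDMsgs=20000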

       Jobs submitted
               Number of jobs submitted since last reset.

       Jobs started
              Number of jobs started since last reset. This includes backfilled jobs.

       Jobs completed
              Number of jobs completed since last reset.

       Jobs canceled
              Number of jobs canceled since last reset.

       Jobs failed
               Number of jobs that failed due to slurmd or other internal issues since last reset.

       Job states ts:
              Lists the timestamp of when the following job state counts were gathered.

       Jobs pending:
               Number of jobs pending at the time indicated by the time stamp above.

       Jobs running:
               Number of jobs running at the time indicated by the time stamp above.

       Jobs running ts:
              Time stamp of when the running job count was taken.

       The next block of information is related to the main scheduling algorithm, which is based on job
       priorities. A scheduling cycle acquires the job_write_lock, then tries to allocate resources for
       pending jobs, starting from the highest-priority job and proceeding in descending priority order.
       Once a job cannot get its resources, the loop keeps going, but only for jobs requesting other
       partitions. Jobs with dependencies or affected by account limits are not processed.

       Last cycle
              Time in microseconds for last scheduling cycle.

       Max cycle
              Maximum time in microseconds for any scheduling cycle since last reset.

       Total cycles
              Total  run  time  in  microseconds  for  all  scheduling  cycles  since last reset.  Scheduling is
              performed periodically and (depending upon configuration) when a job is  submitted  or  a  job  is
              completed.

       Mean cycle
              Mean time in microseconds for all scheduling cycles since last reset.

       Mean depth cycle
               Mean cycle depth, where depth is the number of jobs processed in a scheduling cycle.

       Cycles per minute
              Counter of scheduling executions per minute.

       Last queue length
               Length of the pending job queue.

       The next block of information is related to the backfill scheduling algorithm. A backfill
       scheduling cycle acquires locks on the job, node, and partition objects, then tries to allocate
       resources for pending jobs. Jobs are processed in priority order. If a job cannot get resources
       now, the algorithm calculates when it could get them, obtaining a future start time for the job.
       The next job is then processed, and the algorithm tries to allocate resources for it without
       affecting the jobs already planned, again calculating a future start time if no resources are
       currently available. The backfill algorithm takes more time for each additional job it processes,
       since higher-priority jobs must not be affected. The algorithm itself takes measures to avoid
       long execution cycles and to avoid holding the locks for too long.
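
       Those limits on cycle length and lock hold time are tunable through SchedulerParameters in
       slurm.conf; for example (parameter values below are illustrative only):

              SchedulerParameters=bf_max_time=180,bf_yield_interval=1000000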

       Total backfilled jobs (since last slurm start)
               Number of jobs started thanks to backfilling since last Slurm start.

       Total backfilled jobs (since last stats cycle start)
               Number of jobs started thanks to backfilling since the last time stats were reset. By
               default these values are reset at midnight UTC time.

       Total backfilled heterogeneous job components
              Number of heterogeneous job components started thanks to backfilling since last Slurm start.

       Total cycles
               Number of backfill scheduling cycles since last reset.

       Last cycle when
              Time  when  last  backfill  scheduling  cycle  happened  in  the  format  "weekday  Month MonthDay
              hour:minute.seconds year"

       Last cycle
               Time in microseconds of the last backfill scheduling cycle. It counts only execution time,
               removing sleep time inside a scheduling cycle when it executes for an extended period of
               time. Note that locks are released during the sleep time so that other work can proceed.

       Max cycle
               Time in microseconds of the maximum backfill scheduling cycle execution since last reset.
               It counts only execution time, removing sleep time inside a scheduling cycle when it
               executes for an extended period of time. Note that locks are released during the sleep
               time so that other work can proceed.

       Mean cycle
               Mean time in microseconds of backfill scheduling cycles since last reset.

       Last depth cycle
               Number of jobs processed during the last backfill scheduling cycle. It counts every job,
               even jobs that cannot be started due to dependencies or limits.

       Last depth cycle (try sched)
               Number of jobs processed during the last backfill scheduling cycle. It counts only jobs
               with a chance to start using available resources. These jobs consume more scheduling time
               than jobs which are found to be unable to start due to dependencies or limits.

       Depth Mean
              Mean  count  of  jobs  processed  during all backfilling scheduling cycles since last reset.  Jobs
              which are found to be ineligible to run when examined by the backfill scheduler  are  not  counted
              (e.g.  jobs submitted to multiple partitions and already started, jobs which have reached a QOS or
              account limit such as maximum running jobs for an account, etc).

       Depth Mean (try sched)
              The subset of Depth Mean that the backfill scheduler attempted to schedule.

       Last queue length
               Number of jobs pending to be processed by the backfill algorithm. A job is counted once
               for each partition it is queued to use. A pending job array will normally be counted as
               one job (tasks of a job array which have already been started/requeued or individually
               modified will already have individual job records and are each counted as a separate job).

       Queue length Mean
               Mean count of jobs pending to be processed by the backfill algorithm. A job is counted
               once for each partition it requested. A pending job array will normally be counted as one
               job (tasks of a job array which have already been started/requeued or individually
               modified will already have individual job records and are each counted as a separate job).

       Last table size
              Count of different time slots tested by the backfill scheduler in its last iteration.

       Mean table size
              Mean count of different time slots tested by the backfill scheduler.  Larger counts  increase  the
              time  required  for  the  backfill  operation.   The  table  size is influenced by many scheduling
              parameters, including: bf_min_age_reserve, bf_min_prio_reserve, bf_resolution, and bf_window.
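
               For example, a smaller table (and faster backfill cycles) might be obtained by coarsening
               the time resolution and shrinking the planning window in slurm.conf (values illustrative):

                      SchedulerParameters=bf_resolution=600,bf_window=720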

       Latency for 1000 calls to gettimeofday()
              Latency of 1000 calls to the gettimeofday() syscall in microseconds,  as  measured  at  controller
              startup.

       The next blocks of information report the most frequently issued remote procedure calls (RPCs),
       i.e. calls made to the slurmctld daemon asking it to perform some action. The fourth block reports
       the RPCs issued by message type. Those RPC codes can be looked up in the Slurm source code, in the
       file src/common/slurm_protocol_defs.h. The report includes the number of times each RPC is
       invoked, the total time consumed by all of those RPCs, and the average time consumed by each RPC
       in microseconds. The fifth block reports the RPCs issued by user ID: the total number of RPCs they
       have issued, the total time consumed by all of those RPCs, and the average time consumed by each
       RPC in microseconds. RPC statistics are collected for the life of the slurmctld process unless
       explicitly reset with --reset.
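
       For example, an RPC name reported by sdiag can be matched to its definition by searching that
       header in a Slurm source tree (the message type below is just one example):

              $ grep -n 'REQUEST_JOB_INFO' src/common/slurm_protocol_defs.h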

       The sixth block of information, labeled Pending RPC Statistics, shows information about pending  outgoing
       RPCs  on the slurmctld agent queue.  The first section of this block shows types of RPCs on the queue and
       the count of each. The second section shows up to the first 25  individual  RPCs  pending  on  the  agent
       queue, including the type and the destination host list. This information is cached and only
       refreshed at 30 second intervals.

OPTIONS

       -a, --all
              Get and report information. This is the default mode of operation.

       -M, --cluster=<string>
              The cluster to issue commands to. Only one cluster name may be specified.  Note that the  SlurmDBD
              must be up for this option to work properly.

       -h, --help
              Print description of options and exit.

       --json, --json=list, --json=<data_parser>
              Dump  information  as  JSON  using  the  default  data_parser  plugin or explicit data_parser with
              parameters. Sorting and formatting arguments will be ignored.
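
               For example, the JSON output can be post-processed with standard tools such as jq (the
               top-level key below is an assumption; inspect the actual output for the real structure):

                      $ sdiag --json | jq '.statistics'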

       -r, --reset
              Reset scheduler and RPC counters to 0. Only supported for Slurm operators and administrators.

       -i, --sort-by-id
              Sort Remote Procedure Call (RPC) data by message type ID and user ID.

       -t, --sort-by-time
              Sort Remote Procedure Call (RPC) data by total run time.

       -T, --sort-by-time2
              Sort Remote Procedure Call (RPC) data by average run time.

       --usage
              Print list of options and exit.

       -V, --version
              Print current version number and exit.

       --yaml, --yaml=list, --yaml=<data_parser>
              Dump information as YAML using  the  default  data_parser  plugin  or  explicit  data_parser  with
              parameters. Sorting and formatting arguments will be ignored.

PERFORMANCE

       Executing  sdiag  sends  a  remote procedure call to slurmctld. If enough calls from sdiag or other Slurm
       client commands that send remote procedure calls to the slurmctld daemon come in at once, it  can  result
       in a degradation of performance of the slurmctld daemon, possibly resulting in a denial of service.

       Do  not run sdiag or other Slurm client commands that send remote procedure calls to slurmctld from loops
       in shell scripts or other programs. Ensure that programs limit calls to sdiag to  the  minimum  necessary
       for the information you are trying to gather.
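
       For example, rather than invoking sdiag repeatedly to extract several fields, capture the output
       once and parse the copy:

              $ out=$(sdiag)
              $ echo "$out" | grep -i 'jobs submitted'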

ENVIRONMENT VARIABLES

       Some  sdiag  options  may be set via environment variables. These environment variables, along with their
       corresponding options, are listed  below.   (Note:  Command  line  options  will  always  override  these
       settings.)

       SLURM_CLUSTERS      Same as --cluster

       SLURM_CONF          The location of the Slurm configuration file.
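
       For example, to query a specific cluster via the environment (the cluster name is hypothetical):

              $ SLURM_CLUSTERS=mycluster sdiag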

COPYING

       Copyright (C) 2010-2011 Barcelona Supercomputing Center.
       Copyright (C) 2010-2022 SchedMD LLC.

       Slurm  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but  WITHOUT  ANY  WARRANTY;  without  even  the
       implied  warranty  of  MERCHANTABILITY  or  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
       License for more details.

SEE ALSO

       sinfo(1), squeue(1), scontrol(1), slurm.conf(5)

May 2023                                         Slurm Commands                                         sdiag(1)