Provided by: oar-node_2.6.1-1_amd64

NAME

       oarnodecheck - OAR node health check mechanism

SYNOPSIS

       oarnodecheckrun

       oarnodecheckquery

       oarnodechecklist

DESCRIPTION

       oarnodecheck consists of three commands:

       oarnodecheckrun
           oarnodecheckrun must be run as root by cron or a systemd timer (hourly, for instance) to execute
           all check scripts in the /etc/oar/check.d/ directory.

           The /etc/oar/check.d/ directory contains administrator-defined scripts that check for possible
           node health problems.

           If and only if a problem is detected, the check script must create a check-log file in the
           check-log directory. The check script must use the CHECKLOGFILE environment variable, which
           provides the pathname of the check-log file to create.

           If  the OAR cpuset mechanism is enabled, oarnodecheckrun does not launch checks when jobs are running
           on the node. A stamp file is created or updated when the scripts are actually run.
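
           The behaviour described above can be pictured with the following minimal sketch. It is not the
           actual oarnodecheckrun implementation: the run_checks function, its arguments, and the
           per-script check-log naming are assumptions made for illustration, and the cpuset test for
           running jobs is left out.

```shell
#!/bin/bash
# Sketch only: NOT the real oarnodecheckrun. All names below are hypothetical.

# run_checks CHECK_DIR LOG_DIR STAMP_FILE
# Runs every executable file in CHECK_DIR with CHECKLOGFILE set in its
# environment, so that a failing check can record its problem in LOG_DIR,
# then creates or updates STAMP_FILE.
run_checks() {
    local check_dir=$1 log_dir=$2 stamp_file=$3 script
    mkdir -p "$log_dir"
    for script in "$check_dir"/*; do
        # Skip anything that is not an executable regular file
        # (this also covers an empty directory, where the glob stays literal).
        [ -f "$script" ] && [ -x "$script" ] || continue
        # Assumed naming: one check-log per script. The script itself
        # creates the file, and only if it detects a problem.
        CHECKLOGFILE="$log_dir/$(basename "$script").log" "$script"
    done
    # Record that the check scripts were actually run.
    touch "$stamp_file"
}
```

           With these assumptions, a call such as run_checks /etc/oar/check.d /var/lib/oar/checklogs
           followed by a stamp file path of the administrator's choosing would leave one check-log per
           failing script and none for the scripts that passed.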

       oarnodecheckquery
           oarnodecheckquery is meant to be called by the OAR ping checker, to report the node health status.

           It can be configured as follows in the /etc/oar/oar.conf file of the OAR server:

               PINGCHECKER_TAKTUK_ARG_COMMAND="-t 3 broadcast exec [ /usr/bin/oarnodecheckquery ]"

           The OAR node health status is reported as bad as soon as a check-log file exists in the
           check-log directory: /var/lib/oar/checklogs/.

           oarnodecheckquery checks for the existence and modification date of the oarnodecheckrun stamp
           file. If the stamp file does not exist or is older than one hour, oarnodecheckrun is run.
           oarnodecheckquery then reports an error if any check-log exists in the check-log directory.

           Since  oarnodecheckquery  may  run  the  check  scripts,  the  OAR ping checker timeout must be tuned
           accordingly in the OAR server configuration.
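
           The two tests performed by oarnodecheckquery (stale stamp file, existing check-log) can be
           sketched as below. This is an illustration under assumptions, not the shipped script: the
           function names stamp_is_stale and node_is_healthy are hypothetical, and the -mmin test relies
           on a find implementation that supports it (GNU and BSD find do).

```shell
#!/bin/bash
# Sketch only: NOT the real oarnodecheckquery. Function names are hypothetical.

# stamp_is_stale STAMP_FILE
# Succeeds if STAMP_FILE is missing or was last modified more than
# 60 minutes ago.
stamp_is_stale() {
    local stamp_file=$1
    if [ ! -e "$stamp_file" ]; then
        return 0
    fi
    # find prints the path only when its modification time is older
    # than 60 minutes; an empty result means the stamp is recent.
    [ -n "$(find "$stamp_file" -mmin +60 2>/dev/null)" ]
}

# node_is_healthy LOG_DIR
# Succeeds only if LOG_DIR contains no check-log file.
node_is_healthy() {
    local log_dir=$1
    [ -z "$(find "$log_dir" -type f 2>/dev/null | head -n 1)" ]
}
```

           A caller would first test stamp_is_stale, run the check scripts if it succeeds, and then
           exit with the status of node_is_healthy. Because the first step may run every check script,
           the time it takes is what the ping checker timeout has to accommodate.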

       oarnodechecklist
           oarnodechecklist lists the current recorded check-logs.

EXAMPLE OF CHECK SCRIPT

       The following is an example check script to place in the check scripts directory, /etc/oar/check.d:

           #!/bin/bash
           ###############################################################################
           # Perform a check and report to CHECKLOGFILE
           # WARNING:
           # The CHECKLOGFILE file must not be created unless the check really unveiled
           # a problem.

           # Print to stderr if CHECKLOGFILE is not defined (e.g. when the script is
           # not called from oarnodecheckrun, the CHECKLOGFILE environment variable
           # is not set)
           [ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr

           ###############################################################################
           # YOUR CHECK SCRIPT GOES BELOW

           # Example of check
           [ -d /var/lib/oar ] || echo "OAR runtime directory (/var/lib/oar) does not exist" > "$CHECKLOGFILE"
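
       A check with more than one failure condition can follow the same pattern. The script below is a
       hypothetical second example, not part of OAR: the check_scratch function and the SCRATCH_DIR
       variable (defaulting to /tmp) are assumptions to adapt to the node. It appends to the check-log so
       that several messages can accumulate, and writes nothing when the check passes.

```shell
#!/bin/bash
# Hypothetical check script: report a problem if a scratch directory is
# missing or not writable. SCRATCH_DIR is an assumed, site-specific setting.

# Fall back to stderr when not called from oarnodecheckrun.
[ -n "$CHECKLOGFILE" ] || CHECKLOGFILE=/dev/stderr

# check_scratch DIR
# Writes to CHECKLOGFILE only when DIR is missing or not writable.
check_scratch() {
    local dir=$1
    if [ ! -d "$dir" ]; then
        echo "Scratch directory ($dir) does not exist" >> "$CHECKLOGFILE"
    elif [ ! -w "$dir" ]; then
        echo "Scratch directory ($dir) is not writable" >> "$CHECKLOGFILE"
    fi
}

check_scratch "${SCRATCH_DIR:-/tmp}"
```

       Running the script by hand, without CHECKLOGFILE set, prints any problem to stderr, which is a
       convenient way to try a check before installing it in /etc/oar/check.d.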

COPYRIGHTS

       Copyright 2003-2025 Laboratoire d'Informatique de Grenoble (http://www.liglab.fr). This software is
       licensed under the GNU General Public License Version 2 or above. There is NO warranty; not even
       for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

oarnodecheckrun                                    2025-03-24                                 oarnodecheckrun(8)