Provided by: lam-runtime_7.1.4-7.1build2_amd64

NAME

       lamboot - Start a LAM multicomputer.

SYNOPSIS


       lamboot [-b] [-d] [-h] [-H] [-l] [-s] [-v] [-V] [-x] [-nn] [-np] [-c conf file] [-prefix
              /lam/install/path/] [-sessionprefix value] [-sessionsuffix value] [-withlamprefixpath value] [-ssi
              key value] [bhost]

OPTIONS

       -b      Assume  local and remote shell are the same.  This means that only one remote shell invocation is
               used to each node.  If -b is not used, two remote shell invocations are used to each node.

       -d      Turn on debugging output.  This implies -v.

       -h      Print the command help menu.

       -l      Delay hostname-to-IP-address resolution.

       -prefix Use the LAM installation specified in /lam/install/path/.  Not compatible with  LAM/MPI  versions
               prior to 7.1.

       -s      Close stdio on the local node.

       -ssi key value
               Send arguments to various SSI modules.  See the "SSI" section, below.

       -v      Be verbose.

       -x      Run in fault tolerant mode.

       -H      Do not display the command header.

       -nn     Do not add "-n" to the remote agent command line.

       -np     Do not force the execution of $HOME/.profile on remote hosts.

       -session-prefix value
               Set the session prefix, overriding LAM_MPI_SESSION_PREFIX.

       -session-suffix value
               Set the session suffix, overriding LAM_MPI_SESSION_SUFFIX.

       -withlamprefixpath value
               Override  the internal installation path.  For internal use only, do not use unless you know what
               you are doing.

ENVIRONMENT VARIABLES

       LAM_MPI_SESSION_PREFIX

       LAM_MPI_SESSION_SUFFIX
                 It is possible to change the session directory used by LAM/MPI, normally of the form:

       tmpdir/lam-username@hostname[-suffix]

       tmpdir    will be set to LAM_MPI_SESSION_PREFIX if set.  Otherwise, it will fall back to the value of
                 TMPDIR.  If neither of these is set, the default is /tmp.

       suffix    can be overridden by the LAM_MPI_SESSION_SUFFIX environment variable.  If
                 LAM_MPI_SESSION_SUFFIX is not set and LAM is running under a supported batch scheduling
                 system, suffix will be a value unique to the currently running job.
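
       The naming rule above can be sketched as a small shell function.  This is only an illustration of
       the documented rule, not LAM's own code: lam_session_dir is a hypothetical helper name, an empty
       variable is treated the same as an unset one (an assumption), and the suffix generated under a
       batch scheduling system is not modelled.

```shell
# Sketch of the documented session-directory rule (not LAM's implementation).
# lam_session_dir [user] [host] -- hypothetical helper; prints the directory name.
lam_session_dir() (
    # tmpdir: LAM_MPI_SESSION_PREFIX, else TMPDIR, else /tmp
    # (assumption: an empty value counts as "not set")
    tmpdir="${LAM_MPI_SESSION_PREFIX:-${TMPDIR:-/tmp}}"
    user="${1:-$(id -un)}"
    host="${2:-$(hostname)}"
    dir="$tmpdir/lam-$user@$host"
    # Append the suffix only when LAM_MPI_SESSION_SUFFIX is set.
    # The batch-scheduler suffix is deliberately not modelled here.
    if [ -n "${LAM_MPI_SESSION_SUFFIX:-}" ]; then
        dir="$dir-$LAM_MPI_SESSION_SUFFIX"
    fi
    printf '%s\n' "$dir"
)
```

       Calling lam_session_dir with no arguments uses the current user and host name; whether LAM uses
       the short or the fully qualified host name is not stated here, so the host part should be treated
       as approximate.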

DESCRIPTION

       The lamboot tool starts the LAM software on each of the machines specified in  the  boot  schema,  bhost.
       The boot schema specifies the hostnames of nodes to be used in the run-time MPI environment, and
       optionally lists how many CPUs LAM may use on each node.  The user may wish to first run the recon(1) tool to
       verify that LAM can be started.

       Starting LAM is a three-step procedure.  In the first step, hboot(1) is invoked on each of the specified
       machines.  In the second step, each machine allocates a dynamic port and communicates it back to lamboot,
       which collects them.  In the third step, lamboot gives each machine the list of machines/ports in order to form a fully
       connected topology.  If any machine was not able to start, or if a  timeout  period  expires  before  the
       first step completes, lamboot invokes lamwipe(1) to terminate LAM and reports the error.

       The  bhost file is a LAM boot schema written in the host file syntax.  See bhost(5).  Instead of the com‐
       mand line, a boot schema can be specified in the LAMBHOST  environment  variable.   Otherwise  a  default
       file,  lam-bhost.def,  is  used.  LAM searches for bhost first in the local directory and then in the in‐
       stallation directory under etc/.
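
       The selection and search order described above can be sketched in shell.  This is an
       illustration only: find_bhost is a hypothetical helper, LAM_INSTALL_DIR is an assumed variable
       standing in for the LAM installation directory (it is not something lamboot reads), and the
       handling of absolute paths is an assumption.

```shell
# Sketch of the documented boot schema lookup (not LAM's implementation).
# find_bhost [name] -- hypothetical helper; prints the first matching file.
find_bhost() (
    # Command-line name first, then LAMBHOST, then the default file name.
    name="${1:-${LAMBHOST:-lam-bhost.def}}"
    # Assumed stand-in for the LAM installation directory.
    install_dir="${LAM_INSTALL_DIR:-/usr}"
    case "$name" in
        /*) set -- "$name" ;;                               # assumption: absolute paths used as given
        *)  set -- "./$name" "$install_dir/etc/$name" ;;    # local directory, then etc/
    esac
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    echo "boot schema '$name' not found" >&2
    exit 1
)
```

       Running find_bhost mynodes in a directory that contains mynodes prints ./mynodes; with no
       argument and no LAMBHOST it falls back to lam-bhost.def.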

       In addition, lamboot uses a process schema for the individual LAM nodes.  A process schema (see  conf(5))
       is  a description of the processes which constitute the operating system on a node.  In general, the sys‐
       tem administrator maintains this file -- LAM/MPI users will generally not need to change this  file.   It
       is also possible for the user to customize the LAM software with a private process schema.

   The bhost file
       The format of the bhost file is documented in the bhost(5) man page.

       lamboot  will resolve all names in bhost on the node in which lamboot was invoked (the origin node).  Af‐
       ter that, LAM will only use IP addresses, not names.  Specifically, the name resolution configuration  on
       all other nodes is not used.  Hence, the origin node must be able to resolve all the names in bhost
       to addresses that are reachable by all other nodes.

       A common mistake is to list localhost (or any name that resolves to the special address 127.0.0.1 --  the
       loopback  TCP/IP  device) in a bhost file that contains other nodes.  In this case, the address 127.0.0.1
       would be sent to each of the other nodes as the address of the origin node.  If the other  nodes  try  to
       use 127.0.0.1 to contact the origin node, they will actually be contacting themselves, and will
       eventually time out and fail.
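
       A simple pre-flight check for this mistake might look like the following sketch.  It assumes
       the bhost file lists one host per line with the host name as the first field and "#" starting a
       comment (see bhost(5) for the authoritative syntax).  check_bhost_loopback is a hypothetical
       helper; it only matches the literal names "localhost" and "127.0.0.1" and does not perform name
       resolution, so other names that resolve to the loopback address are not caught.

```shell
# Sketch: warn when a multi-node boot schema contains a literal loopback entry.
# check_bhost_loopback FILE -- hypothetical helper; returns non-zero on a finding.
check_bhost_loopback() (
    file="$1"
    # First field of every non-blank, non-comment line (assumed to be the host).
    hosts=$(sed -e 's/#.*//' "$file" | awk 'NF { print $1 }')
    # Number of host entries found.
    count=$(printf '%s\n' "$hosts" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -gt 1 ] &&
       printf '%s\n' "$hosts" | grep -Eq '^(localhost|127\.0\.0\.1)$'; then
        echo "warning: $file lists a loopback name alongside other nodes" >&2
        exit 1
    fi
    exit 0
)
```

       A file that lists only localhost passes the check, since the problem described above arises
       only when other nodes are present.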

       The IP addresses obtained from bhost are used for LAM's meta messages: startup and shutdown of jobs, out-
       of-band messages used for coordination, etc.  The amount of traffic  is  fairly  low  (unless  using  the
       "lamd"  mode  of MPI message passing, in which case all MPI traffic will also utilize LAM's meta messages
       for transport -- see mpirun(1)).  When using the TCP RPI, these IP addresses are also used for  MPI  mes‐
       sage passing via direct sockets between each pair of nodes.

       A  common  case  is where a "master" node has multiple network interface cards (NICs) -- one that is con‐
       nected to a public network, and one that is connected to a private network where parallel jobs are to  be
       run.  To include the master node in a bhost file, the IP name (or address) of the NIC on the private net‐
       work  should  be listed in bhost.  This ensures that all the other nodes can reach the master node on the
       private network.
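
       As an illustration of the multi-NIC case above, the following sketch writes a boot schema that
       refers to every node, including the master, by its private-network name.  The file name and all
       host names are hypothetical placeholders, and the "#" comment syntax is assumed from bhost(5).

```shell
# Write a boot schema that lists private-network names only.
# File name and host names are hypothetical placeholders.
cat > bhost.private <<'EOF'
# master node, listed by the name of its private-network NIC
master-priv.cluster.example
node1-priv.cluster.example
node2-priv.cluster.example
EOF

# Booting with it requires a LAM installation, so it is left commented out:
# lamboot -v bhost.private
```

       Listing the master's public name instead would hand the other nodes an address they may not be
       able to reach.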

       As another example, some configurations have multiple TCP/IP NICs in each node of a  parallel  job.   One
       NIC is considered "slow" (e.g., 10Mbps), while the other is considered "fast" (e.g., 100Mbps).  It is de‐
       sirable  to  allow  LAM to take advantage of the higher bandwidth on the "fast" network for MPI messages.
       As such, bhost should list the IP names (or addresses) of all the "fast" NICs.  However, if the  LAM  RPI
       does  not  use  TCP/IP (e.g., the Myrinet/GM RPI), the bhost file should probably list the "slow" NICs so
       that LAM's meta message traffic does not cause overhead and potentially detract from performance  on  the
       "fast" network from other high-performance applications.

   Delaying hostname lookups
       Normally, name resolution of hostnames is done on the machine where lamboot is invoked.  This is done
       for optimization reasons, so that the list of hostnames only needs to be resolved once (potentially mini‐
       mizing the amount of DNS or other hostname-lookup network traffic).

       However, in some non-uniform networking environments, this is not sufficient because each host may have a
       different IP address on each of its peers.  For example, host A may have address Z on host  B,  but  have
       address Y on host C.

       The -l option to lamboot will cause LAM to distribute hostnames to each node rather than a fully resolved
       set of IP addresses.  Hence, each node where LAM is booted will do its own name resolution on the list of
       hostnames.

   SSI (System Services Interface)
       The -ssi switch allows the passing of parameters to various SSI modules.  LAM's SSI modules are described
       in  detail in lamssi(7).  SSI modules have direct impact on MPI programs because they allow tunable para‐
       meters to be set at run time (such as which boot device driver to use, what parameters to  pass  to  that
       driver, etc.).

       The -ssi switch takes two arguments: key and value.  The key argument generally specifies which SSI
       module will receive the value.  For example, the key "boot" is used to select which boot module is used
       for starting processes on remote nodes.  The value argument is the value that is passed.  For example:

       lamboot -ssi boot tm
           Tells  LAM  to use the "tm" boot module for native launching in PBSPro / OpenPBS environments (the tm
           boot module does not require a boot schema).

       lamboot -ssi boot rsh -ssi rsh_agent "ssh -x" boot_schema
           Tells LAM to use the "rsh" boot module, and tells the rsh module to use  "ssh  -x"  as  the  specific
           agent to launch executables on remote nodes.

       And  so  on.   LAM's boot SSI modules are described in lamssi_boot(7).  This page should be consulted for
       specific actions that are taken by, and how to tweak the run-time behavior of each boot module.

       The -ssi switch can be used multiple times to specify different key and/or value arguments.  If the  same
       key is specified more than once, the values are concatenated with a comma (",") separating them.

       Note that the -ssi switch is simply a shortcut for setting environment variables.  The same effect may be
       accomplished by setting corresponding environment variables before running lamboot.  The form of the
       environment variables that LAM sets is: LAM_MPI_SSI_key=value.

       Note that the -ssi switch overrides any previously set environment variables.  Also note that unknown key
       arguments are still set as environment variables -- they are not checked (by lamboot) for correctness.
       Illegal or incorrect value arguments may or may not be reported -- it depends on the specific SSI module.
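
       The mapping from -ssi switches to environment variables, including the comma concatenation of
       repeated keys, can be sketched as follows.  ssi_to_env is a hypothetical helper that only prints
       the assignments; it does not set anything or invoke lamboot, and it is not how LAM itself
       processes the switch.

```shell
# Sketch: print the LAM_MPI_SSI_key=value assignments equivalent to a
# series of "-ssi key value" switches (hypothetical helper, not part of LAM).
# Usage: ssi_to_env KEY VALUE [KEY VALUE ...]
ssi_to_env() {
    printf '%s\n' "$@" | awk '
        NR % 2 == 1 { key = $0; next }          # odd lines carry the key
        {
            if (!(key in val)) {                # first time this key is seen
                order[++n] = key
                val[key] = $0
            } else {                            # repeated key: join with a comma
                val[key] = val[key] "," $0
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "LAM_MPI_SSI_%s=%s\n", order[i], val[order[i]]
        }'
}
```

       For the second example above, ssi_to_env boot rsh rsh_agent "ssh -x" prints the two
       assignments LAM_MPI_SSI_boot=rsh and LAM_MPI_SSI_rsh_agent=ssh -x, which could be exported
       before running lamboot in place of the switches.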

   Remote Executable Invocation
       All tweakable aspects of launching executables on remote nodes during lamboot are discussed in lamssi(7)
       and lamssi_boot(7).  Topics include (but are not limited to): discovery of remote shell, run-time
       overrides of the agent used to launch remote executables (e.g., rsh and ssh), etc.

   Closing stdio
       The  stdio of each LAM daemon on a remote host that is launched by lamboot is closed by default.  Normal‐
       ly, the stdio of the LAM daemon launched on the local host is left open so that  the  internal  LAM  tst‐
       dio(3)  package  works  properly.  However, it is sometimes desirable to close the stdio of the local LAM
       daemon as well.  For example:

              rsh somenode lamboot -s hostfile

       The -s option is needed here because rsh waits for two conditions before exiting: lamboot to exit, and
       stdout / stderr to be closed.  Without -s, stdout / stderr would not be closed, and rsh (and ssh) would
       hang even though lamboot had completed.  The -s option causes the stdout / stderr of the local LAM daemon
       to be closed upon invocation, which
       will  allow  rsh to complete.  Using -s will not affect lamboot in any other way, but it will prevent the
       tstdio(3) package from working properly.

   Fault Tolerance
       If the -x option is given, LAM runs in fault tolerant mode.  In this mode, nodes exchange "heart beat"
       messages periodically to make sure all nodes are running and the links connecting them are operational.
       When a node's heart beats stop, it is declared "dead" and all LAM nodes (and processes) are notified.
       This  allows  users to write fault tolerant applications that can degrade gracefully, or fully recover by
       replacing the defunct node with another (see lamgrow(1)).   Since  this  mode  introduces  a  performance
       penalty, it is not activated by default.

EXAMPLES

       lamboot -v
           Start LAM on the machines described in the default boot schema.  Report about important steps as they
           are done.

       lamboot -d hostfile
           Start LAM on the machines described in file hostfile.  Provide incredibly detailed reports on what is
           happening at each stage in the boot process.

       lamboot mynodes
           Start LAM on the machines described in the boot schema mynodes.  Operate silently.

FILES

       laminstalldir/etc/lam-bhost.def   default  boot schema file, where "laminstalldir" is the directory where
                                         LAM/MPI was installed

       laminstalldir/etc/lam-conf.lamd   default process schema file for LAM nodes

SEE ALSO

       recon(1), lamwipe(1), hboot(1), tstdio(3), bhost(5), conf(5), lam-helpfile(5), lamssi(7), lamssi_boot(7)

LAM 7.1.4                                          July, 2007                                         LAMBOOT(1)