Provided by: collectl_4.3.1-1_all

NAME

        colmux - multiplex communications to multiple systems running collectl from a single system

SYNOPSIS

       colmux [-command "collectl-switches... [-p filespec]"] [-address addr1[,addr2,...]|-addr filename] [-cols
       col1[,col2...]] | [-column num]

DESCRIPTION

       This utility gathers up data generated by collectl from multiple systems and multiplexes it into a single
       consolidated format.  It runs in essentially 2 distinct modes.  The first is known as real-time mode,
       because data is retrieved and displayed in real time.  The second is playback mode, because data is
       played back from existing collectl data files.

       There are also 2 general formats for the data being displayed.  The first is a multi-line display in
       which the data is displayed in the native form that collectl displays it, except that it is sorted by a
       distinct column, essentially allowing one to see the TOP producers of that data.  The second format is a
       single-line display in which one or more distinct data elements from each source are displayed on the
       same line.  This latter format is never sorted, but rather positionally organized by the name of the
       system that generated it.

       Collectl will then be executed, using any optional switches specified by -command, on each of the
       systems specified by -address, OR on the systems whose addresses are read from a file if the target of
       that switch is a filename rather than a list of hosts, OR on the local system if -address is not
       specified.  See collectl for details of the various switches.  In some cases certain collectl switches
       will not make sense in a colmux environment and if chosen will generate an error.  Further, if hosts are
       specified with -address, they should be individual addresses or hostnames separated by commas.  In turn,
       any of them can be in what those familiar with pdsh would recognize as -w format.

       Colmux  will  then  execute  the  collectl  command, gather the results from all sources for a particular
       interval and display them one result per line, sorted by the specified column OR all on the same line  in
       groups  specified  by  -cols.  The number of lines displayed is set to the size of the terminal window by
       default, but can be changed using -lines.  The one exception is the use of -nosort which only applies  to
       the  playback of existing collectl raw files.  In this mode all records for a particular interval will be
       displayed and the sorting bypassed, making this a speedy and convenient mechanism for gathering all  data
       from all systems in one place for potential further processing.

       Colmux will never modify the size of the terminal window, so to see more or wider lines either expand
       the window or override the number of display lines and run it again.  If the number of display lines is
       set greater than the terminal height or to 0, colmux will no longer overlay the previous window and will
       simply run in a continuous scrolling mode.

       Common Switches

       -address list|pdsh|filename
              Specify any combination of addresses as hostnames OR in pdsh -w format OR a filename containing  a
              list  of  hostnames/addresses,  1 per line.  You MUST have passwordless ssh access to these nodes.
              If a different username is required, be sure to specify addresses in username@host  format  noting
              you  do  not  have  to  have  the  same username on each host.  If specified, these usernames will
              override those specified with the -username switch.  rsh access is not supported.
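
              As an illustration of the pdsh -w style lists mentioned above, the following Python sketch expands
              a specification such as n[1-10,15] into individual hostnames.  It is not colmux's own parser: the
              function name expand_hosts is hypothetical and only a simplified subset of the pdsh syntax (one
              bracketed range list per name) is assumed.

```python
import re


def expand_hosts(spec):
    """Expand a comma-separated host list in which a name may hold one [..] range list.

    Hypothetical helper for illustration only; it covers a simplified subset
    of the pdsh -w syntax and is not taken from colmux.
    """
    hosts = []
    # Split on commas that are not inside square brackets.
    for item in re.split(r",(?![^\[]*\])", spec):
        match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", item)
        if not match:
            hosts.append(item)  # plain hostname or user@host, kept as-is
            continue
        prefix, body, suffix = match.groups()
        for part in body.split(","):
            low, _, high = part.partition("-")
            high = high or low  # a single number such as "15"
            # Preserve zero padding such as n[01-03] (an assumption).
            width = len(low) if low.startswith("0") else 0
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{str(number).zfill(width)}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hosts("n[1-10,15]"))
```

              Under these assumptions, n1,n[4-5] expands to the three names n1, n4 and n5.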

       -command switches
              One can specify virtually any collectl command here, in either real-time or playback mode.  Some
              switches may only be used in one mode or the other, and colmux will usually let you know if you
              specify an invalid combination or an otherwise restricted switch.  Only those directly affecting
              colmux are listed below:

              --from, --thru
                     Limit the timeframe for data being played back, noting you can include both the from and
                     thru times with the --from switch if you separate them with a hyphen.

              -o time-format
                     This is a "magic" switch in that it not only tells collectl how to display dates/times (no
                     other options are permitted using -o other than those from the set [dDTm]), it also tells
                     colmux how to display dates/times.

                     In  single line mode, the timestamp will either come from the host system in real-time mode
                     OR the first host when run in playback mode.  This is the most  common  use/need  for  this
                     switch.   But  be careful in choosing column numbers with -cols as the position of the data
                     shifts by 1 when time is included and by  2  if  date  and  time  are.   Using  -test  will
                     correctly  show  the  shifted  positions but only if you include -o with the command at the
                     same time you use -test.

                     In real-time/top mode this switch is not allowed since colmux simply  reports  the  current
                     time of the system it is running on.

                     When playing back multi-line formatted data from one or more files, a timestamp for each
                     interval is reported, consisting of the time of that interval.  When this switch is
                     included, each line will be tagged with an appropriate timestamp, since on rare occasions
                     they may not all be identical.

              -p playback-file
                     This switch tells colmux to run in playback mode.  The filename should include the
                     directory location and is usually specified with wild cards, limiting the selected file(s)
                     to a specific date.  When those files are on the same host (-address is not specified),
                     they may be for multiple hosts, but when the files are on remote hosts they must all be for
                     that unique host.  If the file specification includes the string TODAY or YESTERDAY, it
                     will be replaced with *yyyymmdd* for that date.

              -P
                     Run  collectl  in  plot-format.   This  allows one to specify just about any combination of
                     subsystems since all data is always displayed on a single line.  However, due to  the  lack
                     of  formatting,  this  also  makes  no  sense for multi-line displays and is therefore only
                     supported in single-line format.
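
              The TODAY and YESTERDAY substitution described for the -p switch above can be pictured with the
              following Python sketch.  It only mirrors the documented replacement with *yyyymmdd*; the function
              name expand_playback_spec and its today parameter are hypothetical and not part of colmux.

```python
from datetime import date, timedelta


def expand_playback_spec(filespec, today=None):
    """Replace TODAY/YESTERDAY in a playback file specification with *yyyymmdd*.

    Hypothetical helper; 'today' can be passed in to make the result
    reproducible, otherwise the current date is used.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    # Neither keyword contains the other, so the order of replacement is not important.
    filespec = filespec.replace("YESTERDAY", f"*{yesterday:%Y%m%d}*")
    return filespec.replace("TODAY", f"*{today:%Y%m%d}*")


if __name__ == "__main__":
    print(expand_playback_spec("/var/log/collectl/YESTERDAY"))
```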

       -help
              Show a brief help message and exit.

       -hostwidth n
              By default, colmux sets the hostwidth to 8 unless it sees something wider, and for most situations
              this is sufficient.  However, if one specifies hostnames that are aliases of a longer hostname,
              colmux has no way of knowing the real hostname lengths until after it starts receiving data from
              collectl, and the formatting will be off if the hostnames are longer than the default.  To
              overcome this problem, use this switch to force the hostname column to be wider.

       -lines
              Change  the  number of lines that are displayed for each interval in multi-line mode.  The default
              will be determined by the terminal size returned by the linux resize command if present.  If  that
              command  is  not  present,  the  size  will be initially set to 24.  If -lines is greater than the
              terminal size or 0, top-like behavior will not be used when in real-time mode.

              In single-line format, this switch controls the number of lines displayed between headers.  A
              value of 0 will only display the header one time.
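
              The rule for when top-like behavior is used in multi-line mode can be sketched in Python as
              follows.  The function name choose_display is hypothetical, and the sketch queries the terminal
              height with Python's shutil.get_terminal_size rather than the resize command that colmux itself
              is described as using; only the fallback of 24 lines comes from the text above.

```python
import shutil


def choose_display(lines, term_height):
    """Return 'overlay' for top-like redrawing or 'scroll' for continuous scrolling.

    lines is the value given with -lines, or None when the switch is absent
    (an assumed convention for this sketch).
    """
    if lines is None:
        return "overlay"  # default: fill the terminal window
    if lines == 0 or lines > term_height:
        return "scroll"  # more lines than fit, or 0: no top-like behavior
    return "overlay"


if __name__ == "__main__":
    # Fall back to 24 lines when the terminal size cannot be determined.
    height = shutil.get_terminal_size(fallback=(80, 24)).lines
    print(height, choose_display(None, height))
```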

       -noescape
              Colmux uses brute-force screen formatting, that is, it generates its own VT100 escape sequences to
              clear lines and/or move the cursor.  On some occasions you may want to disable these sequences if
              you wish to recode the output and do your own post-processing of it.  This switch will do just
              that.

       -port
              Sometimes  a  remote  version of collectl is already using the default socket.  This allows one to
              start another instance and override that value.

       -test
              This tells colmux to execute the specified collectl command either locally or on the first  remote
              system  specified by -address, print the associated header with the selected column(s) highlighted
              and also include each column name along with its ordinal number, making it  fairly  easy  to  make
              sure you've selected the right column(s).

       -username name
              Use this username for ALL ssh commands.  It can be overridden for specific hosts by specifying
              them in username@host format with the -address switch.

       -version
              Display the version and exit.  It will also report if Term::ReadKey is installed and  if  so  what
              its version number is.

       Playback Mode Specific

       The  following  additional  switches  only  apply to playback mode.  There are no real-time mode specific
       switches.

       -delay seconds
              Introduce a delay between intervals in seconds.  You can specify  fractional  values.   Not  using
              this switch will cause the output to be displayed as fast as it can be rendered.

       -home
              Move  the  cursor  to  the home position (upper left-hand corner) of the display to use a top-like
              display format.  This ONLY applies to multi-line  mode  when  in  playback  mode  and  provides  a
              mechanism for displaying recorded data in a top-like fashion.

       -hostfilter addr[,addr]
              When  playing back files for multiple hosts on the local system, sometimes you do not want to play
              back ALL the host files.  This filter allows you to specify only those hosts  which  you  want  to
              process.  The format of the list of addresses is specified in the same way as -address except that
              you cannot specify a filename.

       -nosort
              Intended  primarily  for  output  that  would  be redirected to a file, do not sort or include any
              escape sequences in the output.

       Multi-Line Format

              When there is more output than will fit on the screen, colmux includes the text:
                     Displaying: lines xx thru yy out of zz
              on the right side of the top line of the display, where xx is typically 1.

              However, once colmux is running, one might want to look at subsequent lines, i.e. those below the
              bottom of the screen and therefore invisible.  If the ReadKey module is installed, one can simply
              use the PageDown key to move down the display and the PageUp key to move in the other direction.
              If ReadKey is not installed, typing the multi-key sequences pd<ENTER> or pu<ENTER> will cause the
              same thing to happen.

       -colhelp
              When you wish to change the sort column and the arrow keys aren't available  to  you,  it  may  be
              cumbersome  to identify the number of the column to type in followed by RETURN.  This tells colmux
              to display the numbers over each column eliminating the need to manually count them and  find  the
              one you want.

       -column num
              Set the sort column to this number.  The column numbering is determined by the columns returned by
              collectl for the requested command.  Since date/time columns are optional for non-plot data, their
              inclusion will change the numbering of the columns so if you are not sure you selected the correct
              column, you should first execute your command with -test included.

              You  can also change the column number interactively with the RIGHT/LEFT arrow keys IF the ReadKey
              module is installed (see colmux -version) OR simply type it in followed by the <ENTER> key.

       -finalcr
              There is a rather odd case in which you might want to pipe colmux real-time output to a script for
              further processing.  However, if you do this you can't read the final line with a routine that
              expects a terminating CR, like python's readline().  Rather, that last line and the one that
              follows will be returned as one long string.  This switch tells colmux to insert that final CR,
              which WILL mess up the screen under normal operations, so be forewarned.
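
              The effect described above can be reproduced with a small Python sketch that reads simulated
              output line by line.  The sample text is invented for illustration and the helper name read_lines
              is hypothetical; no actual colmux output is shown.

```python
import io


def read_lines(chunks):
    """Join successive chunks of output and read them back with readline()."""
    reader = io.StringIO("".join(chunks))
    # readline() returns "" at end of input, which stops the iteration.
    return [line.rstrip("\n") for line in iter(reader.readline, "")]


if __name__ == "__main__":
    # Without a terminating newline, the last line of one interval runs
    # into the first line of the next interval.
    print(read_lines(["a 1\nb 2", "a 3\n"]))
    # With the final newline present, every line is returned separately.
    print(read_lines(["a 1\nb 2\n", "a 3\n"]))
```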

       -hostformat char:pos
              There are times one has long hostnames which can either take up valuable screen real estate or are
              simply painful to look at.  This switch may evolve over time and is currently targeted at
              hostnames that have repeating parts along with a unique part, separated by a character such as a
              hyphen.  This switch allows you to specify a single character followed by the position of the
              piece of the hostname you'd like to see displayed.  For example, if you have a hostname like
              aaa-bbbb-cccc-dddd, -hostformat -:3 will cause the cccc piece to be displayed.
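
              A minimal Python sketch of the char:pos selection follows.  The function name format_host is
              hypothetical, and returning the full hostname when the position is out of range is an assumption
              rather than documented behavior.

```python
def format_host(hostname, hostformat):
    """Return the piece of hostname selected by a 'char:pos' specification.

    Positions are counted from 1, matching the -:3 example in the text.
    """
    separator, _, position = hostformat.rpartition(":")
    pieces = hostname.split(separator)
    index = int(position) - 1
    if 0 <= index < len(pieces):
        return pieces[index]
    return hostname  # assumption: fall back to the unmodified name


if __name__ == "__main__":
    print(format_host("aaa-bbbb-cccc-dddd", "-:3"))
```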

       -nobold
              Do not highlight the selected column.  This may be useful when redirecting output to  a  file  and
              you do not want the associated escape sequences to be written to it.

       -reverse
              Reverse the default sort order.  You can also change the direction of the sort interactively with
              the UP/DOWN arrow keys IF the ReadKey module is installed (see colmux -version) OR simply type the
              r key and <ENTER>.

       -zero
              Do not display any rows with 0 in the sort column.  You can also type z<ENTER> interactively.
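
              How -column, -reverse, -zero and -lines interact in multi-line format can be sketched in Python
              as below.  This is an illustration under stated assumptions, not colmux's implementation: the
              function name top_rows is hypothetical, the default order is assumed to be largest value first,
              and columns are assumed to be numbered from 1 within each row's list of values.

```python
def top_rows(rows, column, lines, reverse=False, zero=False):
    """Select the rows to display for one interval.

    rows is a list of (hostname, values) pairs; column is 1-based (assumed).
    """
    index = column - 1
    if zero:
        # -zero: drop rows whose sort column is 0.
        rows = [row for row in rows if row[1][index] != 0]
    # Assumed default is descending order; -reverse flips it.
    ordered = sorted(rows, key=lambda row: row[1][index], reverse=not reverse)
    return ordered[:lines]


if __name__ == "__main__":
    sample = [("n1", [5, 0]), ("n2", [9, 3]), ("n3", [1, 7])]
    print(top_rows(sample, column=1, lines=2))
```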

       Single-Line Format

       -col1000
              Divide each column by 1000 before display.

       -colk
              Divide each column by 1024 before display.

       -collog10
              Remap large numbers to a smaller number of values by taking the log10 of them and further
              transforming by the following mapping: 0,1 to 0, 10 to 10, 100 to 20, 1000 to 30, 10000 to 40, ...
              1e9 to 90.
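
              The listed mapping is consistent with ten times the base-10 logarithm of the value, which the
              following Python sketch reproduces.  How values between the listed powers of ten are rounded is
              not stated above, so rounding to the nearest integer is an assumption, as is treating anything
              below 1 as 0.  The function name collog10 simply echoes the switch.

```python
import math


def collog10(value):
    """Map a value onto the 0..90 scale described for -collog10."""
    if value <= 1:
        return 0  # 0 and 1 both map to 0 (smaller values assumed likewise)
    # 10 -> 10, 100 -> 20, 1000 -> 30, ...; rounding is an assumption.
    return round(10 * math.log10(value))


if __name__ == "__main__":
    for sample in (0, 1, 10, 100, 1000, 10000, 10**9):
        print(sample, collog10(sample))
```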

       -cols num,...
              Group  all  data together for each host by column number(s).  As with -column, you can confirm the
              correct column(s) have been selected by first running with -test.

       -colnodet
              Do not show data for individual hosts, just display the totals.

       -colnodiv num,...
              Do not divide the specified column numbers by 1000 or 1024 when -col1000 or -colk is specified,
              and do not apply the log10 transformation to them when -collog10 is specified.  A typical usage is
              if you want to look at cpu loads as well as network or disk stats, in which case you may want to
              divide the latter by 1024 but not the cpu.
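
              The combination of -colk with -colnodiv can be pictured with the Python sketch below.  The
              function name scale_columns and the use of integer division are assumptions made for the
              illustration; the text above does not say how colmux rounds the result.

```python
def scale_columns(values, divisor=1024, nodiv=()):
    """Divide every column by divisor except those listed in nodiv.

    values maps a column number to its value for one host.
    """
    return {
        column: (value if column in nodiv else value // divisor)
        for column, value in values.items()
    }


if __name__ == "__main__":
    # Column 1 holds a cpu load; columns 6 and 7 hold sizes in KB.
    print(scale_columns({1: 37, 6: 2048, 7: 4096}, divisor=1024, nodiv={1}))
```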

       -colnoinst
              Do not include the instance portion (and surrounding brackets) in totals column headers.

       -coltotal
              Include the totals for each column to the right.

       -colwidth
              Set the output columns to this width, typically used in conjunction with -col1000 or -colk to
              allow more hosts to fit onto the same line.  It can also be used if the host names are too narrow
              for column headers and you have room to display wider names.

       Exception Reporting Specific

       In single-line format, rather than wait for all hosts to report their data, colmux simply reports the
       last data seen when the time to generate a line of output has come.  In most cases these values do
       reflect the most recent data, but in times of load the data may be late getting to colmux and so a
       previous value may be reported.  If the age of that data exceeds a defined number of intervals (the
       default is currently 2), an exception value of -1 will be reported.  It has also been seen that
       kernel/driver bugs may cause incorrect values to be reported as negative numbers, and those values are
       also reported as -1.  Both the age and exception values can be changed with the following switches.

       -age number
              When initially starting up and all hosts have not yet reported any data, colmux will display a  -1
              to  indicate  no  data  has  been  seen  yet.  If during processing a host fails to report in -age
              intervals, the default is 2, colmux will also report a -1 indicating the data is stale.

       -negdataval val
              In some cases, there could be erroneous  data  reported  as  negative  numbers  (though  sometimes
              negative numbers are valid).  When specified, replace any negative numbers with this value.

       -nodataval val
              This switch allows you to change the -1 that is normally reported for missing or stale data to the
              specified value, most commonly 0.
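
       The exception handling described in this section can be summarized with the following Python sketch.
       It is an illustration only: the function name reported_value is hypothetical, the exact comparison used
       for stale data is assumed, and because the text describes negative values both as being reported as -1
       and as sometimes being valid, the sketch leaves them unchanged unless a replacement value is given.

```python
def reported_value(value, age, max_age=2, nodataval=-1, negdataval=None):
    """Return what would be shown for one host column in single-line format.

    value is the last value seen (None if the host has not reported yet) and
    age is the number of intervals since that value arrived.
    """
    if value is None or age > max_age:
        return nodataval  # missing or stale data
    if value < 0 and negdataval is not None:
        return negdataval  # replace a suspect negative number
    return value


if __name__ == "__main__":
    print(reported_value(None, 0))                # no data seen yet
    print(reported_value(42, 3))                  # stale with the default age of 2
    print(reported_value(42, 3, nodataval=0))     # stale, reported as 0 instead
    print(reported_value(-7, 0, negdataval=0))    # negative value replaced
```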

       Diagnostics

       The  following  switches are intended more for diagnostic purposes than normal operation, though are also
       worth using on appropriate occasions.

       -debug val
              This switch is for generating diagnostic information at various levels.  It is actually a bit
              mask, whose values are listed at the beginning of colmux itself.  Perhaps the most useful value is
              1, as it will cause colmux to display all the remote commands issued to each host in the address
              list and can often reveal problems when things don't seem to be working correctly.

       -nocheck
              This switch was initially included in an earlier version when remote host checking was causing
              problems in some cases and, by skipping those checks, colmux would run more reliably.  While it is
              felt that as of V3.2.0 these reachability checks are now reliable and should not be skipped, this
              switch has been left in place.

       -quiet
              By default, and when -nocheck is not specified, colmux checks the versions of all collectl
              instances against that of the first node found to be running collectl and, if different, reports
              the mismatch.  This switch suppresses that warning.

              When a connection is received from an unexpected address, a warning is also reported and the
              request promptly ignored.  This switch suppresses those messages as well.  For more information on
              problems connecting, see CONNECTION PROBLEMS.

       -reachable
              By default, when a node is found to not be reachable, colmux will remove it from its list of hosts
              and continue execution.  This switch will tell colmux to exit if any host is not reachable.

       Miscellaneous

       The descriptions of the following switches don't really fit anywhere else:

       -colbin path
              On rare occasions, such as testing a patch to collectl in a copy NOT in /usr/bin, you may want  to
              tell  colmux to use that copy instead of the standard one.  Use this switch to point to that copy.
              Naturally that copy must exist in that location on all systems.

       -keepalive secs
              Colmux uses ssh to start collectl on each remote machine, and communications between collectl and
              colmux then occur over a socket.  Normally, ssh is configured to time out after an interval of
              inactivity, such as 30 minutes, which means a long-running colmux session will begin to lose
              connections when this interval is reached.  By specifying a keepalive interval, you're telling ssh
              to send a periodic keepalive to the other end so that the connection doesn't get dropped.

       -retaddr addr
              Tell  remote  collectls to open a socket on this address instead of the preselected one.  For more
              details on this, see CONNECTION PROBLEMS.

       -timeout secs
              By default, colmux waits up to 10 seconds for remote instances of collectl to connect back.  On
              slower networks or when a very large number of instances have been started, they may fail to
              connect back in time.  This switch will extend that timeout, but it also requires collectl V3.6.4
              or later, because earlier versions do not support this feature.

       -timerange secs
              When colmux starts up and checks the connectivity to all the machines specified by -addr, it also
              gets their current date/time and uses that to compute the range of system times across all nodes.
              If that range is found to be more than -timerange seconds, colmux generates a warning, as this
              difference could cause reporting problems.  One can increase the range to get rid of the message
              (not recommended unless other factors are preventing nodes from responding quickly enough to the
              date command) OR suppress the warning with -quiet.
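
              The check amounts to comparing the spread of the reported clocks with the allowed range, which
              the Python sketch below illustrates.  The function name clock_spread_warning and the wording of
              the message are hypothetical; only the comparison itself follows the description above.

```python
def clock_spread_warning(host_times, timerange):
    """Return a warning string if the host clocks differ by more than timerange.

    host_times maps a hostname to its current time in seconds since the epoch.
    """
    spread = max(host_times.values()) - min(host_times.values())
    if spread > timerange:
        return f"time range across hosts is {spread}s, more than {timerange}s"
    return None  # clocks are close enough, nothing to report


if __name__ == "__main__":
    clocks = {"n1": 1000, "n2": 1001, "n3": 1009}
    print(clock_spread_warning(clocks, timerange=5))
```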

PLAYBACK MODE RESTRICTIONS

       All logs being played back must have been collected using the same interval as colmux only looks  at  the
       first file/host to determine the appropriate value.

       It  is assumed all clocks are reasonably well synchronized as colmux uses time to determine which data is
       to be displayed as a set.

       All files must be in the same directory on all systems and that directory must be included in the
       playback file specification.

       All files on a remote host must be for that host only.

EXAMPLES

       Run  collectl  on 3 nodes, showing CPU, Disk and Network statistics once a second and sorted by column 1,
       which happens to be total cpu.

       colmux -addr abc,def,xyz

       Dynamically display top processes on nodes n1-n10 of a cluster once a second, sorted by column 5.

       colmux -addr n[1-10] -command "-sZ -i:1" -column 5

       Do the same for yesterday, between the hours of 5AM and 6AM, being sure to stall for 1/2  second  between
       intervals.   Note,  if you leave off -addr you could put all the logs into /var/log/collectl on the local
       host and play them back from there.

       colmux -addr n[1-10] -command "-sZ -p/var/log/collectl/YESTERDAY -from 05:00-06:00" -column 5 -delay .5

       Look at the amount of mapped and slab memory consumed on nodes n1-n10 and n15 in real-time, every 2
       seconds using single-line format.  Include totals and preface each line with the time.  Since memory
       sizes tend to be rather large, divide each by 1024 so we see MB rather than KB.  Note that the column
       numbers are always displayed in ascending order regardless of their order in -cols.  To be sure, first
       test the column numbers.

       colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk -test
       colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk

       Display most active disks, based on KB written, on nodes n1, n4 and n5.

       colmux -addr n1,n4,n5 -command "-sD" -column 6

       Here is a cool trick.  Collectl currently lets you look at top processes with the --top switch and even
       choose a sort column by name.  However, if you want to change the column you need to exit, then rerun
       collectl with a different sort column name.  But if you run it like this example, you get the power of
       colmux to dynamically change the sort columns with the arrow keys!  You can also use this technique to
       have collectl dynamically sort any local multi-line data such as slabs or even detail data like CPU,
       Disk, Lustre and Networks too!  Naturally this technique works just as well when playing back data.

       colmux -command "-sZ -i:1"

RESTRICTIONS

       colmux requires passwordless ssh between the node it is running on and those it is monitoring.  Also be
       sure the port you are using for communications (the default is 2655) is open.

CONNECTION PROBLEMS

       The way colmux works is to choose an address it wants to communicate over and start up one or more
       remote copies of collectl, telling them to connect back to colmux using that address.  The easiest way to
       see this is to run colmux with -noesc, which tells it NOT to issue any escape sequences and therefore
       not to run in full screen mode.  The additional switch of -debug 1 tells it to show the remote collectl
       startup command.  When there is a communications problem you will typically see 'connection timed out'
       messages displayed.

       There are actually a couple of possibilities here.  One is that a firewall is preventing connections,
       and the easiest way to test this is to run collectl on the local machine like this: collectl -Aserver.
       This tells collectl to run as a server, listening for connections just like colmux.  Then log into a
       remote machine and run /usr/share/collectl/util/client.pl addr-of-server, which tells client.pl to open
       a socket to that copy of collectl.  It should fail just like when it was run via colmux, so open the
       firewall and try it again.  If that fixes the problem, it was indeed the firewall blocking things and
       colmux should now work just fine.

       Sometimes there are multiple interfaces defined on the machine hosting colmux and in some cases only some
       addresses will allow socket connections.  Again, using client.pl on the remote machine, try connecting
       back to collectl over different addresses, and when you find one that works, tell colmux to use that
       address for communication via the -retaddr switch.
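
       If client.pl is not at hand, a plain TCP connection attempt gives a rough first indication of whether an
       address and port are reachable from the remote machine.  The Python sketch below is a stand-in of that
       kind and not a replacement for client.pl: it only checks that a socket can be opened and exchanges no
       collectl data.  The function name can_connect is hypothetical; the port of 2655 and the 10 second wait
       are the defaults mentioned elsewhere in this page.

```python
import socket


def can_connect(address, port=2655, timeout=10.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Try the loopback address; substitute the address of the colmux host.
    print(can_connect("127.0.0.1", timeout=2.0))
```

       Running this against each candidate address of the colmux host shows which ones accept connections and
       are therefore worth trying with -retaddr.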

AUTHOR

       This program was written by Mark Seger (mjseger@gmail.com).
       Copyright 2016 Hewlett-Packard Development Company, L.P.

SEE ALSO

       http://collectl-utils.sourceforge.net/colmux.html

LOCAL                                             DECEMBER 2010                                        COLMUX(1)