Provided by: fitsh_0.9.4-1_amd64

NAME

       grcollect - perform transposition on tabulated input data

SYNOPSIS

       grcollect [options] <input> [...] [-o <output>|-b <basename>]

DESCRIPTION

       The purpose of `grcollect` is twofold. First, it performs data transposition: the input (read from
       files or standard input) is sorted and split into separate files, where the split is based on a key
       taken from the input data. When the input comes from several files and each key is unique within a
       given file, this process is called data transposition, since it is similar to the case where a
       two-dimensional data matrix is stored with each row in a separate file and one intends to transpose
       the matrix, i.e. store each column in a separate file. Second, `grcollect` computes statistics on the
       data associated with the different keys. These statistics include averages (mean, median, mode) and
       scatter estimates (standard deviation or median deviance), with optional rejection of outlier points,
       as well as summation, count statistics and so on.
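
       The transposition described above can be illustrated with a minimal Python sketch. This is only a
       conceptual model, not the way `grcollect` is implemented: the function name, the input file names
       and the 0-based `key_col` index are assumptions made for the example, and the result is kept in
       memory instead of being written to per-key files.

```python
from collections import defaultdict

def transpose_by_key(files, key_col=0):
    """Group whitespace-separated records from several inputs by their key column.

    `files` maps a (hypothetical) input name to its list of text lines.
    Returns {key: [lines...]}, i.e. the content each per-key output would hold.
    """
    by_key = defaultdict(list)
    for name in sorted(files):          # process the inputs in a fixed order
        for line in files[name]:
            fields = line.split()
            if not fields:
                continue                # skip empty lines
            by_key[fields[key_col]].append(line)
    return dict(by_key)

# Two "row" files (one per epoch), each holding one record per key.
inputs = {
    "epoch1.dat": ["starA 10.1", "starB 12.3"],
    "epoch2.dat": ["starA 10.2", "starB 12.2"],
}
columns = transpose_by_key(inputs)
print(columns["starA"])
```

       Each key ends up with one record per input file, which is the "column" of the transposed matrix.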

OPTIONS

   General options:
       -h, --help
              Give a general summary of the command line options.

       --long-help, --help-long
              Give a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Give a detailed list of command line options in Mediawiki format.

       --version, --version-short, --short-version
              Give some version information about the program.

       <input> [,<input>, ...]
              Name of the input file. At least one file should be specified. Reading from standard input can
              be forced by using a single dash "-" as the input file name. Additional dashes are silently
              ignored.

       -c, --col-base <key column index>
              Column index for the key.

   Data transposition specific options:
       -b, --basename <base-%b-name>
              Base name of the output files. The base name string should contain at least one "%b" tag, which
              is replaced by the respective key string when the file is created.

       -x, --extension <extension>, -p, --prefix <prefix>
              Equivalent to "-b|--basename <prefix>%b.<extension>". Note that in practice, <prefix> might be
              some sort of directory name and <extension> a regular file extension, but the above substitution
              is done literally. Therefore, the dot between the key and the <extension> is always inserted in
              the final name of the output files, but a trailing slash is required at the end of <prefix> if
              the files are to be created in that particular directory. Note also that in this case the target
              directory must exist before `grcollect` is invoked, otherwise the output files cannot be
              created.
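
              The literal substitution described above can be sketched in Python as follows. The helper
              names are hypothetical, and the sketch assumes that every "%b" tag in the base name is
              replaced by the key; it only shows how the final file name is composed.

```python
def basename_from(prefix, extension):
    """Literal expansion described for -p/-x: <prefix>%b.<extension>."""
    return prefix + "%b." + extension

def output_name(basename, key):
    """Replace the "%b" tag(s) in the base name with the key string."""
    if "%b" not in basename:
        raise ValueError('base name must contain at least one "%b" tag')
    return basename.replace("%b", key)

# With a trailing slash the prefix acts as a directory name ...
print(output_name(basename_from("lc/", "dat"), "starA"))   # lc/starA.dat
# ... without it, the prefix is simply glued to the key.
print(output_name(basename_from("lc", "dat"), "starA"))    # lcstarA.dat
```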

       -C, --comment
              Insert  a  commented line (starting with "#") containing information about the version and command
              line invocation syntax of `grcollect` to the beginning of the transposed files.

       -S, --additional-comment <...>
              Insert additional commented lines (starting with "#") at the beginning of the transposed files.

   Options for cumulative statistics:
       -d, --col-stat <>[,...]
              Comma-separated list of column indices on which the statistics are to be calculated. Columns with
              non-numerical contents are ignored. Note that this option implies the cumulative statistics mode
              of `grcollect`.

       -o, --output <filename>
              The name of the output file to which the output statistics are written. The total number of
              columns in this file will be 1+C*N, where C is the number of columns (see -d|--col-stat) on which
              the statistics are calculated and N is the number of statistic quantities (see --stat). The first
              column in the output file is the key, which is followed by the per-column list of statistics, in
              the same order as given by the user after -d|--col-stat and --stat.
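
              As an illustration of the 1+C*N layout, the following Python sketch lists the output columns
              for a given choice of statistic columns and statistics. The function and the label format are
              made up for the example; only the ordering (key first, then all statistics for each column in
              turn) follows the description above.

```python
def output_layout(stat_columns, statistics):
    """Return labels for the output columns: the key, then per-column statistics."""
    layout = ["key"]
    for col in stat_columns:            # order given after -d|--col-stat
        for stat in statistics:         # order given after --stat
            layout.append(f"{stat}(col {col})")
    return layout

layout = output_layout([2, 3], ["mean", "stddev", "count"])
print(len(layout))   # 1 + 2*3 = 7
print(layout)
```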

       -s, --stat <list of statistics>
              Comma-separated list of statistics to be estimated on the input data. These can be one or more  of
              the following:

       count  Total number of records, for the given key.

       rcount The number of records after rejecting outliers (i.e. it is always the same as the "count" value if
              no "--rejection" was used).

       mean, median, mode
              Mean, median or mode statistics of the data.

       rmean, rmedian, rmode
              Mean, median or mode, after rejecting outliers.

       {mean|median|mode}stddev, {mean|median|mode}meddev, stddev
              Scatter  of the data around the mean, median or mode. The scatter can either be standard deviation
              (stddev) or median deviance (meddev). The literal "stddev"  is  the  classic  standard  deviation,
              equivalent to "meanstddev".

       r{mean|median|mode}stddev, r{mean|median|mode}meddev, rstddev
              The same scatters as above but after rejecting outliers.

       sum, rsum
              Sum of the data: the total sum and the sum after rejecting outliers, respectively.

       sum2, rsum2
              Sum of the squares, total and after rejecting outliers.

       min, max
              Minimal and maximal data values.

       rmin, rmax
              Minimal and maximal data values after the rejection of outliers.

       -r, --rejection column=<index>,<rejection parameters>
              Comma-separated  directives  for  outlier  rejection  for  the  specified  column.  The  rejection
              parameters are:

       iterations=<n>
              Maximum number of iterations to reject outliers.

       mean, median, mode
              Use the mean, median or mode for the center of the rejection.

       stddev, meddev, absolute=<limit>
              Use the standard deviation or median deviance as the unit of the rejection limit, or define an
              absolute limit for the rejection level.

       Note that each column can have a different rejection method, thus more than one "--rejection ..."
       command line option can be used when invoking `grcollect`.
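
       The idea of iterative outlier rejection can be sketched in Python as follows. This is a generic
       illustration, not the algorithm used by `grcollect`: the `factor` parameter, the use of the
       population standard deviation and the stopping rule are assumptions of the example. It uses the
       mean as the center and the standard deviation as the limit unit.

```python
import statistics

def reject_outliers(values, factor=3.0, iterations=3):
    """Iteratively drop points farther than factor*stddev from the mean."""
    kept = list(values)
    for _ in range(iterations):         # at most `iterations` rejection passes
        if len(kept) < 2:
            break
        center = statistics.fmean(kept)
        scatter = statistics.pstdev(kept)   # population stddev (assumed here)
        survivors = [v for v in kept if abs(v - center) <= factor * scatter]
        if len(survivors) == len(kept):
            break                       # nothing rejected: converged
        kept = survivors
    return kept

data = [10.0, 10.1, 9.9, 10.2, 9.8, 50.0]
kept = reject_outliers(data, factor=2.0)
# Analogues of "count" and "rcount", "mean" and "rmean" for a single key:
print(len(data), len(kept))
print(statistics.fmean(data), statistics.fmean(kept))
```

       In this example the value 50.0 is rejected in the first pass, so the statistics computed after
       rejection are based on the remaining five points.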

   Other options:
       -m, --max-memory <memory>[kmg]
              Maximum amount of memory available for `grcollect`. The suffixes "k", "m" or "g" can be used for
              kilobytes, megabytes and gigabytes, respectively. On 32-bit systems, the maximum memory is
              limited to 3 gigabytes. Note that `grcollect` does not use any operating-system-specific method
              to determine the maximum amount of memory; it should always be set by the user. The default
              value of 8 megabytes is rather small, so for massive data transposition (tens or hundreds of
              gigabytes) it is worth setting this limit according to the available physical memory.

       -t, --tmpdir <directory>
              Directory for temporary file storage. Note that the default temporary directory is always the
              current one (which is equivalent to specifying "--tmpdir ./"), since in a usual configuration
              the /tmp directory is small; moreover, it can be some sort of "tmpfs", a temporary file system
              mounted on the physical memory itself.

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

COPYRIGHT

       Copyright © 1996, 2002, 2004-2008, 2010-2016, 2018-2020; Pal, Andras <apal@szofi.net>

grcollect 0.9.4 (2021.01.24)                      January 2021                                      GRCOLLECT(1)