Provided by: fitsh_0.9.4-1_amd64

NAME

       grmatch - pairing lines by identifier or by cross-matching

SYNOPSIS

       grmatch [options] -r <reference> -i <input> [-o <output>]

DESCRIPTION

       The program `grmatch` matches lines read from two input files: a reference file and an input file. All
       implemented algorithms are symmetric, in the sense that the result should be the same if the two files
       are swapped. The order of the files matters only when a geometrical transformation is also returned (see
       point matching below); in that case, swapping the files yields the inverse of the original
       transformation. The lines (rows) can be matched using one of the following criteria.

       1. Lines can be matched by identifier, where the identifier can be any concatenation of arbitrary,
       space-separated columns found in the files. Generally, the identifier is a single column (e.g. an
       astronomical catalog identifier). The behaviour of the program can be tuned for the cases when more
       than one row has the same identifier.

       2. Lines can be matched using a 2-dimensional point matching algorithm. In this method, the program
       expects two columns from each of the reference and input files, which are treated as X and Y
       coordinates. Given both point lists, the program tries to find the geometrical transformation that
       maps the points from the frame of the reference list to the frame of the input list and,
       simultaneously, tries to find as many pairs as possible. The parameters of the geometrical
       transformation and of the whole algorithm can be fine-tuned.

       3. Lines can be matched using an arbitrary- (N-) dimensional coordinate matching algorithm. This method
       expects N columns from each of the reference and input files, which are treated as X_1, ..., X_N
       Cartesian coordinates, and it assumes that both point sets are in the same reference frame. The point
       'A' from the reference list and the point 'P' from the input list form a pair if the closest point to
       'A' in the input list is 'P' and vice versa.
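
       The symmetric pairing rule of coordinate matching can be illustrated with a short Python sketch. It is
       not the program's implementation: the function names are hypothetical, the search is a brute-force
       scan, and the optional distance cut only mimics what --max-distance is described to do.

```python
import math

def nearest_index(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda k: math.dist(point, candidates[k]))

def symmetric_pairs(reference, inp, max_distance=None):
    """Return (i, j) index pairs where reference[i] and inp[j] are mutually nearest."""
    pairs = []
    if not reference or not inp:
        return pairs
    for i, a in enumerate(reference):
        j = nearest_index(a, inp)
        # 'A' and 'P' pair only if 'A' is also the closest reference point to 'P'.
        if nearest_index(inp[j], reference) != i:
            continue
        # Optional cut on the Euclidean distance (assumed analogue of --max-distance).
        if max_distance is not None and math.dist(a, inp[j]) > max_distance:
            continue
        pairs.append((i, j))
    return pairs

reference = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
inp = [(0.5, 0.0), (10.2, 0.1), (130.0, 0.0)]

pairs = symmetric_pairs(reference, inp)                 # all mutual nearest neighbours
pairs_limited = symmetric_pairs(reference, inp, 1.0)    # the distant third pair is dropped
```

       Without a distance limit, the third pair is accepted although the points are 30 units apart, which is
       why a reasonable maximal distance is worth specifying.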

OPTIONS

   General options:
       -h, --help
              Give a general summary of the command line options.

       --long-help, --help-long
              Give a detailed list of command line options.

       --wiki-help, --help-wiki, --mediawiki-help, --help-mediawiki
              Give a detailed list of command line options in MediaWiki format.

       --version, --version-short, --short-version
              Give some version information about the program.

       -C, --comment
              Comment the output (both the transformation file and the match file).

   Options for input/output specifications:
       -r <file>, --reference <file>, --input-reference <file>
              Mandatory, name of the reference file.

       <inputfile>, -i <inputfile>, --input <inputfile>
              Name of the input file. If this switch is omitted, the input is read from stdin (specifying some
              input is mandatory).

       --separator-reference <char>|space, --separator-input <char>|space
              Character separating the fields of the reference and of the input file, respectively. By
              default, fields are separated by whitespace; this can be made explicit by specifying 'space'
              here. Otherwise, <char> must be a single character. For instance, use
              '--separator-reference ,' and/or '--separator-input ,' to process CSV files.

       -o <output>, --output <output>, --output-matched <output>
              Name of the output file containing the matched lines. Each matched line is a pasted line: the
              first part comes from the reference file and the second part from the input file, and the two
              parts are joined by a TAB character. This switch is optional; if it is not specified, no such
              output is generated.

       --output-matched-reference <out>, --output-matched-input <out>
              Name of the output file, containing the lines corresponding to matches but only from the reference
              file or from the input file, respectively.

       --output-excluded-reference <out>, --output-excluded-input <out>
              Names of the files which contain the valid but excluded lines from the reference and from the
              input. These outputs are disjoint from the previous output and together they contain all valid
              lines.

       --output-id <out>
              Name of the file which contains only the identifiers of the matched lines. If the primary
              matching method was not identifier matching, one should also specify the column indices of the
              identifiers with --col-ref-id and --col-inp-id.

       --output-transformation <output-transformation-file>
              Name of the output file containing the geometrical transformation, in human-readable format, if
              the matching method was point matching (otherwise, this option has no effect). The commented
              version of this file includes some statistics about the matching (the total number of lines
              used and matched, the required CPU time, the final triangulation level, the fit residuals and
              similar quantities).

       In all of the above input/output file specifications, replacing the file name by "-" (a single minus
       sign) forces reading from stdin or writing to stdout. Note that any part of a line after "#"
       (hashmark) is treated as a comment and is therefore ignored.
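
       The conventions above (comments after "#", whitespace or single-character separators, column indices
       starting at 1) can be sketched in Python as follows. The helper names are hypothetical and the code
       only mirrors the description given here; the program's actual parser may treat edge cases differently.

```python
def split_fields(line, separator="space"):
    """Drop everything after '#', then split the rest into fields."""
    content = line.split("#", 1)[0].rstrip("\n")
    if separator == "space":
        return content.split()          # any run of whitespace separates fields
    return content.split(separator)     # a single separator character, e.g. ','

def get_column(fields, index):
    """Return the field at a 1-based column index, as used by the --col-* switches."""
    if index < 1:
        raise ValueError("column indices start at 1")
    return fields[index - 1]

fields_ws = split_fields("12.5 30.1  star_7   # a faint star\n")
fields_csv = split_fields("12.5,30.1,star_7", separator=",")
comment_only = split_fields("# header line")
third = get_column(fields_ws, 3)
```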

   General options for point matching:
       --match-points
              This  switch  forces  the usage of the point matching method. By default, this method is   assumed
              to  be  used,  therefore  this switch can be omitted.

       --col-ref <x>,<y>, --col-inp <x>,<y>
              The column indices containing the X and Y coordinates, for the reference and for the input file,
              respectively. The index of the first column is always 1, the index of the second is 2 and so
              on. Lines in which these columns do not contain valid real numbers are omitted.

       -a <order>, --order <order>
              This switch specifies the polynomial order of the resulting geometrical transformation. It can
              be an arbitrary positive integer. Note that if the order is A, at least (A+1)*(A+2)/2 valid
              points are needed from both the reference and the input file to fit the transformation.
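
              The bound (A+1)*(A+2)/2 is the number of coefficients of a two-variable polynomial of order A,
              so each coordinate of the transformation needs at least that many points. A small Python
              sketch (the function name is hypothetical) tabulates it for the first few orders:

```python
def min_points_for_order(order):
    """Minimum number of valid points needed to fit a 2D polynomial of the given order."""
    if order < 1:
        raise ValueError("the order must be a positive integer")
    # Monomials x^i * y^j with i + j <= order: (order + 1) * (order + 2) / 2 of them.
    return (order + 1) * (order + 2) // 2

required = {order: min_points_for_order(order) for order in (1, 2, 3, 4)}
```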

       --max-distance <maxdist>
              The maximal accepted distance between the matched points, measured in the coordinate frame of
              the input coordinate list (and not in the coordinate frame of the reference coordinate list).
              Possible pairs (which are valid pairs according to the symmetric coordinate matching
              algorithms) are excluded if their Euclidean distance is larger than maxdist. Note that this
              option has no default value; therefore, if it is omitted, all possible pairs found by the
              symmetric matching are returned, which in practice can lead to unexpected behaviour in certain
              cases. One should always specify a reasonable maximal distance, which can be estimated only
              from knowledge of the physics behind the input files.

       See more options concerning point matching in the section "Fine-tuning of point matching" below. That
       section also describes the tuning of the triangulation used by the point matching algorithm. For a
       more detailed description of the point matching algorithms based on pattern and triangle matching,
       see [1], [2] or [3].

   General options for coordinate matching:
       --match-coord, --match-coords
              This  switch forces the usage of the coordinate matching method. Note that because of  the  common
              options  with the point  matching method, one should specify this switch to force the usage of the
              coordinate matching method (the default method is  point  matching, see above).

       --col-ref <x>[,<y>,[<z>...]], --col-inp <x>[,<y>,[<z>...]]
              The column indices containing the spatial coordinates, for the reference and for the input file,
              respectively. The index of the first column is always 1, the index of the second is 2 and so
              on. Lines in which these columns do not contain valid real numbers are omitted. Note that the
              dimension of the coordinate matching space is specified indirectly, by the number of column
              indices listed here. Because of this, the number of column indices should be the same for the
              reference and the input; otherwise, when the dimensions are mismatched, the program exits
              unsuccessfully.

       --max-distance <maxdist>
              The maximal accepted distance between the matched points. Possible pairs (which are valid pairs
              according to the symmetric coordinate matching algorithms) are excluded if their Euclidean
              distance is larger than maxdist. Note that this option has no default value; therefore, if it
              is omitted, all possible pairs found by the symmetric matching are returned (see also point
              matching, above).

   General options for identifier matching:
       --match-id, --match-identifiers
              This switch forces the usage of the identifier matching  method.

       --col-ref-id <i>[,<j>,[<k>...]], --col-inp-id <i>[,<j>,[<k>...]]
              Column   index   or   indices   containing  the identifiers, from the reference and from the input
              file, respectively.

       --no-ambiguity, --first-ambiguity, --any-ambiguity, --full-ambiguity
              These options tune the behaviour of the matching when there is more than one occurrence of a
              given identifier in the reference and/or input file. If --no-ambiguity is specified, these
              identifiers are discarded; this is the default method. If --first-ambiguity is specified, only
              the first occurrence is treated as a matched line, independently of the number of occurrences.
              If the switch --any-ambiguity is specified, the lines are paired sequentially, as long as there
              are any left from both the reference and the input. For example, if there are 4 occurrences in
              the reference and 6 in the input file of a given identifier, 4 matched pairs are returned.
              Finally, if --full-ambiguity is specified, all possible combinations of the lines are treated
              as matched lines. For example, if there are 4 occurrences in the reference and 6 in the input
              file of a given identifier, all 4*6=24 combinations are returned as matched pairs.
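
       The four ambiguity modes can be compared with a short Python sketch. It is an interpretation of the
       descriptions above rather than the program's code: the function names and the mode strings are
       hypothetical, and "first" is read as pairing the first occurrence from each file.

```python
from collections import defaultdict
from itertools import product

def group_by_id(rows):
    """Group (identifier, line) tuples by identifier, keeping the file order."""
    groups = defaultdict(list)
    for row_id, line in rows:
        groups[row_id].append(line)
    return groups

def match_ids(reference, inp, mode="no"):
    """Pair lines sharing an identifier; `mode` is one of no, first, any, full."""
    ref_groups, inp_groups = group_by_id(reference), group_by_id(inp)
    matched = []
    for row_id, ref_rows in ref_groups.items():
        inp_rows = inp_groups.get(row_id)
        if not inp_rows:
            continue
        ambiguous = len(ref_rows) > 1 or len(inp_rows) > 1
        if mode == "no":            # discard identifiers that occur more than once
            if not ambiguous:
                matched.append((ref_rows[0], inp_rows[0]))
        elif mode == "first":       # only the first occurrences form a pair
            matched.append((ref_rows[0], inp_rows[0]))
        elif mode == "any":         # pair sequentially until one side runs out
            matched.extend(zip(ref_rows, inp_rows))
        elif mode == "full":        # every combination is a pair
            matched.extend(product(ref_rows, inp_rows))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return matched

# 4 occurrences of identifier "A" in the reference and 6 in the input.
reference = [("A", f"ref{k}") for k in range(4)]
inp = [("A", f"inp{k}") for k in range(6)]
counts = {mode: len(match_ids(reference, inp, mode)) for mode in ("no", "first", "any", "full")}
```

       For the 4-versus-6 case quoted above, the sketch returns 0, 1, 4 and 24 pairs for the four modes.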

   Fine-tuning of point matching:
       --triangulation <parameters>
              This switch  is  followed  by  comma-separated  directives, which specify the  parameters  of  the
              triangulation-based point matching algorithm:

       delaunay, level=<level>, full, auto, unitarity=<U>
              These directives specify the triangulation level used for point matching. "delaunay" forces the
              usage of Delaunay triangles only. This is the fastest method; however, it only works if the
              points in the reference and input lists are almost completely overlapping and describe almost
              the same point sets (with a ratio of common points above 60-70%). The "level" directive
              specifies the level of the expansion of the Delaunay triangulation (see [1] for more details).
              In practice, the lower the ratio of common points and/or the ratio of the overlap, the higher
              the level that should be used. Specifying "level=1" or "level=2" gives a robust but still fast
              method for general usage. The directive "full" forces full triangulation. This can be
              extremely slow and requires a very large amount of memory if there are more than 40-50 points
              (the required CPU time and memory are proportional to the 6th(!) and 3rd power of the number of
              points, respectively). The directive "auto" increases the level of the triangulation expansion
              automatically until a proper match is found. A match is considered a good match if the
              unitarity of the transformation is less than the unitarity U specified by the "unitarity=U"
              directive (see also the notes on unitarity below).

       mixed, conformable, reverse
              These directives define the chirality of the triangle spaces to be used. In practice, this means
              the following. If it is not known whether the input and reference lists are inverted with
              respect to each other, one should use the "mixed" triangle space. If the input and reference
              lists are known not to be inverted, the "conformable" triangle space can be used. If the input
              and reference lists are known to be inverted, the "reverse" space can be used. Note that
              although the "mixed" triangle space can always yield a good match, it is wise to fix the
              chirality by specifying "conformable" or "reverse" whenever it is known whether or not the
              point sets are inverted with respect to each other. If the chirality is fixed, the program
              yields more matched pairs, the appropriate triangulation level can be smaller and, in "auto"
              mode, the program returns the match definitely faster.

       maxnumber=<max>, maxref=<mr>, maxinp=<mi>
              These directives specify the maximal number of points which are used for triangulation (for any
              type of triangulation). Specifying "maxnumber" is equivalent to defining "maxref" and "maxinp"
              with the same value. Then, the first <mr> points from the reference and the first <mi> points
              from the input list are used to generate the triangle sets. The "first" points are selected
              using the optional information found in one of the columns; see the --col-ref-ordering and
              --col-inp-ordering switches.

       (Note  that  there should be only one --triangulation switch, all desired directives  should  be  written
       in the same argument, separated by commas.)

       --col-ref-ordering [-]<w>, --col-inp-ordering [-]<w>
              These switches each specify one column index, from the reference and from the input file
              respectively, which is used to order these lists and select the first "maxref" and "maxinp"
              points (see above) for the generation of the two triangle meshes. Both columns should contain
              valid real numbers, otherwise the whole(!) line is excluded (not only from sorting but from the
              whole matching procedure). If there is no negative sign before the column index, the data are
              sorted in descending(!) order, therefore the lines with the highest(!) values are selected for
              triangulation. If there is a negative sign before the index, the data are sorted in ascending
              order by these values, therefore the lines with the smallest(!) values are selected for
              triangulation. For example, if we want to match star lists, we might want to use only the
              brightest ones to generate the triangle sets. If the brightnesses of the stars are specified by
              their fluxes, we should not use the negative sign (the list should be sorted in descending
              order to select the first few lines as the brightest stars), and if the brightness is given as
              a magnitude, we have to use the negative sign.
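
              The sign convention can be illustrated with a Python sketch. The function name and the sample
              rows are hypothetical; the signed integer stands for the [-]<w> argument, and the sketch only
              performs the selection step described above.

```python
def select_for_triangulation(rows, column, max_points):
    """Pick the first `max_points` rows, ordered by a signed 1-based column index.

    A positive index sorts in descending order (highest values first);
    a negative index sorts in ascending order (smallest values first).
    """
    ascending = column < 0
    index = abs(column) - 1
    valid = []
    for row in rows:
        try:
            key = float(row[index])
        except (ValueError, IndexError):
            continue                      # rows without a valid real number are dropped
        valid.append((key, row))
    valid.sort(key=lambda item: item[0], reverse=not ascending)
    return [row for _, row in valid[:max_points]]

# Columns: x, y, flux, magnitude (illustrative values only).
stars = [
    ["10", "20", "1500.0", "12.1"],
    ["11", "21", "300.0", "13.8"],
    ["12", "22", "9000.0", "10.1"],
    ["13", "23", "bad", "n/a"],
]

brightest_by_flux = select_for_triangulation(stars, 3, 2)    # no sign: highest fluxes
brightest_by_mag = select_for_triangulation(stars, -4, 2)    # negative sign: smallest magnitudes
```

              Both calls select the same two stars, because a higher flux corresponds to a smaller magnitude.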

       --fit iterations=<N>,firstrejection=<F>,sigma=<S>
              Like --triangulation, this switch is followed by some directives. These directives specify the
              number <N> of iterations ("iterations=<N>") for point matching. The "firstrejection" directive
              specifies the serial number <F> of the first iteration in which points farther than the <S>
              "sigma" level are excluded from the next iteration. Note that in practice this type of
              iteration is not really important (due to, for instance, the limitation of the outliers by the
              --max-distance switch), however, some suspicious users can be convinced by such arguments.

       --weight reference|input,column=<wi>,[magnitude],[power=<p>]
              These directives specify the weights which are used during the fit of the geometrical
              transformation. In practice, this is useful in the following situation. When matching star
              lists, the fainter stars are believed to have higher astrometrical errors, therefore they
              should have a smaller influence on the fit. The weights can be taken from the reference
              (specify "reference") or from the input (specify "input"), from the column specified by the
              weight index <wi>. The weights can be derived from stellar magnitudes; if so, specify
              "magnitude" to convert the values read from magnitude to flux. The actual weight is then the
              "power"th power of the flux. The default value of "power" is 1; however, for the
              maximum-likelihood estimation of an assumed Gaussian distribution, the weights should be the
              second power of the fluxes.
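
              As a rough illustration, the weight computation can be sketched in Python. The function name
              is hypothetical, and the magnitude-to-flux conversion assumes the standard relation
              flux = 10^(-0.4*magnitude) with a zero point of 0; the zero point used by the program is not
              stated here, but it would only rescale all weights by a common factor.

```python
import math

def fit_weight(value, magnitude=False, power=1.0):
    """Weight of one point: the flux (converted from magnitude if needed) raised to `power`."""
    # Assumption: standard magnitude-flux relation with zero point 0.
    flux = 10.0 ** (-0.4 * value) if magnitude else value
    return flux ** power

# A star 5 magnitudes fainter has a 100 times smaller flux ...
ratio_power1 = fit_weight(10.0, magnitude=True) / fit_weight(15.0, magnitude=True)
# ... and a 10000 times smaller weight when power=2 is used.
ratio_power2 = fit_weight(10.0, magnitude=True, power=2) / fit_weight(15.0, magnitude=True, power=2)
```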

       Some notes on unitarity. The unitarity of a geometrical transformation measures how much it differs
       from the closest transformation which is affine and a combination of dilation, rotation and shift. For
       such a transformation the unitarity is 0, and if the second-order terms in a transformation distort
       such a unitary transformation, the unitarity will have the same magnitude as this second-order effect.
       For example, mapping a part of a sphere with a size of d degrees will have a unitarity of 1-cos(d).
       Therefore, for astrometrical purposes, a reasonable value of the critical unitarity in "auto"
       triangulation mode can be estimated as 2 or 3 times 1-cos(d/2), where d is the size of the field in
       which astrometry should be performed.
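
       The estimate above is easy to evaluate numerically. In the following Python sketch the function name
       and the `factor` parameter are hypothetical; d is taken in degrees, as in the text.

```python
import math

def critical_unitarity(field_size_deg, factor=2.0):
    """Estimate factor * (1 - cos(d/2)) for a field of d degrees."""
    half_size = math.radians(field_size_deg / 2.0)
    return factor * (1.0 - math.cos(half_size))

# Lower and upper estimates (2 and 3 times 1-cos(d/2)) for a 10 degree field.
u_low = critical_unitarity(10.0, factor=2.0)
u_high = critical_unitarity(10.0, factor=3.0)
```

       For a 10 degree field this gives values of roughly 0.008 to 0.011, which could serve as a starting
       point for the "unitarity=<U>" directive.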

REPORTING BUGS

       Report bugs to <apal@szofi.net>, see also https://fitsh.net/.

COPYRIGHT

       Copyright © 1996, 2002, 2004-2008, 2010-2016, 2018-2020; Pal, Andras <apal@szofi.net>

grmatch 0.9.4 (2021.01.24)                        January 2021                                        GRMATCH(1)