Provided by: msort_8.53-2.3build4_amd64

NAME

       msort - sort records in complex ways

SYNOPSIS

       msort <options> [<input file>]

DESCRIPTION

       msort  is  a  program  for  sorting  text  files  in  sophisticated ways.  It was developed initially for
       alphabetizing dictionaries of languages in which the ordering may be quite different from English but has
       many other uses.

       msort allows you to sort blocks of text delimited in a number of ways  rather  than  just  lines  and  to
       specify  particular fields of a record as sort keys using either their position, counted from either end,
       or by matching regular expressions to their tags.

       msort is capable of sorting on multiple keys, so that when two records tie on one key,  the  tie  may  be
       broken on another. Any or all keys may be optional.  How absent optional keys are ordered with respect to
       present keys may be set separately for each key.
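
       As a rough illustration of tie-breaking with an optional key, the following Python sketch shows the idea
       only; it is not msort's implementation. The record layout, the field names, and the choice to order
       absent keys before present ones are assumptions made for this example.

```python
# Hypothetical records: (headword, optional part-of-speech tag).
records = [
    ("bank", "verb"),
    ("apple", None),
    ("bank", None),
    ("bank", "noun"),
]

def sort_key(record):
    headword, tag = record
    # Primary key: the headword.
    # Secondary key: the optional tag. An absent tag gets rank 0 so that it
    # compares as "less than" any present tag (rank 1). This ordering is an
    # assumption of the example, mirroring the "<" choice for an optional key.
    secondary = (0, "") if tag is None else (1, tag)
    return (headword, secondary)

result = sorted(records, key=sort_key)
for record in result:
    print(record)
```

       Here the three "bank" records tie on the first key and are separated by the second, with the untagged
       record placed first.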

       msort  allows  you  to  specify  arbitrary  sort  orders  and  to  define  virtually unlimited numbers of
       multigraphs of effectively unlimited length.  The sort order and multigraphs are defined  separately  for
       each key. If your system has locale support, you can also use locale collation rules instead of specifying
       your own sort order.
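
       The effect of a custom sort order with multigraphs can be sketched in Python as follows. This is an
       illustration of the general technique (longest-match splitting into collating units, then comparison by
       rank), not msort's algorithm or its sort order file format. The alphabet used, in which "ch" follows "c"
       and "ll" follows "l", and all names in the code are assumptions chosen for the example.

```python
# Assumed ordering for this example only: a traditional Spanish-style
# alphabet in which "ch" follows "c" and "ll" follows "l".
ORDER = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
         "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {unit: i for i, unit in enumerate(ORDER)}
# Try longer units first so that "ch" is matched before "c".
UNITS = sorted(ORDER, key=len, reverse=True)

def collation_key(word):
    """Split word into collating units and return the list of their ranks."""
    ranks = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                ranks.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Characters outside the defined order sort after all known units.
            ranks.append(len(ORDER) + ord(word[i]))
            i += 1
    return ranks

words = ["chico", "cuna", "llama", "luz", "casa"]
result = sorted(words, key=collation_key)
print(result)
```

       Because "ch" and "ll" are treated as single units, "chico" sorts after "cuna" and "llama" after "luz",
       which a plain character-by-character comparison would not do.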

       msort provides twelve types of key comparison: lexicographic, numeric, numeric string, hybrid, by  string
       length,  by  angle,  by  date,  by  domain  name, by time, by ISO8601 date/time stamp, by month name, and
       random.

       The month names recognized are determined as follows. If the -s flag is used on the same key and its argument
       is  the  name  of a file, the month names are read from the file, which should be in the same format as a
       sort order definition file. If the -s flag is used and its argument is a locale  name,  the  month  names
       recognized will be the month names and abbreviations associated with the specified locale. If the -s flag
       is  not  used  the  month  names recognized will be the month names and abbreviations associated with the
       current locale. If your system does not have locale support and you do not use the -s flag  to  read  the
       month names from a file, the month names recognized will be the English month names and abbreviations.

       msort can reverse the characters in a key, allowing it to be used to generate reverse dictionaries.
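
       The idea behind a reverse dictionary can be sketched in Python: each key is compared as if it were
       spelled backwards, so words with the same ending are grouped together. The word list is made up for the
       example, and the sketch reverses code points naively, which is a simplification.

```python
words = ["running", "cat", "singing", "bat"]

# Sort on the reversed spelling of each word. Note that slicing reverses
# individual code points, so combining sequences would need extra care.
result = sorted(words, key=lambda w: w[::-1])
print(result)
```

       The output keeps the original spellings but orders them by their endings, so the two "-ing" words and
       the two "-at" words end up adjacent.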

       A choice of sorting algorithms is provided.

       msort  fully supports Unicode. The text to be sorted, and all specifications, should be in UTF-8 Unicode.
       (If you have plain ASCII text, this is not a problem as ASCII is a subset of Unicode.) Full Unicode case-
       folding is available, in Turkic and  non-Turkic  variants.  Unicode  normalization  is  performed  before
       sorting.
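
       Why normalization matters before comparison can be shown with a short Python sketch using the standard
       unicodedata module. The sample strings are assumptions for illustration; the sketch demonstrates
       normalization form C in general, not msort's internal code.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent (U+0301)

# Without normalization the two spellings differ at the code point level.
same_before = composed == decomposed

# After normalizing both to NFC they become identical.
same_after = (unicodedata.normalize("NFC", composed)
              == unicodedata.normalize("NFC", decomposed))

print(same_before, same_after)
```

       Two strings that look the same on screen can therefore sort apart unless both are brought to a common
       normalization form first.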

       For usage information, execute msort with no arguments.

       Full  information about msort is currently to be found in the reference manual, which is distributed as a
       PDF (Portable Document Format) file. If a copy is not available locally, you can download it from msort's
       home page:
       http://billposer.org/Software/msort.html

OPTIONS

   Informational options
       -h,--help
              Print usage message

       -v,--version
              Print version message

       -D,--defaults
              List defaults

       -F,--general-options
              List general command line options

       -G,--gnu-equivalences
              List equivalents for GNU sort command line options.

       -H,--informational-options
              List informational command line options

       -K,--key-specific-options
              List key-specific command line options

       -L,--limits
              List limits

       -N,--number-systems
              List the supported number systems.

   General options
       -b,--block
              A record is terminated by two or more newlines

       -l,--line
              A record consists of a single line

       -r,--record-separator <separator>
              A record is terminated by separator character

       -O,--fixed-size-record <bytes>
              A record consists of the specified number of bytes.

       -d,--field-separators <character>+
              Fields are delimited by the named character(s)

       -w,--whole
              Sort on the entire text of the record

       -a,--algorithm <algorithm>
              Use the specified sort algorithm. The choices are: I(nsertionSort), M(ergeSort), Q(uickSort),  and
              S(hellSort).   Note that InsertionSort and MergeSort are stable, while QuickSort and ShellSort are
              unstable. The default is QuickSort.

       -M,--initial-maximum-records <records>
              Set initial maximum number of records

       -m,--line-end-carriage-return
              End-of-line in the input data is marked by Carriage Return (0x0D) as on the Macintosh rather  than
              by Line Feed (0x0A) as on Unix systems.

       -I,--invert-globally
              Invert sense of comparisons globally

       -B,--BMP
              No characters fall outside the Basic Multilingual Plane (that is, have values greater than 0xFFFF).

       -Z,--skip-first-record
              Copy  the  first  record in the input to the output without sorting it. This is useful for sorting
              files with a header.

       -p,--reserve-private-use-area
              Do not make internal use of the Private Use areas. By default, multigraphs are assigned internally
              to codepoints in the Supplementary Private Use areas if full Unicode is in use or to codepoints in
              the Private Use area if input is restricted to the Basic Multilingual Plane by  means  of  the  -B
              option.  If  your input makes use of the Private Use areas, this option prevents interference with
              your input. In this case, multigraphs will be  assigned  to  the  Low  and  High  Surrogate  areas
              (0xD800-0xDFFF). Note that this limits the number of multigraphs to 2,048.

       -P,--random-seed <seed>
              Set  the seed for the random number generator. If not set here, it is set to a value determined by
              the time. The seed used is reported in the log. This option allows runs to be replicated.

       -Q,--check-only
              Check whether the input is already sorted. Do not generate any output.  Exit status is 0 if  input
              is already sorted, 11 if not sorted.

       -1,--in <input file name>

       -2,--out <output file name>
              If  the  output  file is the same as the input file, the input file will be overwritten. The input
              file will not be overwritten if the run is unsuccessful.

       -j,--suppress-log
              Suppress output to the log. If this flag is given before there is any output to  the  log  from  a
              command  line  flag, nothing will be written to the log and the log file will not be created. If a
              command line flag generates a log message before this flag is processed,  the  log  file  will  be
              created  but  no log messages will be written to it once this flag is processed. To guarantee that
              no attempt will be made to open a log file, give this flag first.

       -q,--quiet
              Be quiet - do not chat while working

       -u,--unicode-normalization <mode>
              Select Unicode normalization mode. The choices of mode are: c for normalization form  C  (NFC),  d
              for  normalization  form  D (NFD), C for normalization form KC (NFKC), D for normalization form KD
              (NFKD), and n for no normalization. The default is NFC.
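
       The record model selected by -b, in which a record is terminated by two or more newlines, can be
       sketched in Python as follows. The sample text and the plain string comparison are assumptions for the
       example; the sketch only shows how multi-line blocks are kept intact while being reordered.

```python
import re

# Hypothetical input: three multi-line records separated by blank lines.
text = (
    "name: zebra\nkind: animal\n\n\n"
    "name: apple\nkind: fruit\n\n"
    "name: mango\nkind: fruit\n"
)

# Split wherever two or more consecutive newlines occur, dropping empty pieces.
records = [r for r in re.split(r"\n{2,}", text.strip("\n")) if r]

# Sort whole blocks; each record keeps its internal line structure.
result = sorted(records)
print("\n\n".join(result))
```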

   Key specific options
       -e,--character-range <m,n>
              Sort on characters m through n. Positive  indices  start  from  one.   Negative  indices  indicate
              position with respect to the end of the record.  For example, the range 3,-2 consists of the third
              character through the next-to-last character.

       -n,--position <POS>(,<POS>)
              Sort  on  the  specified  POS  or  contiguous  range  of  POSs,  where a POS is of the form <field
              number>(.<character number>). Both counts begin at one.  Field numbers but not  character  numbers
              may  be negative, in which case they are counted from the right. Thus, 1.2 is the second character
              of the first field; -2.1 is the first character of the next to last field.

       -t,--tag <tag regexp>
              Sort on the field with the specified tag

       -o,--optional <comparison>
              The key is optional. The argument (<, =, or >) specifies how a record in which the key is absent
              compares to records in which the key is present.

       -C,--fold-case
              Fold case

       -z,--fold-case-turkic
              Fold case with additional Turkic conversions.

       -c,--comparison-type <comparison type>
              a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain name/email address), d(ate), m(onth
              name), n(umeric), N(umeric string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
              Specifies  the  number  system expected for this key. This affects only numeric and numeric string
              keys. There are two special values. If the number system is "all", records may contain any  number
              system  that  msort can interpret. Different records may contain different number systems.  If the
              number system is "any", records may contain any number system that msort can interpret,  but  all
              records must make use of the same number system.  msort sets the number system on the basis of the
              first record.

       -f,--date-format <date format>
              Permutation  of  ymd with separators, e.g. y-m-d for international date format, m/d/y for American
              date format, or a permutation of yd with separators, e.g. y-d, for day-of-year  dates.  All  three
              components  may  be  numbers  in  any available number system. The month field may also be a month
              name, determined by the same devices as independent month name fields.

       -W,--sort-order-file-separators <file name>
              Read from the named file the list of characters to be treated as separators in the sort order
              definition file.

       -S,--substitutions <file name>
              Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
              If the argument is a file name, it is taken to be a sort order file and the sort order for the key
              is read from the file. If the argument is a locale name, the collation rules for that  locale  are
              used. If the argument is "locale", the collation rules for the current locale are used.

       -T,--transformations <(d)(e)(s)>
              Apply  the  specified transformations.  d specifies that diacritics are to be stripped. Separately
              encoded combining  diacritics  are  removed.  Characters  with  diacritics  represented  by single
              codepoints are replaced with the corresponding ASCII character without the diacritics, if there is
              one.  e specifies that enclosed characters, that is, characters within circles or parentheses, are
              to  be  replaced  with  the corresponding plain ASCII character if there is one.  s specifies that
              characters in special styles are to be replaced with the corresponding plain  ASCII  character  if
              there  is  one.  Stylistic  equivalents  include: small capitals (e.g. U+1D04), script forms (e.g.
              U+212C), black letter forms  (e.g.  U+212D),  Arabic  presentation  forms  (e.g.  U+FE81),  Hebrew
              presentation  forms  (e.g.  U+FB1D), fullwidth forms (e.g. U+FF01), halfwidth forms (e.g. U+FF7B),
              and the mathematical alphanumeric symbols (e.g. U+1D400).

       -x,--exclusion-file <file name>
              Read exclusions from named file

       -X,--exclude-characters <exclusions>
              Exclude specified characters

       -i,--invert-locally
              Invert sense of comparisons

       -R,--reverse-key
              Reverse characters of key

       -A,--first-character-only
              Ignore all but the first character of the field, after substitutions, exclusions, etc.
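
       The index conventions described for the -e and -n options can be sketched in Python as follows. The
       function names and the sample data are hypothetical, and the sketch assumes well-formed, non-zero
       indices; it illustrates the counting rules only and is not msort's parser.

```python
def char_range(text, m, n):
    """Characters m through n, 1-based and inclusive; negative counts from the end."""
    def to_index(i):
        # Convert a 1-based (or negative, end-relative) position to a 0-based index.
        return i - 1 if i > 0 else len(text) + i
    start, end = to_index(m), to_index(n)
    return text[start:end + 1]

def resolve_pos(fields, pos):
    """Resolve a POS of the form <field number>(.<character number>)."""
    field_part, _, char_part = pos.partition(".")
    field_no = int(field_part)
    # Field numbers start at one; negative field numbers count from the right.
    field = fields[field_no - 1] if field_no > 0 else fields[field_no]
    if not char_part:
        return field
    # Character numbers start at one and may not be negative.
    return field[int(char_part) - 1]

fields = ["alpha", "beta", "gamma"]
print(char_range("abcdef", 3, -2))   # third through next-to-last character
print(resolve_pos(fields, "1.2"))    # second character of the first field
print(resolve_pos(fields, "-2.1"))   # first character of the next-to-last field
```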

       Note: long options may not be available on your system.

SEE ALSO

       sort(1), uninum(3)

AUTHOR

       Bill Poser (billposer@alum.mit.edu)

LICENSE

       GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3.

msort                                             January 2010                                          MSORT(1)