Provided by: glimpse_4.18.7-8build1_amd64

NAME

       agrep - search a file for a string or regular expression, with approximate matching capabilities

SYNOPSIS

       agrep [ -#cdehiklnpstvwxyBDGIS ] pattern [ -f patternfile ] [ filename... ]

DESCRIPTION

       agrep searches the input filenames (standard input is the default, but see the warning under BUGS/LIMITATIONS)
       for records containing strings which either exactly or approximately match a pattern.   A  record  is  by
       default a line, but it can be defined differently using the -d option (see below).  Normally, each record
       found  is  copied  to  the standard output.  Approximate matching allows finding records that contain the
       pattern  with  several  errors  including  substitutions,  insertions,  and  deletions.    For   example,
       Massechusets  matches  Massachusetts with two errors (one substitution and one insertion).  Running agrep
       -2 Massechusets foo outputs all  lines  in  foo  containing  any  string  with  at  most  2  errors  from
       Massechusets.
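
       The error count described above can be illustrated with a minimal Python sketch.  It uses a plain
       dynamic-programming table, not the bit-parallel algorithm from the Wu and Manber papers, and the
       function names min_errors and matches are hypothetical, chosen only for this illustration.

```python
def min_errors(pattern: str, line: str) -> int:
    """Smallest edit distance between pattern and any substring of line."""
    # prev[i] is the cheapest alignment of pattern[:i] with text ending
    # just before the current character.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in line:
        cur = [0]  # a match may start anywhere, so the empty prefix is free
        for i, p in enumerate(pattern, 1):
            cur.append(min(
                prev[i - 1] + (p != ch),  # substitution, or exact match
                prev[i] + 1,              # extra character in the text
                cur[i - 1] + 1,           # pattern character missing from the text
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def matches(pattern: str, line: str, max_errors: int) -> bool:
    """True if line contains pattern within max_errors errors."""
    return min_errors(pattern, line) <= max_errors


print(min_errors("Massechusets", "Massachusetts"))          # 2
print(matches("Massechusets", "Boston, Massachusetts", 2))  # True
```

       Under these assumptions, a line would be reported by agrep -2 exactly when matches(pattern, line, 2)
       is true.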

       agrep  supports  many  kinds of queries including arbitrary wild cards, sets of patterns, and in general,
       regular expressions.  See PATTERNS below.  It supports most of the options supported by the  grep  family
       plus several more (but it is not 100% compatible with grep).  For more information on the algorithms used
       by  agrep  see  Wu  and Manber, "Fast Text Searching With Errors," Technical report #91-11, Department of
       Computer Science, University of Arizona, June 1991 (available by anonymous  ftp  from  cs.arizona.edu  in
       agrep/agrep.ps.1),  and Wu and Manber, "Agrep -- A Fast Approximate Pattern Searching Tool", To appear in
       USENIX Conference 1992 January (available by anonymous ftp from cs.arizona.edu in agrep/agrep.ps.2).

       As with the rest of the grep family, the characters `$', `^', `*', `[', `]', `|', `(', `)', `!', and
       `\' can cause unexpected results when included in the pattern, as these characters are also meaningful to
       the shell.  To avoid these problems, one should always enclose the entire pattern argument in single
       quotes, i.e., 'pattern'.  Do not use double quotes (").

       When  agrep is applied to more than one input file, the name of the file is displayed preceding each line
       which matches the pattern.  The filename is not displayed when  processing  a  single  file,  so  if  you
       actually want the filename to appear, use /dev/null as a second file in the list.

OPTIONS

       -#     #  is  a  non-negative  integer  (at  most 8) specifying the maximum number of errors permitted in
              finding the approximate matches (defaults to  zero).   Generally,  each  insertion,  deletion,  or
              substitution  counts  as  one  error.   It  is possible to adjust the relative cost of insertions,
              deletions and substitutions (see -I -D and -S options).

       -c     Display only the count of matching records.

       -d 'delim'
              Define delim to be the separator between two records.  The default value is '$', namely  a  record
              is by default a line.  delim can be a string of size at most 8 (with possible use of ^ and $), but
              not  a  regular  expression.  Text between two delim's, before the first delim, and after the last
              delim is considered as one record.  For example, -d '$$' defines  paragraphs  as  records  and  -d
              '^From '  defines  mail  messages  as records.  agrep matches each record separately.  This option
              does not currently work with regular expressions.
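
              The record model can be sketched in Python as follows.  This is an illustration only: it treats
              delim as a literal string, models -d '$$' as two consecutive newlines (an assumption), and uses
              exact substring search.  The helper names are hypothetical.

```python
def split_records(text: str, delim: str = "\n") -> list[str]:
    """Text before the first delim, between delims, and after the last one."""
    return text.split(delim)


def matching_records(text: str, delim: str, word: str) -> list[str]:
    """Records (not lines) that contain word."""
    return [record for record in split_records(text, delim) if word in record]


text = "first line\nsecond line\n\nnew paragraph\nwith network\n"
# With a blank line as the delimiter, the whole second paragraph is one record.
print(matching_records(text, "\n\n", "network"))
```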

       -e pattern
              Same as a simple pattern argument, but useful when the pattern begins with a `-'.

       -f patternfile
              patternfile contains a set of (simple) patterns.  The output is all lines that match at least  one
              of  the  patterns  in  patternfile.   Currently,  the -f option works only for exact match and for
              simple patterns (any meta symbol is interpreted as a regular character);  it  is  compatible  only
              with the -c, -h, -i, -l, -s, -v, -w, and -x options.  See BUGS/LIMITATIONS for size bounds.

       -h     Do not display filenames.

       -i     Case-insensitive search — e.g., "A" and "a" are considered equivalent.

       -k     No  symbol  in  the  pattern is treated as a meta character.  For example, agrep -k 'a(b|c)*d' foo
              will find the occurrences of a(b|c)*d in foo whereas agrep 'a(b|c)*d' foo will find substrings  in
              foo that match the regular expression 'a(b|c)*d'.

       -l     List  only the files that contain a match.  This option is useful for looking for files containing
              a certain pattern.  For example, " agrep -l 'wonderful'  * " will list the names of those files in
              current directory that contain the word 'wonderful'.

       -n     Each line that is printed is prefixed by its record number in the file.

       -p     Find records in the text that contain a supersequence of the pattern.  For example, agrep -p DCS foo
              will match "Department of Computer Science."
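
              A record contains a supersequence of the pattern when the pattern's characters appear in the
              record in the same order, possibly with other characters in between.  A minimal Python sketch
              of that test (the function name is hypothetical and the check is case-sensitive):

```python
def contains_supersequence(pattern: str, record: str) -> bool:
    """True if the characters of pattern occur in record, in order."""
    remaining = iter(record)
    # "ch in remaining" consumes the iterator up to and including the
    # first occurrence of ch, so later characters must appear after it.
    return all(ch in remaining for ch in pattern)


print(contains_supersequence("DCS", "Department of Computer Science."))
```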

       -s     Work silently, that is, display nothing except error messages.  This is useful  for  checking  the
              error status.

       -t     Output  the  record  starting  from  the  end of delim to (and including) the next delim.  This is
              useful for cases where delim should come at the end of the record.

       -v     Inverse mode — display only those records that do not contain the pattern.

       -w     Search for the pattern as a word, i.e., surrounded by non-alphanumeric characters.  The non-alphanumeric
              characters must surround the match; they cannot be counted as errors.  For example, agrep -w -1
              car will match cars, but not characters.

       -x     The pattern must match the whole line.

       -y     Used  with  -B  option.  When -y is on, agrep will always output the best matches without giving a
              prompt.

       -B     Best match mode.  When -B is specified and no exact matches are  found,  agrep  will  continue  to
              search  until  the  closest  matches  (i.e., the ones with minimum number of errors) are found, at
              which point the following message will be shown: "the best match contains x errors,  there  are  y
              matches,  output  them?  (y/n)"  The  best  match  mode is not supported for standard input, e.g.,
              pipeline input.  When the -#, -c, or -l options are specified,  the  -B  option  is  ignored.   In
              general, -B may be slower than -#, but not by very much.

       -Dk    Set  the  cost  of a deletion to k (k is a positive integer).  This option does not currently work
              with regular expressions.

       -G     Output the files that contain a match.

       -Ik    Set the cost of an insertion to k (k is a positive integer).  This option does not currently  work
              with regular expressions.

       -Sk    Set  the  cost  of  a substitution to k (k is a positive integer).  This option does not currently
              work with regular expressions.
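
              The effect of the -I, -D and -S costs can be sketched with a weighted edit distance.  The sketch
              below is not agrep's implementation.  It assumes that an insertion is an extra character in the
              text and a deletion is a pattern character missing from the text, and the function name and
              parameter names are hypothetical.

```python
def weighted_min_cost(pattern: str, line: str,
                      ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Cheapest weighted alignment of pattern with any substring of line."""
    prev = [i * dele for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in line:
        cur = [0]  # a match may start at any position in the line
        for i, p in enumerate(pattern, 1):
            cur.append(min(
                prev[i - 1] + (0 if p == ch else sub),  # substitution or match
                prev[i] + ins,                          # extra text character
                cur[i - 1] + dele,                      # pattern character dropped
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


# Costs corresponding to -D2 -S2 with a budget of one error (-1):
# only a single insertion is affordable.
print(weighted_min_cost("ABCDYZ", "ABCDxYZ", ins=1, dele=2, sub=2) <= 1)  # True
print(weighted_min_cost("ABCDYZ", "ABCYZ", ins=1, dele=2, sub=2) <= 1)    # False
```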

PATTERNS

       agrep supports a large variety of patterns, including simple strings, strings with classes of characters,
       sets of strings, wild cards, and regular expressions.

       Strings
              any sequence of characters, including the special symbols `^' for beginning of line  and  `$'  for
              end of line.  The special characters listed above ( `$', `^', `*', `[', `|', `(', `)', `!',
              and `\' ) should be preceded by `\' if they are to be matched as regular characters.  For example,
              \^abc\\ corresponds to the string ^abc\, whereas  ^abc  corresponds  to  the  string  abc  at  the
              beginning of a line.

       Classes of characters
              a  list  of  characters  inside  []  (in  order)  corresponds to any character from the list.  For
              example, [a-ho-z] is any character between a and h or between o and z.  The symbol `^'  inside  []
              complements the list.  For example, [^i-n] denotes any character in the character set except
              the characters 'i' through 'n'.  The symbol `^' thus has two meanings, but this is consistent with egrep.
              The symbol `.' (don't care) stands for any symbol (except for the newline symbol).

       Boolean operations
              agrep  supports  an  `and' operation `;' and an `or' operation `,', but not a combination of both.
              For example, 'fast;network' searches for all records containing both words.
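
              The semantics of `;' and `,' can be sketched in Python as follows.  This is a simplified model
              that uses exact substring search on each term and a hypothetical function name.  Rejecting a
              mixed pattern with an exception is a choice made for the sketch, not agrep's documented
              behavior.

```python
def record_matches(pattern: str, record: str) -> bool:
    """Evaluate an 'and' (;) or 'or' (,) pattern against one record."""
    if ";" in pattern and "," in pattern:
        raise ValueError("combining ';' and ',' is not supported")
    if ";" in pattern:
        return all(term in record for term in pattern.split(";"))
    # A pattern with no separator is a single term, so any() still applies.
    return any(term in record for term in pattern.split(","))


print(record_matches("fast;network", "a fast local network"))  # True
print(record_matches("fast;network", "a fast car"))            # False
print(record_matches("fast,network", "a fast car"))            # True
```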

       Wild cards
              The symbol '#' is used to denote a wild card.  # matches zero or more arbitrary
              characters.  For example, ex#e matches example.  The symbol # is equivalent to .* in egrep.  In
              fact, .* will work too, because it is a valid regular expression (see below), but unless  this  is
              part of an actual regular expression, # will work faster.
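
              The stated equivalence between # and .* can be illustrated by translating a wild-card pattern
              into a Python regular expression.  The sketch treats every character other than # as a literal
              and uses Python's re module, whose syntax differs from agrep's.  The function name is
              hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Replace each '#' with '.*' and escape everything else."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))


print(wildcard_to_regex("ex#e"))
print(bool(re.search(wildcard_to_regex("ex#e"), "example")))  # True
```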

       Combination of exact and approximate matching
              Any pattern inside angle brackets <> must match the text exactly, even when errors are allowed elsewhere.
              For  example,  <mathemat>ics matches mathematical with one error (replacing the last s with an a),
              but mathe<matics> does not match mathematical no matter how many errors we allow.

       Regular expressions
              The syntax of regular expressions in agrep is in general the same as that for  egrep.   The  union
              operation  `|',  Kleene  closure  `*', and parentheses () are all supported.  Currently '+' is not
              supported.  Regular expressions are currently limited to approximately  30  characters  (generally
              excluding  meta  characters).  Some options (-d, -w, -f, -t, -x, -D, -I, -S) do not currently work
              with regular expressions.  The maximal number of errors for regular expressions that  use  '*'  or
              '|' is 4.

EXAMPLES

       agrep -2 -c ABCDEFG foo
              gives the number of lines in file foo that contain ABCDEFG within two errors.

       agrep -1 -D2 -S2 'ABCD#YZ' foo
              outputs  the  lines  containing  ABCD  followed,  within arbitrary distance, by YZ, with up to one
              additional insertion (-D2 and -S2 make deletions and substitutions too "expensive").

       agrep -5 -p abcdefghij /path/to/dictionary/words
              outputs the list of all words containing at least 5 of the first 10 letters  of  the  alphabet  in
              order.   (Try  it:   any  list  starting  with  academia  and  ending  with sacrilegious must mean
              something!)
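
              The reading "at least 5 of the 10 letters, in order" can be checked with a longest common
              subsequence computation.  The sketch below assumes that each error in -p mode lets one pattern
              character go unmatched, which is how this example reads.  It is not taken from agrep's source,
              and the function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def matches_with_skips(pattern: str, word: str, max_errors: int) -> bool:
    """True if all but at most max_errors pattern characters occur in order."""
    return len(pattern) - lcs_length(pattern, word) <= max_errors


for word in ["academia", "sacrilegious", "zebra"]:
    print(word, matches_with_skips("abcdefghij", word, 5))
```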

       agrep -1 'abc[0-9](de|fg)*[x-z]' foo
              outputs the lines containing, within up to one error, the string that starts with abc followed  by
              one digit, followed by zero or more repetitions of either de or fg, followed by either x, y, or z.

       agrep -d '^From ' 'breakdown;internet' mbox
              outputs  all  mail  messages  (the  pattern  '^From ' separates mail messages in a mail file) that
              contain keywords 'breakdown' and 'internet'.

       agrep -d '$$' -1 '<word1> <word2>' foo
              finds all paragraphs that contain word1 followed by word2 with one error in place  of  the  blank.
              In  particular,  if word1 is the last word in a line and word2 is the first word in the next line,
              then the space will be substituted by a newline symbol and it will match.  Thus, this is a way  to
              overcome  separation  by a newline.  Note that -d '$$' (or another delim which spans more than one
              line) is necessary, because otherwise agrep searches only one line at a time.

       agrep '^agrep' <this manual>
              outputs all the examples of the use of agrep in this man page.

SEE ALSO

       ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS

       Any bug reports or comments will be appreciated!  Please mail them to sw@cs.arizona.edu or
       udi@cs.arizona.edu.

       Regular  expressions  do not support the '+' operator (match 1 or more instances of the preceding token).
       These can be searched for by using this syntax in the pattern:

          'pattern(pattern)*'

       (search for strings containing one instance of the pattern, followed  by  0  or  more  instances  of  the
       pattern).
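
       The equivalence behind this workaround can be checked with Python's re module, which does support
       '+'.  The sketch is only a demonstration of the identity; Python's regular expression engine is not
       agrep's.  The helper name is hypothetical, and it wraps the sub-pattern in parentheses so that
       alternations such as a|b are repeated as a unit.

```python
import re


def one_or_more(sub: str) -> str:
    """Rewrite 'sub+' using only concatenation and Kleene closure."""
    return f"({sub})({sub})*"


for s in ["", "ab", "aba", "abab", "ababab"]:
    with_star = bool(re.fullmatch(one_or_more("ab"), s))
    with_plus = bool(re.fullmatch("(ab)+", s))
    print(repr(s), with_star, with_plus)
```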

       The following can cause an infinite loop: agrep pattern * > output_file.  If the number of matches is
       high, they may be written to output_file before it has been completely read, leading to more matches of the
       pattern within output_file (the matches are against the whole directory).  It's not clear whether this is
       a "bug" (grep will do the same), but be warned.

       The maximum size of the patternfile is limited to 250 KB, and the maximum number of patterns is limited
       to 30,000.

       Standard input is the default if no input file is given.  However, if standard input is keyed in directly
       (as opposed to through a pipe, for example) agrep may not work for some non-simple patterns.

       There  is  no  size  limit  for  simple  patterns.   More  complicated  patterns are currently limited to
       approximately 30 characters.  Lines are limited to 1024 characters.  Records are limited to 48K, and  may
       be  truncated  if  they are larger than that.  The limit of record length can be changed by modifying the
       parameter Max_record in agrep.h.

DIAGNOSTICS

       Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files.
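
       A script can branch on these exit codes.  The following Python sketch assumes that an agrep binary
       is available on the PATH (subprocess raises FileNotFoundError otherwise).  The function names and
       the returned labels are hypothetical.

```python
import subprocess


def describe_status(code: int) -> str:
    """Map an agrep exit status to a short label."""
    return {0: "match", 1: "no match", 2: "error"}.get(code, "unknown")


def agrep_status(pattern: str, filename: str) -> str:
    """Run agrep silently (-s) and report only what the exit status says."""
    # Assumption: "agrep" resolves to the program described in this page.
    result = subprocess.run(["agrep", "-s", pattern, filename])
    return describe_status(result.returncode)


print(describe_status(0), describe_status(1), describe_status(2))
```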

AUTHORS

       Sun Wu and Udi Manber,  Department  of  Computer  Science,  University  of  Arizona,  Tucson,  AZ  85721.
       {sw|udi}@cs.arizona.edu.

                                                  Jan 17, 1992                                          AGREP(1)