Provided by: ifile_1.3.9-8_amd64

NAME

       ifile - core executable for the ifile mail filtering system

SYNOPSIS

       ifile [-b file] [-q|-Q] [-g] [-k] [-o] [-v num] [lexing options] file ...
       ifile -c -q|-Q [-T threshold] [-b file] [-g] [-k] [-o] [lexing options] file ...
       ifile [-b file] [-d folder] [-i folder|-u folder] [-g] [-k] [-o] [-v num] [lexing options] file ...
       ifile -r [-b file]

DESCRIPTION

       ifile is a mail filter client that uses machine learning to classify e-mail into folders (mailboxes).
       It uses the Naive Bayes algorithm, which treats each document as an unordered collection of words and
       assigns the document to the folder whose word distribution most closely matches the document's own
       word distribution.
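
       The idea can be illustrated with a minimal multinomial Naive Bayes sketch in Python. This is not
       ifile's actual implementation: the function names, the toy training data, the uniform folder prior
       and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Build per-folder word counts from (folder, text) pairs."""
    counts = {}
    for folder, text in labeled_docs:
        counts.setdefault(folder, Counter()).update(text.lower().split())
    return counts

def score(counts, text):
    """Return {folder: log-likelihood of the words in text} (add-one smoothing)."""
    vocab = {w for c in counts.values() for w in c}
    scores = {}
    for folder, c in counts.items():
        total = sum(c.values())
        scores[folder] = sum(
            math.log((c[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
        )
    return scores

def classify(counts, text):
    """Pick the folder with the highest (least negative) score."""
    scores = score(counts, text)
    return max(scores, key=scores.get)

counts = train([
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches now"),
    ("friends", "lunch tomorrow with the team"),
    ("friends", "see you at lunch"),
])
print(classify(counts, "buy pills now"))
```

       Word order plays no role in the score, which is what "unordered collection of words" means here.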

OPTIONS

       -b, --db-file=file
              Location to read/store ifile database.  Default is ~/.idata

       -c, --concise
              equivalent of "ifile -v 0 | head -1 | cut -f1 -d".  Must be used with -q or -Q.

       -d, --delete=folder
              Delete the statistics for each of the files from the category folder

       -f, --folder-calcs=folder
              Show the word-probability calculations for folder

       -g, --log-file
              Create and store debugging information in ~/.ifile.log

       -i, --insert=folder
              Add the statistics for each of the files to the category folder

       -k, --keep-infrequent
              Keep infrequently occurring words in the database (normally they are discarded)

       -l, --query-loocv=folder
              Leave-one-out cross-validation query.  For each of the files, temporarily remove the file from
              folder, perform the query and then reinsert the file into folder.  Database is not modified.

       -o, --occur
              Uses document bit-vector representation.  Count each word once per document.

       -q, --query
              Output rating scores for each of the files

       -Q, --query-insert
              For each of the files, output rating scores and add statistics for the  folder  with  the  highest
              score

       -T, --threshold=threshold
              When used with both -c and -q, output the two highest-ranking categories if their scores differ
              by at most threshold / 1000 (that is, threshold / 10 percent), which can be used to detect
              borderline cases.  When used with -q only and any threshold > 0, output the score difference
              as a percentage.  For example,
                     ifile -T1 -q foo.txt
              might result in
                     spam -15570.48640776
                     non-spam -18728.00272369
                     diff[spam,non-spam](%) 9.21
              If so, then
                     ifile -T93 -q -c foo.txt
              will result in
                     foo.txt spam,non-spam
              whereas
                     ifile -T92 -q -c foo.txt
              will result in
                     foo.txt spam
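
              The arithmetic in this example can be checked with a short Python sketch. The difference
              formula used below, |a - b| / |a + b|, is an assumption inferred from the numbers shown above
              (it reproduces 9.21); it is not taken from the ifile source, and the function names are
              hypothetical.

```python
def diff_percent(best, second):
    """Relative score difference in percent (assumed formula, inferred from the example)."""
    return 100.0 * abs(best - second) / abs(best + second)

def concise_labels(scores, threshold):
    """Mimic the documented -c -q -T behaviour for a dict of folder scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # -T is documented as threshold / 1000, i.e. threshold / 10 when expressed in percent.
    if diff_percent(scores[best], scores[second]) <= threshold / 10.0:
        return [best, second]
    return [best]

scores = {"spam": -15570.48640776, "non-spam": -18728.00272369}
print(round(diff_percent(scores["spam"], scores["non-spam"]), 2))
print(",".join(concise_labels(scores, 93)))
print(",".join(concise_labels(scores, 92)))
```

              Under that assumption the sketch prints 9.21, then spam,non-spam, then spam, in line with the
              -T93 and -T92 results above.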

       -r, --reset-data
              Erases all currently stored information

       -u, --update=folder
              Same as -i (--insert), except that statistics are added only if folder already exists

       -v, --verbosity=num
              Amount of output while running: 0=silent, 1=quiet, 2=progress, 3=verbose, 4=debug

       Lexing options:

       -a, --alpha-lexer
              Lex words as sequences of alphabetic characters (default)

       -A, --alpha-only-lexer
              Only lex space-separated character sequences which are composed entirely of alphabetic characters

       -h, --strip-header
              Skip all of the header lines except Subject:, From: and To:

       -m, --max-length=char
              Ignore the portion of the message after the first char characters.  Use the entire message if
              char is set to 0.  Default is 50,000.

       -p, --print-tokens
              Just tokenize and print, don't do any other processing.  Documents are returned as a list of word,
              frequency pairs.

       -s, --no-stoplist
              Do not throw out overly frequent (stoplist) words when lexing

       -S, --stemming
              Use 'Porter' stemming algorithm when lexing documents

       -w, --white-lexer
              Lex words as sequences of space-separated characters

       If no files are specified on the command line, ifile will use standard input as its message to process.

       -?, --help
              Give this help list

       --usage
              Give a short usage message

       -V, --version
              Print program version

       Mandatory  or  optional  arguments  to  long options are also mandatory or optional for any corresponding
       short options.
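
       The difference between the -a, -A and -w lexers, and the effect of -o on counting, can be sketched in
       Python. The regular expressions, the use of general whitespace as the separator and the function
       names are assumptions for illustration; ifile's own lexers may treat case, punctuation and non-ASCII
       characters differently.

```python
import re
from collections import Counter

def alpha_lexer(text):
    """-a: every maximal run of alphabetic characters is a word."""
    return re.findall(r"[A-Za-z]+", text)

def alpha_only_lexer(text):
    """-A: keep only whitespace-separated tokens made up entirely of letters."""
    return [t for t in text.split() if re.fullmatch(r"[A-Za-z]+", t)]

def white_lexer(text):
    """-w: every whitespace-separated token is a word."""
    return text.split()

def count_words(tokens, occur=False):
    """With occur=True (like -o), count each word at most once per document."""
    return Counter(set(tokens) if occur else tokens)

text = "Re: ifile-1.3.9 works, ifile works"
print(alpha_lexer(text))        # letters only, punctuation and digits act as separators
print(alpha_only_lexer(text))   # tokens containing ':' ',' '-' or digits are dropped
print(white_lexer(text))        # tokens kept exactly as written
print(count_words(alpha_lexer(text))["ifile"])
print(count_words(alpha_lexer(text), occur=True)["ifile"])
```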

FILES

       ~/.idata
              ifile database (default location).  See FAQ included in ifile package for description of  database
              format.

AUTHOR

       Jason Rennie <jrennie@csail.mit.edu> and many others.  See the ChangeLog for the full list.

EXAMPLES

       Before  using  ifile,  you  need to train it.  Let's say that you have three folders, "spam", "ifile" and
       "friends", and the following directory structure:

              /--+--spam----+--1
                 |          +--2
                 |          +--3
                 |
                 +--ifile---+--1
                 |          +--2
                 |          +--3
                 |
                 +--friends-+--1
                            +--2
                            +--3

       The following commands build the ifile database in ~/.idata (use the -b option  to  specify  a  different
       location for the database):

              ifile -h -i spam /spam/*
              ifile -h -i ifile /ifile/*
              ifile -h -i friends /friends/*

       The  -h option strips off headers besides "Subject:", "From:" and "To:".  I find that -h improves ifile's
       performance, but you may find otherwise for your personal collection.

       Note that we have made the argument to -i the same as the corresponding folder name. This is not
       necessary. The argument to -i can be any word you want to use to identify a category of e-mails, but
       it must not contain whitespace characters (space, tab, line feed, etc.).

       At this point, your ~/.idata file should look something like this:

              spam ifile friends
              662 1020 6451
              3 3 3
              jrennie 9 0:3 1:18 2:16
              mindspring 6 1:7 2:5
              make 9 0:5 1:3
              yahoo 9 0:1 1:22 2:2

       The first line is the space-separated list of folders. Their  ordering  specifies  a  numbering  (spam=0,
       ifile=1,  friends=2).  The  second line is a token count for each folder (e.g. 662 tokens observed in the
       three spam messages). The third line is an e-mail count for each folder (e.g. 3 e-mails for each of spam,
       ifile and friends). Each following line specifies statistics for a word. The format of a line is

              word age folder:count [folder:count ...]

       where folder is the folder number determined by the first line ordering. Folders with a count of zero are
       not listed. So, the line beginning with "jrennie" indicates that "jrennie" appeared 3 times in "spam"  e-
       mails,  18  times  in "ifile" e-mails and 16 times in "friends" e-mails. The age is the number of e-mails
       that have been processed since the word was added to the database. Very infrequent words are pruned  from
       the database to keep the database size down.
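
       As an illustration of this format, the following Python sketch parses the sample database shown
       above. The function name and the returned structure are hypothetical, and the parser handles only the
       layout described here; see the FAQ in the ifile package for the authoritative format.

```python
SAMPLE = """\
spam ifile friends
662 1020 6451
3 3 3
jrennie 9 0:3 1:18 2:16
mindspring 6 1:7 2:5
make 9 0:5 1:3
yahoo 9 0:1 1:22 2:2
"""

def parse_idata(text):
    """Parse the header lines and the per-word statistics lines."""
    lines = text.splitlines()
    folders = lines[0].split()
    token_counts = dict(zip(folders, map(int, lines[1].split())))
    email_counts = dict(zip(folders, map(int, lines[2].split())))
    words = {}
    for line in lines[3:]:
        if not line.strip():
            continue
        word, age, *pairs = line.split()
        counts = {}
        for pair in pairs:
            index, count = pair.split(":")
            # The folder number refers to the ordering on the first line.
            counts[folders[int(index)]] = int(count)
        words[word] = {"age": int(age), "counts": counts}
    return folders, token_counts, email_counts, words

folders, token_counts, email_counts, words = parse_idata(SAMPLE)
print(words["jrennie"]["counts"])
# Folders with a count of zero are not listed, so default to 0 on lookup.
print(words["make"]["counts"].get("friends", 0))
```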

       Now  that you have a database, you might want to filter some e-mails. Say you have the following incoming
       e-mails:

              /--inbox--+--1
                        +--2
                        +--3

       To find out what folders ifile thinks these e-mails belong in, run

              ifile -c -q /inbox/1
              ifile -c -q /inbox/2
              ifile -c -q /inbox/3

       Let's say that 1 is about ifile, 2 is spam and 3 is from a friend. Assuming ifile does its job correctly,
       you'll see output like this:

              /inbox/1 ifile
              /inbox/2 spam
              /inbox/3 friends

       With so little training data, ifile is unlikely to get the labels correct, but you should get the idea
       :-)

       Now,  if  you  move  the  e-mails  to  the folders suggested by ifile, you'll want to update the database
       accordingly. You can do this with the -i option, like before. Or, you can simply use -Q in  place  of  -q
       above. This automatically adds the e-mail to the folder ifile suggests.

       Now, assume for a moment that e-mail 1 was actually spam. We have added its statistics to the "ifile"
       category and put it in the ifile folder. We need to move it to the spam folder and update the ifile
       database accordingly. We can update the database with the following command:

              ifile -d ifile -i spam /inbox/1

       This deletes the e-mail from "ifile" and adds it to "spam".
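
       The train, query and relabel steps above can be scripted. The Python sketch below only builds the
       argument lists, using the options exactly as they appear in this section; the helper names are
       hypothetical, and actually running the commands (for example with subprocess.run) assumes that ifile
       is installed and on PATH.

```python
def train_cmd(folder, files):
    """Argument list for: ifile -h -i folder file ..."""
    return ["ifile", "-h", "-i", folder, *files]

def query_cmd(path, insert=False):
    """Argument list for: ifile -c -q file (or -Q to also add the e-mail's statistics)."""
    return ["ifile", "-c", "-Q" if insert else "-q", path]

def relabel_cmd(wrong, right, path):
    """Argument list for: ifile -d wrong -i right file"""
    return ["ifile", "-d", wrong, "-i", right, path]

def parse_concise(line):
    """Split a concise output line such as '/inbox/1 ifile' into (file, folder)."""
    path, folder = line.rsplit(None, 1)
    return path, folder

print(" ".join(train_cmd("spam", ["/spam/1", "/spam/2", "/spam/3"])))
print(" ".join(query_cmd("/inbox/1")))
print(" ".join(relabel_cmd("ifile", "spam", "/inbox/1")))
print(parse_concise("/inbox/2 spam"))
```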

SEE ALSO

       Examples  of  how  to  use  ifile together with procmail(1) and metamail(1) can be found in the directory
       /usr/share/doc/ifile/examples.

ifile 1.3.4                                       November 2004                                         IFILE(1)