Ubuntu Manpage: Mail::SpamAssassin::PerMsgStatus - per-message status (spam or not-spam)

NAME

       Mail::SpamAssassin::PerMsgStatus - per-message status (spam or not-spam)

SYNOPSIS

         my $spamtest = Mail::SpamAssassin->new({
           'rules_filename'      => '/etc/spamassassin.rules',
           'userprefs_filename'  => $ENV{HOME}.'/.spamassassin/user_prefs'
         });
         my $mail = $spamtest->parse();

         my $status = $spamtest->check ($mail);

         my $rewritten_mail;
         if ($status->is_spam()) {
           $rewritten_mail = $status->rewrite_mail ();
         }
         ...

DESCRIPTION

       The Mail::SpamAssassin check() method returns an object of this class.  This object encapsulates all the
       per-message state.

METHODS

       $status->check ()
           Runs the SpamAssassin rules against the message pointed to by the object.

       $status->learn()
           After a mail message has been checked, this method can be called.  If the score is outside a certain
           range around the threshold, i.e. if the message is judged more-or-less definitely spam or definitely
           non-spam, it will be fed into SpamAssassin's learning systems (currently the naive Bayesian
           classifier), so that future similar mails will be caught.

       $score = $status->get_autolearn_points()
           Return the message's score as computed for auto-learning.  Certain tests are ignored:

             - rules with tflags set to 'learn' (the Bayesian rules)

             - rules with tflags set to 'userconf' (user welcome/block-listing rules, etc)

             - rules with tflags set to 'noautolearn'

           Also note that auto-learning occurs using scores from either scoreset  0  or  1,  depending  on  what
           scoreset  is  used  during  message check.  It is likely that the message check and auto-learn scores
           will be different.
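
           The exclusion list above can be pictured with a small standalone sketch.  The rule names, scores
           and the autolearn_points() helper below are made up for illustration; the real method also
           accounts for scoresets, which this sketch ignores.

```perl
use strict;
use warnings;

# Made-up rules for illustration: name => [ score, tflags string ].
my %rules = (
    EXAMPLE_BODY_RULE  => [ 2.0,  '' ],
    EXAMPLE_BAYES_RULE => [ 3.5,  'learn' ],
    EXAMPLE_USER_RULE  => [ -100, 'userconf nice' ],
    EXAMPLE_QUIET_RULE => [ 1.0,  'noautolearn' ],
);

# Hypothetical helper, not SpamAssassin code: sum the scores of all rules
# except those carrying one of the three tflags listed in the text.
sub autolearn_points {
    my ($rules) = @_;
    my $total = 0;
    for my $name (keys %$rules) {
        my ($score, $tflags) = @{ $rules->{$name} };
        next if $tflags =~ /\b(?:learn|userconf|noautolearn)\b/;
        $total += $score;
    }
    return $total;
}

print autolearn_points(\%rules), "\n";
```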

       $score = $status->get_head_only_points()
           Return the message's score as computed for auto-learning, ignoring all rules except for  header-based
           ones.

       $score = $status->get_learned_points()
           Return  the  message's  score  as computed for auto-learning, ignoring all rules except for learning-
           based ones.

       $score = $status->get_body_only_points()
           Return the message's score as computed for auto-learning, ignoring all rules  except  for  body-based
           ones.

       $score = $status->get_autolearn_force_status()
           Return whether a message's score included any rules that are flagged as autolearn_force.

       $rule_names = $status->get_autolearn_force_names()
           Return a comma-separated list of rule names if a message's score included any rules that are
           flagged as autolearn_force.

       $isspam = $status->is_spam ()
           After a mail message has been checked, this method  can  be  called.   It  will  return  1  for  mail
           determined likely to be spam, 0 if it does not seem spam-like.

       $list = $status->get_names_of_tests_hit ()
           After  a  mail  message has been checked, this method can be called. It will return a comma-separated
           string, listing all the symbolic test names of the tests which were triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores_hash ()
           After a mail message has been checked, this method can be called. It will return a reference to a
           hash of rule => score pairs, giving the symbolic test name and individual score of each test
           triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores ()
           After a mail message has been checked, this method can be called. It will  return  a  comma-separated
           string  of  rule=score pairs for all the symbolic test names and individual scores of the tests which
           were triggered by the mail.
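
           When only the string form is at hand (for example, copied from a log), it can be split back into
           a hash.  This is a minimal sketch: parse_tests_with_scores() is a hypothetical helper and the
           scores in the sample input are invented.  With a live $status object,
           get_names_of_tests_hit_with_scores_hash() already provides a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "RULE_A=1.5,RULE_B=-0.1" into a hash reference.
sub parse_tests_with_scores {
    my ($list) = @_;
    my %score_for;
    for my $pair (split /,/, defined $list ? $list : '') {
        my ($rule, $score) = split /=/, $pair, 2;
        next unless defined $score;    # skip malformed entries
        $score_for{$rule} = $score + 0;
    }
    return \%score_for;
}

# Sample input in the documented rule=score format (scores are made up).
my $scores = parse_tests_with_scores('ALL_NATURAL=1.0,DIET_1=2.5');
printf "%s => %s\n", $_, $scores->{$_} for sort keys %$scores;
```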

       $list = $status->get_names_of_subtests_hit ()
           After a mail message has been checked, this method can be called.  It will return  a  comma-separated
           string,  listing  all  the symbolic test names of the meta-rule sub-tests which were triggered by the
           mail.  Sub-tests are the normally-hidden rules, which score 0  and  have  names  beginning  with  two
           underscores, used in meta rules.

           If  a parameter of collapsed or dbg is passed, the output will be a condensed array of sub-tests with
           multiple hits reduced to one entry.

           If the parameter of dbg is passed, the output will be a condensed string of sub-tests  with  multiple
           hits  reduced  to one entry with the number of hits in parentheses. Some information is also added at
           the end regarding the multiple hits.

       $num = $status->get_score ()
           After a mail message has been checked, this method can be  called.   It  will  return  the  message's
           score.

       $num = $status->get_required_score ()
           After  a mail message has been checked, this method can be called.  It will return the score required
           for a mail to be considered spam.

       $num = $status->get_autolearn_status ()
           After a mail message has been checked, this method  can  be  called.   It  will  return  one  of  the
           following  strings  depending  on  whether  the  mail  was  auto-learned or not: "ham", "no", "spam",
           "disabled", "failed", "unavailable".

           If the message is flagged with autolearn_force, the returned string will also include that status
           and the rules hit.  For example: "autolearn_force=yes (AUTOLEARNTEST_BODY)"

       $report = $status->get_report ()
           Deliver  a  "spam  report"  on  the  checked  mail  message.   This contains details of how many spam
           detection rules it triggered.

           The report is returned as a multi-line string, with the lines separated by "\n" characters.

       $preview = $status->get_content_preview ()
           Give a "preview" of the content.

           This is returned as a multi-line string, with the lines separated by "\n"  characters,  containing  a
           fully-decoded, safe, plain-text sample of the first few lines of the message body.

       $msg = $status->get_message()
           Return the object representing the message being scanned.

       $status->rewrite_mail ()
           Rewrite  the  mail  message.   This  will at minimum add headers, and at maximum MIME-encapsulate the
           message text, to reflect its spam or not-spam status.  The function  will  return  a  scalar  of  the
           rewritten message.

           The  actual  modifications  depend  on  the  configuration  (see  "Mail::SpamAssassin::Conf" for more
           information).

           The possible modifications are as follows:

           To:, From: and Subject: modification on spam mails
               Depending on the configuration, the To: and From: lines can have a user-defined RFC 2822  comment
               appended  for spam mail. The subject line may have a user-defined string prepended to it for spam
               mail.

           X-Spam-* headers for all mails
               Depending on the configuration, zero or more headers with names beginning with "X-Spam-" will  be
               added to mail depending on whether it is spam or ham.

           spam message with report_safe
               If  report_safe  is  set  to  true  (1),  then  spam  messages  are  encapsulated  into their own
               message/rfc822 MIME attachment without any modifications being made.

               If report_safe is set  to  false  (0),  then  the  message  will  only  have  the  above  headers
               added/modified.

       $status->action_depends_on_tags($tags, $code, @args)
           Enqueue  the  supplied  subroutine  reference  $code,  to become runnable when all the specified tags
           become available. The $tags may be a simple scalar - a tag name, or  a  listref  of  tag  names.  The
           subroutine  &$code  when  called  will  be  passed a "permessagestatus" object as its first argument,
           followed by the supplied (optional) list @args.

       $status->set_tag($tagname, $value)
           Set a template tag, as used in "add_header", report templates, etc.  This API is intended for use  by
           plugins.   Tag names will be converted to an all-uppercase representation internally.  Tag names must
           consist only of [A-Z0-9_] characters and must not contain consecutive  underscores.   Also  the  name
           must not start or end in an underscore, as that is the template tagging format.

           $value  can  be  a  simple  scalar  (string or number), or a reference to an array, in which case the
           public method get_tag will join array elements using a space  as  a  separator,  returning  a  single
           string for backward compatibility.

           $value  can  also  be  a  subroutine  reference,  which  will  be evaluated each time the template is
           expanded. The first argument passed by get_tag to a called subroutine will be a  PerMsgStatus  object
           (this module's object), followed by optional arguments provided by a caller to get_tag.

           Note  that  perl  supports  closures,  which  means  that  variables set in the caller's scope can be
           accessed inside this "sub". For example:

               my $text = "hello world!";
               $status->set_tag("FOO", sub {
                         my $pms = shift;
                         return $text;
                       });

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS"  and  "CAPTURING  TAGS  USING  REGEX  NAMED  CAPTURE
           GROUPS" sections for more details on how template tags are used.
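
           The tag naming rules can be checked before calling set_tag.  The following is a sketch only:
           normalize_tag_name() is a hypothetical helper, not part of SpamAssassin, and it simply encodes
           the rules stated above (uppercase, [A-Z0-9_] only, no leading, trailing or consecutive
           underscores).

```perl
use strict;
use warnings;

# Hypothetical helper: return the uppercased tag name if it satisfies the
# documented naming rules, otherwise undef.
sub normalize_tag_name {
    my ($name) = @_;
    return undef unless defined $name;
    my $uc = uc $name;
    # Runs of [A-Z0-9] joined by single underscores; this excludes leading,
    # trailing and consecutive underscores in one pattern.
    return $uc =~ /\A[A-Z0-9]+(?:_[A-Z0-9]+)*\z/ ? $uc : undef;
}

for my $candidate (qw(foo My_Tag _FOO FOO__BAR BAR_)) {
    my $ok = normalize_tag_name($candidate);
    printf "%-10s %s\n", $candidate, defined $ok ? "ok as $ok" : 'rejected';
}
```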

       $string = $status->get_tag($tagname)
           Get  the current value of a template tag, as used in "add_header", report templates, etc. This API is
           intended for use by plugins.   Tag  names  will  be  converted  to  an  all-uppercase  representation
           internally.

           See  "Mail::SpamAssassin::Conf"'s  "TEMPLATE  TAGS"  and  "CAPTURING  TAGS  USING REGEX NAMED CAPTURE
           GROUPS" sections for more details on how template tags are used.

           "undef" will be returned if a tag by that name has not been defined.

       $string = $status->get_tag_raw($tagname, @args)
           Similar to "get_tag", but keeps a tag name unchanged (does not uppercase it), and  does  not  convert
           arrayref tag values into a single string.

       $status->set_spamd_result_item($subref)
           Set  an  entry  for  the  spamd result log line.  $subref should be a code reference for a subroutine
           which will return a string in 'name=VALUE' format, similar to the other entries in the  spamd  result
           line:

             Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
             DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
             TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
             TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
             uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
             rport=33153,mid=<9PS291LhupY>,autolearn=spam

           "name"  and  "VALUE"  must not contain "=" or "," characters, as it is important that these log lines
           are easy to parse.

           The  code  reference  will  be  called  by  spamd  after  the  message  has  been  scanned,  and  the
           PerMsgStatus::check() method has returned.
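
           A code reference of the required shape might be built as follows.  This is a sketch under
           assumptions: make_result_item() is a hypothetical helper, "lang" is an illustrative item name
           rather than one spamd emits itself, and replacing forbidden characters with underscores is just
           one possible policy.

```perl
use strict;
use warnings;

# Hypothetical helper: build a code reference that returns "name=VALUE".
# Names containing "=" or "," are refused; the same characters in the value
# are replaced with underscores so the log line stays easy to parse.
sub make_result_item {
    my ($name, $value_cb) = @_;
    die "bad item name '$name'\n" if $name =~ /[=,]/;
    return sub {
        my $value = $value_cb->();
        $value =~ tr/=,/__/;
        return "$name=$value";
    };
}

my $subref = make_result_item('lang', sub { 'en,fr' });
print $subref->(), "\n";

# In a plugin, with a real PerMsgStatus object in $status:
#   $status->set_spamd_result_item($subref);
```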

       $status->finish ()
           Indicate that this $status object is finished with, and can be destroyed.

           If  you  are  using SpamAssassin in a persistent environment, or checking many mail messages from one
           "Mail::SpamAssassin" factory, this method should be called to ensure Perl's garbage  collection  will
           clean up old status objects.

       $name = $status->get_current_eval_rule_name()
           Return the name of the currently-running eval rule.  "undef" is returned if no eval rule is currently
           being  run.  Useful for plugins to determine the current rule name while inside an eval test function
           call.

       $status->get_decoded_body_text_array ()
           Returns the message body, with base64 or quoted-printable encodings decoded, and  non-text  parts  or
           non-inline attachments stripped.

           This is the same result text as used in 'rawbody' rules.

           It  is  returned as an array of strings, with each string being a 2-4kB chunk of the body, split from
           boundaries if possible.

       $status->get_decoded_stripped_body_text_array ()
           Returns the  message  body,  decoded  (as  described  in  get_decoded_body_text_array()),  with  HTML
           rendered, and with whitespace normalized.

           This is the same result text as used in 'body' rules.

           It will always render text/html.

           It is returned as an array of strings, with each string representing one 'paragraph'.  Paragraphs, in
           plain-text mails, are double-newline-separated blocks of multi-line text.
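
           The notion of a paragraph used here can be illustrated on plain text alone.  The sketch below is
           not SpamAssassin's implementation (which also decodes the message and renders HTML);
           to_paragraphs() is a hypothetical helper that only splits on blank lines and normalizes
           whitespace.

```perl
use strict;
use warnings;

# Illustration only: split plain text into paragraphs on blank lines and
# collapse runs of whitespace inside each paragraph.
sub to_paragraphs {
    my ($text) = @_;
    my @paragraphs;
    for my $block (split /\n{2,}/, $text) {
        $block =~ s/\s+/ /g;            # normalize whitespace
        $block =~ s/\A\s+|\s+\z//g;     # trim both ends
        push @paragraphs, $block if length $block;
    }
    return @paragraphs;
}

my @paragraphs = to_paragraphs("Hello\nworld\n\nSecond  one\n");
print scalar(@paragraphs), " paragraphs\n";
print "[$_]\n" for @paragraphs;
```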

       $status->get (header_name [, default_value])
           Returns  message  headers,  pseudo-headers, names, email-addresses or some other parsed values set by
           modifiers.  "header_name" is  the  name  of  a  mail  header  such  as  'Subject',  'To'  etc,  or  a
           pseudo/metadata-header like 'ALL', 'X-Spam-Relays-Untrusted' etc.

           Should be called in list context since SpamAssassin 4.0.  This supports returning multiple values for
           all header and modifier types.

           If  called in scalar context (pre-4.0 style), only first value is returned for modifiers like ":addr"
           or ":name".

           If "default_value" is given, it will be used if the requested "header_name" does not exist.  This  is
           mainly  useful  when  called  in scalar context to set 'undef' instead of legacy '' return value when
           header does not exist.

           Appending ":raw" modifier to the header name will inhibit decoding  of  quoted-printable  or  base-64
           encoded strings.

           Appending  ":addr"  modifier  to the header name will return all email-addresses found in the header.
           It is mainly applicable to header fields 'From', 'Sender', 'To', 'Cc'  along  with  their  'Resent-*'
           counterparts,  and the 'Return-Path'.  For example, all of the following will result in "example@foo"
           (and "example@bar"):

           example@foo
           example@foo (Foo Blah), <example@bar>
           example@foo, example@bar
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending ":name" modifier to the header name will return all "display names" from the header  field.
           As  with  ":addr",  it  is mainly applicable to header fields 'From', 'Sender', 'To', 'Cc' along with
           their 'Resent-*' counterparts, and the 'Return-Path'.  For example, all of the following will  result
           in "Foo Blah" (and "Bar Baz").  One level of single quotes is stripped too, as it is often seen.

           example@foo (Foo Blah)
           example@foo (Foo Blah), "Bar Baz" <example@bar>
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending  ":host"  to the header name will return the first hostname-looking string that ends with a
           valid TLD.  First it tries to find a match after @ character (possible email), then from any part  of
           the  header.  Normal use of this would be for example 'From:addr:host' to return the hostname portion
           of a From-address.

           Appending ":domain" to the header name implies ":host", but will  return  only  domain  part  of  the
           hostname, as returned by RegistryBoundaries::trim_domain().

           Appending  ":ip"  to the header name, will return the first IPv4 or IPv6 address string found.  Could
           be used for example as 'X-Originating-IP:ip'.

           Appending ":revip" to the header name implies ":ip", but will return the found IP in reverse (usually
           for DNSBL usage).
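
           For readers unfamiliar with the reversed form, the sketch below shows the conventional octet
           reversal for an IPv4 address as used in DNSBL queries.  reverse_ipv4() is a hypothetical helper
           and "dnsbl.example" is a placeholder zone; how ":revip" treats IPv6 addresses is not covered
           here.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the octets of a dotted-quad IPv4 address.
# Returns undef if the input does not look like a valid IPv4 address.
sub reverse_ipv4 {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return undef
        unless @octets == 4
        && !grep { !/\A\d{1,3}\z/ || $_ > 255 } @octets;
    return join '.', reverse @octets;
}

# 192.0.2.1 is a documentation address.
my $rev = reverse_ipv4('192.0.2.1');
print "$rev.dnsbl.example\n";
```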

           Appending ":first" modifier to the header name will return only the first (topmost) header,  in  case
           there are multiple ones.  Similarly ":last" will select the last one.  These affect only the physical
           header  line  selection.  If selected header is parsed further with ":addr" or similar, it may return
           multiple results, if the selected header contains multiple addresses.

           There are several special pseudo-headers that can be specified:

           "ALL" can be used to mean the text of all the message's headers. Each header is decoded and unfolded
           to single line, unless called with :raw.
           "ALL-TRUSTED" can be used to mean the text of all the message's headers that could only have been
           added by trusted relays.
           "ALL-INTERNAL" can be used to mean the text of all the message's headers that could only have been
           added by internal relays.
           "ALL-UNTRUSTED" can be used to mean the text of all the message's headers that may have been added by
           untrusted relays.  To make this pseudo-header more useful for header rules the 'Received' header that
           was added by the last trusted relay is included, even though it can be trusted.
           "ALL-EXTERNAL" can be used to mean the text of all the message's headers that may have been added by
           external relays.  Like "ALL-UNTRUSTED" the 'Received' header added by the last internal relay is
           included.
           "ToCc" can be used to mean the contents of both the 'To' and 'Cc' headers.
           "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the SMTP transaction that delivered
           this message, if this data has been made available by the SMTP server.
           "MESSAGEID" is a symbol meaning all Message-Id's found in the message; some mailing list software
           moves the real 'Message-Id' to 'Resent-Message-Id' or 'X-Message-Id' or 'X-Original-Message-ID', then
           uses its own one in the 'Message-Id' header.  The value returned for this symbol is the text from all
           4 headers.
           "X-Spam-Relays-Untrusted" is the generated metadata of untrusted relays the message has passed
           through.
           "X-Spam-Relays-Trusted" is the generated metadata of trusted relays the message has passed through.
           "X-Spam-Relays-External" is the generated metadata of external relays the message has passed through.
           "X-Spam-Relays-Internal" is the generated metadata of internal relays the message has passed through.

       $status->get_uri_list ()
           Returns an array of all unique URIs found in the message.  It takes a combination of the  URIs  found
           in  the  rendered  (decoded  and  HTML stripped) body and the URIs found when parsing the HTML in the
           message.  Will also set $status->{uri_list} (the array as returned by this function).

           The returned array will include the "raw" URI as well as "slightly cooked" versions.  For example,
           the single URI 'http://%77&#00119;%77.example.com/' will get turned into:
           ( 'http://%77&#00119;%77.example.com/', 'http://www.example.com/' )

       $status->get_uri_detail_list ()
           Returns a hash reference of all unique URIs found in the message and various  data  about  where  the
           URIs  were  found  in the message.  It takes a combination of the URIs found in the rendered (decoded
           and HTML stripped) body and the URIs found when parsing the HTML  in  the  message.   Will  also  set
           $status->{uri_detail_list} (the hash reference as returned by this function).

           The hash format looks something like this:

             raw_uri => {
               types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
                          unlinked => 1, schemeless => 1 },
               cleaned => [ canonicalized_uri ],
               anchor_text => [ "click here", "no click here" ],
               domains => { domain1 => 1, domain2 => 1 },
               hosts => { host1 => domain1, host2 => domain2 },
             }

           "raw_uri"  is  whatever  the URI was in the message itself (http://spamassassin.apache%2Eorg/).  Uris
           parsed from text will be prefixed with scheme if missing (http://, mailto: etc).  HTML  uris  are  as
           found.

           "types"  is a hash of the HTML tags (lowercase) which referenced the raw_uri.  parsed is a faked type
           which specifies that the raw_uri was seen in the rendered text.  domainkeys is defined when raw_uri
           was found in a DK/DKIM d= field.  unlinked is defined when it's assumed that the MUA will not
           linkify the URI (found in body without scheme or www. prefix).  schemeless is always added for URIs
           without a scheme, regardless of linkifying (e.g. an email address found in body without mailto:).

           "cleaned"   is   an   array   of   the   raw   and   canonicalized    version    of    the    raw_uri
           (http://spamassassin.apache%2Eorg/, https://spamassassin.apache.org/).

           "anchor_text" is an array of the anchor text (text between <a> and </a>), if any, which linked to the
           URI.

           "domains" is a hash of the domains found in the canonicalized URIs.

           "hosts"  is  a  hash of unstripped hostnames found in the canonicalized URIs as hash keys, with their
           domain part stored as a value of each hash entry.
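
           A consumer of this structure typically walks the outer hash and inspects the per-URI fields.
           The sketch below uses hand-written sample data in the documented shape (its contents are made
           up) and a hypothetical helper, linked_domains(), that collects the domains of URIs referenced
           from an <a> tag.

```perl
use strict;
use warnings;

# Sample data shaped like the documented return value; contents are made up.
my $uri_detail = {
    'http://spamassassin.apache%2Eorg/' => {
        types       => { a => 1, parsed => 1 },
        cleaned     => [ 'http://spamassassin.apache%2Eorg/',
                         'http://spamassassin.apache.org/' ],
        anchor_text => [ 'click here' ],
        domains     => { 'apache.org' => 1 },
        hosts       => { 'spamassassin.apache.org' => 'apache.org' },
    },
    'mailto:someone@example.com' => {
        types   => { parsed => 1 },
        cleaned => [ 'mailto:someone@example.com' ],
        domains => { 'example.com' => 1 },
        hosts   => { 'example.com' => 'example.com' },
    },
};

# Hypothetical helper: distinct domains of URIs that appeared in an <a> tag.
sub linked_domains {
    my ($details) = @_;
    my %seen;
    for my $info (values %$details) {
        next unless $info->{types}{a};
        $seen{$_} = 1 for keys %{ $info->{domains} || {} };
    }
    my @domains = sort keys %seen;
    return @domains;
}

print join(',', linked_domains($uri_detail)), "\n";

# With a real PerMsgStatus object in $status:
#   my @domains = linked_domains($status->get_uri_detail_list());
```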

       $status->add_uri_detail_list ($raw_uri, $types, $source, $valid_domain)
           Adds values to the internal uri_detail_list.  When used from plugins, it is recommended to call it
           from parsed_metadata (along with register_method_priority, -10) so other plugins calling
           get_uri_detail_list() will see it.

           "raw_uri" is the URI to be added. The only required parameter.

           "types"  is  an  optional  hash  reference,  contents  are  added  to  uri_detail_list->{types}  (see
           get_uri_detail_list for known keys).  parsed is the default if no hash is given.  nocanon does not
           run uri_list_canonicalize (no redirector, uri fixing).  noclean skips adding
           uri_detail_list->{cleaned}, so it would not be used in "uri" rule checks, but domain/hosts would
           still be used for URIBL/RBL purposes.

           "source" is an optional simple string, only used for debug logging purposes  to  identify  where  uri
           originates from (default: "parsed").

           "valid_domain"  is  an optional boolean (0/1).  If true, uri will not be added unless hostname/domain
           is in valid format and contains a valid TLD.  (default: 0)

       $status->clear_test_state()
           DEPRECATED, UNNEEDED SINCE 4.0

       $status->got_hit ($rulename, $desc_prepend [, name => value, ...])
           Register a hit against a rule in the ruleset.

           There are two mandatory arguments. These are $rulename, the name of the rule that fired, and
           $desc_prepend, which is a short string that will be prepended to the rule's "describe" string in
           output reports.

           In addition, callers can supplement that with the following optional data:

           score => $num
               Optional:  the  score  to  use  for  the  rule  hit.   If  unspecified,  the   value   from   the
               "Mail::SpamAssassin::Conf" object's "{scores}" hash will be used (a configured score), and in its
               absence the "defscore" option value.

           defscore => $num
               Optional:  the  score  to  use  for the rule hit if neither the option "score" is provided, nor a
               configured score value is provided.

           value => $num
               Optional: the value to assign to the rule; the default value is 1.  Rules with tflags "multiple"
               use values greater than 1 to indicate multiple hits.  This value is accessible to meta rules.

           ruletype => $type
               Optional,  but  recommended:  the  rule type string.  This is used in the "hit_rule" plugin call,
               called by this method.  If unset, 'unknown' is used.

           tflags => $string
               Optional: a string, i.e. a space-separated list  of  additional  tflags  to  be  appended  to  an
               existing  list  of  flags  in  $self->{conf}->{tflags},  such as: "nice noautolearn multiple". No
               syntax checks are performed.

           description => $string
               Optional: a custom rule description string.  This is used in the "hit_rule" plugin  call,  called
               by this method. If unset, the static description is used.

           Backward  compatibility:  the  two  mandatory arguments have been part of this API since SpamAssassin
           2.x. The optional "name=>value" pairs, however, are a new addition in SpamAssassin 3.2.0.
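
           The interplay of the "score" and "defscore" options with a configured score can be summarized in
           a few lines.  This is a sketch of the documented precedence only, not SpamAssassin's code:
           effective_score() is a hypothetical helper and %configured stands in for the Conf object's
           {scores} hash.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented precedence:
# explicit "score" option, then configured score, then "defscore".
sub effective_score {
    my ($rulename, $configured, %opts) = @_;
    return $opts{score}             if defined $opts{score};
    return $configured->{$rulename} if defined $configured->{$rulename};
    return $opts{defscore};    # may be undef if no default was supplied
}

my %configured = (MY_RULE => 2.5);    # stand-in for the Conf {scores} hash
print effective_score('MY_RULE',    \%configured, defscore => 1), "\n";
print effective_score('OTHER_RULE', \%configured, defscore => 1), "\n";
print effective_score('MY_RULE',    \%configured, score => 0.1), "\n";

# The corresponding call on a real PerMsgStatus object might look like:
#   $status->got_hit('MY_RULE', '', defscore => 1, ruletype => 'eval');
```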

       $status->rule_ready ($rulename [, $no_async])
           Mark an asynchronous rule ready, so it can be considered for meta rule evaluation.  An asynchronous
           rule is a rule whose eval-function returns undef, marking that it's not ready yet, expecting results
           later.  $status->rule_ready() must be called later to mark it ready; alternatively, $status->got_hit()
           also does this.  If neither is called, then any meta rule that depends on this rule might not
           evaluate.

           Optional boolean $no_async skips checking if there are pending async DNS lookups for the rule.

       $status->test_log ($text [, $rulename])
           Add a $text log entry for a hit rule in the final message REPORT/SUMMARY.

           Usually called just before got_hit(), to describe for example what URI the rule matched on.  The
           optional $rulename argument is recommended to make sure the log is written to the correct rule.  If
           $rulename is not provided, get_current_eval_rule_name() is used as a fallback.

           Can be called multiple times per rule for additional entries.

       $status->create_fulltext_tmpfile (fulltext_ref)
           This  function creates a temporary file containing the passed scalar reference data.  If no scalar is
           passed, full/pristine message text is assumed.  This is typically  used  by  external  programs  like
           pyzor and dccproc, to avoid hangs due to buffering issues.

           All tempfiles are automatically cleaned up by PerMsgStatus destructor.
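
           The general pattern (write a scalar reference to a temporary file so an external program can
           read it) can be sketched with the core File::Temp module.  This is an analogous standalone
           example, not the method's implementation; write_tmpfile() and the "example.XXXXXX" template are
           hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write the referenced scalar to a new temporary file
# and return its path.  UNLINK => 1 removes the file when the program exits.
sub write_tmpfile {
    my ($text_ref) = @_;
    my ($fh, $path) = tempfile('example.XXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} $$text_ref or die "write failed: $!";
    close $fh or die "close failed: $!";
    return $path;
}

my $message = "Subject: test\n\nbody\n";
my $path    = write_tmpfile(\$message);
print "wrote ", (-s $path), " bytes to $path\n";
```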

       $status->delete_fulltext_tmpfile (tmpfile)
           Cleans up after a $status->create_fulltext_tmpfile() call.  Deletes the temporary file and
           uncaches the filename.  Generally there is no need to call this, as the PerMsgStatus destructor
           cleans up all tmpfiles.

       all_from_addrs_domains
           This  function  returns  all  the various from addresses in a message using all_from_addrs() and then
           returns only the domain names.
