Provided by: liblingua-en-tagger-perl_0.31-3_all

NAME

       Lingua::EN::Tagger - Part-of-speech tagger for English natural language processing.

SYNOPSIS

           use Lingua::EN::Tagger;

           # Create a parser object
           my $p = Lingua::EN::Tagger->new;

           # Add part of speech tags to a text
           my $tagged_text = $p->add_tags($text);

           ...

           # Get a list of all nouns and noun phrases with occurrence counts
           my %word_list = $p->get_words($text);

           ...

           # Get a readable version of the tagged text
           my $readable_text = $p->get_readable($text);

DESCRIPTION

       The module is a probability-based, corpus-trained tagger that assigns part-of-speech (POS) tags to
       English text using a lookup dictionary and a set of probability values.  Tags are chosen by conditional
       probability: the tagger examines the preceding tag to determine the most likely tag for the current
       word.  Unknown words are classified according to their morphology, or can be set to be treated as
       nouns or other parts of speech.

       The tagger also extracts as many nouns and noun phrases as it can, using a set of regular expressions.
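
       The conditional-probability idea described above can be illustrated with a toy sketch.  This is not
       the module's code: the lexicon, the transition table, every number in them, and the helper name
       tag_words are made up for illustration.  It only shows how the preceding tag can settle an ambiguous
       word, and how a fallback tag (compare unknown_word_tag below) might be applied to unknown words.

```perl
use strict;
use warnings;

# Toy lexicon: P(tag | word) for a handful of words (made-up numbers).
my %lexicon = (
    the   => { det => 1.0 },
    dog   => { nn  => 0.9, vb  => 0.1 },
    walks => { nns => 0.4, vbz => 0.6 },
);

# Toy transitions: P(tag | previous tag) (made-up numbers).
my %transition = (
    start => { det => 0.6, nn  => 0.3,  vbz => 0.1 },
    det   => { nn  => 0.8, nns => 0.15, vb  => 0.05 },
    nn    => { vbz => 0.6, nns => 0.3,  nn  => 0.1 },
);

# Greedy left-to-right tagging: for each word, pick the tag that maximises
# P(tag | word) * P(tag | previous tag).  Unknown words get $unknown_tag.
sub tag_words {
    my ($unknown_tag, @words) = @_;
    my $prev = 'start';
    my @tagged;
    for my $word (@words) {
        my $candidates = $lexicon{ lc $word };
        my $best       = $unknown_tag;
        if ($candidates) {
            my $best_score = -1;
            for my $tag (sort keys %$candidates) {
                # Unseen transitions get a small floor instead of zero.
                my $score = $candidates->{$tag}
                          * ($transition{$prev}{$tag} // 0.0001);
                ($best, $best_score) = ($tag, $score) if $score > $best_score;
            }
        }
        push @tagged, "<$best>$word</$best>";
        $prev = $best;
    }
    return join ' ', @tagged;
}

print tag_words('nn', qw(The dog walks)), "\n";
# prints: <det>The</det> <nn>dog</nn> <vbz>walks</vbz>
```

       In this sketch "walks" is tagged as a verb after a noun, but as a plural noun directly after a
       determiner, because the transition table makes a verb very unlikely in that position.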

CONSTRUCTOR

       new %PARAMS
           Class constructor.  Takes a hash with the following parameters (shown with default values):

           unknown_word_tag => ''
               Tag to assign to unknown words

           stem => 0
               Stem single words using Lingua::Stem::EN

           weight_noun_phrases => 0
               When returning occurrence counts for a noun phrase, multiply the value by the number of words in
               the NP.

           longest_noun_phrase => 5
               Noun phrases longer than this threshold are ignored.  This affects only the get_words() and
               get_nouns() methods.

           relax => 0
               Relax the Hidden Markov Model.  This may improve accuracy for uncommon words, particularly words
               used polysemously.
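
           To make the effect of weight_noun_phrases and longest_noun_phrase concrete, here is a small
           sketch.  The helper adjust_counts is hypothetical and not part of the module; it assumes the raw
           phrase counts are already available as a hash and that phrase length is measured in
           whitespace-separated words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::Tagger): apply the two
# noun-phrase options to a hash of phrase => raw occurrence count.
sub adjust_counts {
    my ($counts, %opt) = @_;
    my $max_len = $opt{longest_noun_phrase} // 5;   # documented default
    my $weight  = $opt{weight_noun_phrases} // 0;   # documented default

    my %adjusted;
    for my $phrase (keys %$counts) {
        my @words = split ' ', $phrase;
        next if @words > $max_len;                  # drop over-long phrases
        $adjusted{$phrase} = $weight
            ? $counts->{$phrase} * scalar @words    # count times word count
            : $counts->{$phrase};
    }
    return %adjusted;
}

my %raw = ('tagger' => 3, 'noun phrase' => 2, 'a b c d e f' => 1);
my %weighted = adjust_counts(\%raw, weight_noun_phrases => 1);
print "$_ => $weighted{$_}\n" for sort keys %weighted;
# prints:
# noun phrase => 4
# tagger => 3
```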

METHODS

       add_tags TEXT
           Examine the string provided and return it fully tagged (XML style).

       add_tags_incrementally TEXT
           Examine the string provided and return it fully tagged (XML style), but do not reset the internal
           part-of-speech state between invocations.

       get_words TEXT
           Given a text string, return as many nouns and noun phrases as possible.  Applies add_tags and
           involves three stages:

               * Tag the text
               * Extract all the maximal noun phrases
               * Recursively extract all noun phrases from the MNPs

       get_readable TEXT
           Return an easy-on-the-eyes tagged version of a text string.  Applies add_tags and reformats the
           result to be easier to read.
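
           The reformatting step might look roughly like the sketch below.  It assumes that the XML-style
           output wraps each token as <tag>word</tag> with a lowercase tag name, and that the readable form
           is word/TAG; both are assumptions for illustration, and to_readable is a hypothetical helper,
           not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite "<tag>word</tag>" as "word/TAG".
# Assumes lowercase tag names and no nested elements.
sub to_readable {
    my ($tagged) = @_;
    # \1 requires the closing tag to match the opening one;
    # \U upper-cases the tag name in the replacement.
    (my $readable = $tagged) =~ s{<([a-z]+)>([^<]+)</\1>}{$2/\U$1}g;
    return $readable;
}

print to_readable('<det>The</det> <nn>cat</nn> <vbd>sat</vbd>'), "\n";
# prints: The/DET cat/NN sat/VBD
```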

       get_sentences TEXT
           Returns an anonymous array of sentences (without POS tags) from a text.

       get_proper_nouns TAGGED_TEXT
           Given a POS-tagged text, this method returns a hash of all proper nouns and their occurrence
           frequencies.  The method is greedy and will return multi-word phrases, if possible, so it would find
           "Linguistic Data Consortium" as a single unit, rather than as three individual proper nouns.  This
           method does not stem the found words.
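
           The greedy behaviour can be sketched as follows.  This is an illustration only: it assumes
           proper nouns are tagged <nnp> or <nnps> in the XML-style output, and proper_noun_counts is a
           hypothetical helper rather than the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: count maximal runs of adjacent proper-noun elements.
# Assumes proper nouns appear as <nnp>...</nnp> or <nnps>...</nnps>.
sub proper_noun_counts {
    my ($tagged) = @_;
    my $nnp = qr{<nnps?>[^<]+</nnps?>};
    my %counts;
    # The greedy * takes the longest run of whitespace-separated elements.
    while ($tagged =~ m{($nnp(?:\s+$nnp)*)}g) {
        my $run    = $1;                              # copy before re-matching
        my @words  = $run =~ m{<nnps?>([^<]+)</nnps?>}g;
        $counts{ join ' ', @words }++;
    }
    return %counts;
}

my $tagged = '<det>The</det> <nnp>Linguistic</nnp> <nnp>Data</nnp> '
           . '<nnp>Consortium</nnp> <vbz>hosts</vbz> <nnp>Treebank</nnp>';
my %found = proper_noun_counts($tagged);
print "$_ => $found{$_}\n" for sort keys %found;
# prints:
# Linguistic Data Consortium => 1
# Treebank => 1
```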

       get_nouns TAGGED_TEXT
           Given a POS-tagged text, this method returns all nouns and their occurrence frequencies.

       get_max_noun_phrases TAGGED_TEXT
           Given a POS-tagged text, this method returns only the maximal noun phrases.  May be called directly,
           but is also used by get_noun_phrases.

       get_noun_phrases TAGGED_TEXT
           Similar to get_words, but requires a POS-tagged text as an argument.

       install
           Reads some included corpus data and saves it in a stored hash on the local file system.  This is
           called automatically if the tagger can't find the stored lexicon.

AUTHORS

           Aaron Coburn <acoburn@apache.org>

CONTRIBUTORS

           Maciej Ceglowski <developer@ceglowski.com>
           Eric Nichols, Nara Institute of Science and Technology

COPYRIGHT AND LICENSE

           Copyright 2003-2010 Aaron Coburn <acoburn@apache.org>

           This program is free software; you can redistribute it and/or modify
           it under the terms of version 3 of the GNU General Public License as
           published by the Free Software Foundation.

perl v5.36.0                                       2022-11-27                                        Tagger(3pm)