Provided by: libstring-tagged-perl_0.17-1_all

NAME

       "String::Tagged" - string buffers with value tags on extents

SYNOPSIS

        use String::Tagged;

        my $st = String::Tagged->new( "An important message" );

        $st->apply_tag( 3, 9, bold => 1 );

        $st->iter_substr_nooverlap(
           sub {
              my ( $substring, %tags ) = @_;

              print $tags{bold} ? "<b>$substring</b>"
                                : $substring;
           }
        );

DESCRIPTION

       This module implements an object class, instances of which store a (mutable) string buffer that supports
       tags. A tag is a name/value pair that applies to some non-empty extent of the underlying string.

       The types of tag names ought to be strings, or at least values that are well-behaved as strings, as the
       names will often be used as the keys in hashes or applied to the "eq" operator.

       The types of tag values are not restricted - any scalar will do. This could be a simple integer or
       string, ARRAY or HASH reference, or even a CODE reference containing an event handler of some kind.

       Tags may be arbitrarily overlapped. Any given offset within the string has, in effect, a set of uniquely
       named tags. Tags of different names are independent. For tags of the same name, only the latest,
       shortest tag takes effect.

       For example, consider a string with three tags represented here:

        Here is my string with tags
        [-------------------------]  foo => 1
                [-------]            foo => 2
             [---]                   bar => 3

       Every character in this string has a tag named "foo". The value of this tag is 2 for the words "my" and
       "string" and the space in between, and 1 elsewhere. Additionally, the words "is" and "my" and the space
       between them also have the tag "bar" with a value of 3.
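
       A minimal sketch of the example above, using only the "apply_tag" and "get_tag_at" methods documented
       below. The offsets are read from the diagram, and the probe positions are chosen purely for
       illustration:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Here is my string with tags" );

$st->apply_tag( 0, $st->length, foo => 1 );   # the whole string
$st->apply_tag( 8, 9,           foo => 2 );   # "my string"
$st->apply_tag( 5, 5,           bar => 3 );   # "is my"

# Probe a few positions: the inner "foo" tag wins where both apply
foreach my $pos ( 0, 5, 8, 17 ) {
   my $foo = $st->get_tag_at( $pos, "foo" );
   my $bar = $st->get_tag_at( $pos, "bar" ) // "(none)";
   print "$pos: foo=$foo bar=$bar\n";
}
```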

       Since "String::Tagged" does not understand the significance of the tag values, it cannot detect whether
       two neighbouring tags really contain the same semantic idea. Consider the following string:

        A string with words
        [-------]            type => "message"
                 [--------]  type => "message"

       This string contains two tags. "String::Tagged" will treat this as two different tag values as far as
       "iter_tags_nooverlap" is concerned, even though "get_tag_at" yields the same value for the "type" tag at
       any position in the string. The "merge_tags" method may be used to merge tag extents of tags that should
       be considered as equal.

NAMING

       I spent a lot of time considering the name for this module. It seems that a number of people across a
       number of languages all created similar functionality, though named very differently. For the benefit of
       keyword-based search tools and similar, here's a list of some other names this sort of object might be
       known by:

       •   Extents

       •   Overlays

       •   Attribute or attributed strings

       •   Markup

       •   Out-of-band data

CONSTRUCTOR

   new
          $st = String::Tagged->new( $str )

       Returns  a  new  instance  of  a  "String::Tagged" object. It will contain no tags.  If the optional $str
       argument is supplied, the string buffer will be initialised from this value.

       If $str is a "String::Tagged" object then it will be cloned, as if calling the "clone" method on it.

   new_tagged
          $st = String::Tagged->new_tagged( $str, %tags )

       Shortcut for creating a new "String::Tagged" object with the given tags applied to the entire length. The
       tags will not be anchored at either end.
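
       As a brief illustration, the following sketch creates a fully tagged string in one call. The tag names
       "bold" and "colour" are arbitrary examples, not names the module treats specially:

```perl
use strict;
use warnings;
use String::Tagged;

# Both tags cover the whole of "Warning"
my $st = String::Tagged->new_tagged( "Warning", bold => 1, colour => "red" );

print "tags: ", join( ", ", sort $st->tagnames ), "\n";
print "colour at end: ", $st->get_tag_at( $st->length - 1, "colour" ), "\n";
```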

   clone (class)
          $new = String::Tagged->clone( $orig, %opts )

       Returns a new instance of "String::Tagged" made by cloning the original, subject to the options provided.
       The returned instance will be in the requested class, which need not match the class of the original.

       The following options are recognised:

       only_tags => ARRAY
           If present, gives an ARRAY reference containing tag names. Only those tags named here will be copied;
           others will be ignored.

       except_tags => ARRAY
           If present, gives an ARRAY reference containing tag names. All tags will be copied except those named
           here.

       convert_tags => HASH
           If present, gives a HASH reference containing tag conversion functions. For any tags in the  original
           to  be  copied  whose  names appear in the hash, the name and value are passed into the corresponding
           function, which should return an even-sized key/value list giving a tag, or a list of tags, to  apply
           to the new clone.

            my @new_tags = $convert_tags->{$orig_name}->( $orig_name, $orig_value )
            # Where @new_tags is ( $new_name, $new_value, $new_name_2, $new_value_2, ... )

           As  a  further  convenience,  if  the  value for a given tag name is a plain string instead of a code
           reference, it gives the new name for the tag, and will be applied with its existing value.

           If "only_tags" is being used too, then the source names of any tags to  be  converted  must  also  be
           listed there, or they will not be copied.

   clone (instance)
          $new = $orig->clone( %args )

       Called  as  an  instance  (rather than a class) method, the newly-cloned instance is returned in the same
       class as the original.
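
       The clone options might be combined as in the sketch below. The tag names ("bold", "italic", "strong",
       "em") are hypothetical and chosen only to show filtering and renaming:

```perl
use strict;
use warnings;
use String::Tagged;

my $orig = String::Tagged->new( "bold and italic" );
$orig->apply_tag( 0, 4, bold   => 1 );
$orig->apply_tag( 9, 6, italic => 1 );

# Keep only the "bold" tag
my $only_bold = $orig->clone( only_tags => [ "bold" ] );

# Rename "bold" with a plain string, and convert "italic" with a function
my $converted = $orig->clone(
   convert_tags => {
      bold   => "strong",
      italic => sub { my ( $name, $value ) = @_; return ( em => $value ) },
   },
);

print "only_bold: ", join( ", ", sort $only_bold->tagnames ), "\n";
print "converted: ", join( ", ", sort $converted->tagnames ), "\n";
```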

   from_sprintf
          $str = String::Tagged->from_sprintf( $format, @args )

       Since version 0.15.

       Returns a new instance of a "String::Tagged" object, initialised by  formatting  the  supplied  arguments
       using the supplied format.

       The  $format  string  is  similar to that supported by the core "sprintf" operator, though a few features
       such as out-of-order argument indexing and vector formatting are missing. This format  string  may  be  a
       plain  perl  string,  or  an  instance  of  "String::Tagged".  In the latter case, any tags within it are
       preserved in the result.

       In the case of a %s conversion, the value of the argument  consumed  may  itself  be  a  "String::Tagged"
       instance. In this case it will be appended to the returned object, preserving any tags within it.

       All other conversions are handled individually by the core "sprintf" operator and appended to the result.
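
       A short sketch of a %s conversion consuming a tagged argument. The strings and the "bold" tag are
       illustrative only:

```perl
use strict;
use warnings;
use String::Tagged;

my $name = String::Tagged->new_tagged( "world", bold => 1 );

# %s takes the tagged value; %d is formatted by the core sprintf
my $st = String::Tagged->from_sprintf( "Hello, %s! (%d)", $name, 42 );

print "$st\n";
print "bold at 7\n" if $st->get_tag_at( 7, "bold" );
print "no bold at 0\n" unless defined $st->get_tag_at( 0, "bold" );
```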

   join
          $str = String::Tagged->join( $sep, @parts )

       Since version 0.17.

       Returns a new instance of a "String::Tagged" object, formed by concatenating each of the component
       pieces together, joined with the separator string.

       The result will be much like the core "join" function, except that it will preserve tags in the resulting
       string.
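
       For example, the following sketch joins three "String::Tagged" parts. The part contents and the
       "colour" tag are made up for illustration:

```perl
use strict;
use warnings;
use String::Tagged;

my @parts = (
   String::Tagged->new_tagged( "red",  colour => "red" ),
   String::Tagged->new( "plain" ),
   String::Tagged->new_tagged( "blue", colour => "blue" ),
);

my $st = String::Tagged->join( ", ", @parts );

print "$st\n";
# "blue" begins at offset 12 of the joined string
print "colour at 12: ", $st->get_tag_at( 12, "colour" ), "\n";
```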

METHODS

   str
          $str = $st->str

          $str = "$st"

       Returns the plain string contained within the object.

       This method is also called for stringification; so the "String::Tagged" object can be  used  in  a  plain
       string interpolation such as

        my $message = String::Tagged->new( "Hello world" );
        print "My message is $message\n";

   length
          $len = $st->length

          $len = length( $st )

       Returns  the  length  of the plain string. Because stringification works on this object class, the normal
       core "length" function works correctly on it.

   substr
          $str = $st->substr( $start, $len )

       Returns a "String::Tagged" instance representing a section from within the given string,  containing  all
       the same tags at the same conceptual positions.

   plain_substr
          $str = $st->plain_substr( $start, $len )

       Returns the substring at the given position as a plain perl string. This will be the same string data as
       returned by "substr", only as a plain string without the tags.
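
       The difference between the two methods can be seen in a sketch such as this one; the string and the
       "bold" tag are arbitrary:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );

my $tagged = $st->substr( 3, 9 );         # a String::Tagged instance
my $plain  = $st->plain_substr( 3, 9 );   # an ordinary perl string

# The tag now sits at offset 0 of the new, shorter string
print ref( $tagged ), ": $tagged\n";
print "bold at 0: ", $tagged->get_tag_at( 0, "bold" ), "\n";
print "plain: $plain\n";
```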

   apply_tag
          $st->apply_tag( $start, $len, $name, $value )

       Apply  the  named tag value to the given extent. The tag will start on the character at the $start index,
       and continue for the next $len characters.

       If $start is given as -1, the tag will be considered to start "before" the actual string. If $len is
       given as -1, the tag will be considered to end "after" the end of the actual string. These special limits are
       used  by  "set_substr"  when  deciding  whether  to move a tag boundary. The start of any tag that starts
       "before" the string is never moved, even if more text is inserted at  the  beginning.  Similarly,  a  tag
       which ends "after" the end of the string, will continue to the end even if more text is appended.

       This method returns the $st object.

          $st->apply_tag( $e, $name, $value )

       Alternatively, an existing extent object can be passed as the first argument instead of two integers. The
       new tag will apply at the given extent.
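
       The effect of anchoring can be sketched as follows. The tag names "lang" and "bold" are hypothetical;
       the point is only that the anchored tag follows appended text while the plain extent does not:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );

$st->apply_tag( -1, -1, lang => "en" );   # anchored at both ends
$st->apply_tag(  0,  5, bold => 1 );      # ordinary extent over "Hello"

$st->append( ", world" );

my $last = $st->length - 1;
print "lang at end: ", $st->get_tag_at( $last, "lang" ) // "(none)", "\n";
print "bold at end: ", $st->get_tag_at( $last, "bold" ) // "(none)", "\n";
```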

   unapply_tag
          $st->unapply_tag( $start, $len, $name )

       Unapply  the  named  tag  value  from  the  given extent. If the tag extends beyond this extent, then any
       partial fragment of the tag will be left in the string.

       This method returns the $st object.

          $st->unapply_tag( $e, $name )

       Alternatively, an existing extent object can be passed as the first argument instead of two integers.

   delete_tag
          $st->delete_tag( $start, $len, $name )

       Delete the named tag within the given extent. Entire tags are removed, even if they  extend  beyond  this
       extent.

       This method returns the $st object.

          $st->delete_tag( $e, $name )

       Alternatively, an existing extent object can be passed as the first argument instead of two integers.
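
       To contrast "unapply_tag" with "delete_tag", the sketch below applies each to a clone of the same
       string. The digit string and the "mark" tag are illustrative only:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "0123456789" );
$st->apply_tag( 0, 10, mark => 1 );

# Both methods return the object they were called on
my $unapplied = $st->clone->unapply_tag( 3, 4, "mark" );
my $deleted   = $st->clone->delete_tag(  3, 4, "mark" );

# unapply_tag leaves the fragments either side of offsets 3..6
print "unapplied at $_: ", $unapplied->get_tag_at( $_, "mark" ) // "(none)", "\n"
   for 0, 3, 7;

# delete_tag removes the whole tag
print "deleted at 0: ", $deleted->get_tag_at( 0, "mark" ) // "(none)", "\n";
```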

   merge_tags
          $st->merge_tags( $eqsub )

       Merge neighbouring or overlapping tags of the same name and equal values.

       For  each  pair  of  tags  of the same name that apply on neighbouring or overlapping extents, the $eqsub
       callback is called, as

         $equal = $eqsub->( $name, $value_a, $value_b )

       If this function returns true then the tags are merged.

       The equality test function is free to perform any comparison of the values that may be relevant to the
       application; for example it may deeply compare referred structures and check for equivalence in some
       application-defined manner. When two tags are merged, the first tag of the pair is retained and the
       second is deleted. This may be relevant if the tag value is a reference to some object.
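
       A minimal sketch of merging the two neighbouring "type" tags from the DESCRIPTION section, assuming a
       simple string comparison is an adequate equality test for the application. The $count_tags helper is
       hypothetical and exists only to show the effect:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0, 9,  type => "message" );
$st->apply_tag( 9, 10, type => "message" );

# Hypothetical helper: count the stored tags using iter_tags
my $count_tags = sub {
   my $n = 0;
   $st->iter_tags( sub { $n++ } );
   return $n;
};

my $before = $count_tags->();

# Treat two values as equal when they compare equal as strings
$st->merge_tags( sub {
   my ( $name, $value_a, $value_b ) = @_;
   return $value_a eq $value_b;
} );

my $after = $count_tags->();
print "tags before: $before, after: $after\n";
```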

   iter_extents
          $st->iter_extents( $callback, %opts )

       Iterate  the  tags  stored  in the string. For each tag, the CODE reference in $callback is invoked once,
       being passed an extent object that represents the extent of the tag.

        $callback->( $extent, $tagname, $tagvalue )

       Options passed in %opts may include:

       start => INT
           Start at the given position; defaults to 0.

       end => INT
           End after the given position; defaults to end of string. This option overrides "len".

       len => INT
           End after the given length beyond the start position; defaults to end of  string.  This  option  only
           applies if "end" is not given.

       only => ARRAY
           Select only the tags named in the given ARRAY reference.

       except => ARRAY
           Select all the tags except those named in the given ARRAY reference.

   iter_tags
          $st->iter_tags( $callback, %opts )

       Iterate  the  tags  stored  in the string. For each tag, the CODE reference in $callback is invoked once,
       being passed the start point and length of the tag.

        $callback->( $start, $length, $tagname, $tagvalue )

       Options passed in %opts are the same as for "iter_extents".
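
       For instance, the "only" option might be used to visit a single kind of tag. In this sketch the tag
       names and the @seen array are illustrative:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3,  9, bold   => 1 );
$st->apply_tag( 13, 7, colour => "red" );

my @seen;
$st->iter_tags(
   sub {
      my ( $start, $length, $tagname, $tagvalue ) = @_;
      push @seen, [ $start, $length, $tagname, $tagvalue ];
   },
   only => [ "bold" ],   # the "colour" tag is skipped
);

printf "%s=%s at %d, length %d\n", @{$_}[ 2, 3, 0, 1 ] for @seen;
```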

   iter_extents_nooverlap
          $st->iter_extents_nooverlap( $callback, %opts )

       Iterate non-overlapping extents of tags stored in the string. The CODE reference in $callback is  invoked
       for each extent in the string where no tags change. The entire set of tags active in that extent is given
       to the callback. Because the extent covers possibly-multiple tags, it will not define the "anchor_before"
       and "anchor_after" flags.

        $callback->( $extent, %tags )

       The  callback  will  be  invoked over the entire length of the string, including any extents with no tags
       applied.

       Options may be passed in %opts to control the range of the string iterated over, in the same way  as  the
       "iter_extents" method.

       If  the "only" or "except" filters are applied, then only the tags that survive filtering will be present
       in the %tags hash. Tags that are excluded by the filtering will not be present, nor will their bounds  be
       used to split the string into extents.
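
       The effect of filtering on how the string is split can be sketched as below. The tag names and the
       hypothetical $extents_of helper are for illustration only:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3,  9, bold   => 1 );
$st->apply_tag( 13, 7, colour => "red" );

# Hypothetical helper: collect "start-end:tagnames" for each extent
my $extents_of = sub {
   my %opts = @_;
   my @ret;
   $st->iter_extents_nooverlap(
      sub {
         my ( $e, %tags ) = @_;
         push @ret, $e->start . "-" . $e->end . ":" . join( ",", sort keys %tags );
      },
      %opts,
   );
   return @ret;
};

my @all       = $extents_of->();
my @bold_only = $extents_of->( only => [ "bold" ] );

print "all:       @all\n";
print "bold only: @bold_only\n";
```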

   iter_tags_nooverlap
          $st->iter_tags_nooverlap( $callback, %opts )

       Iterate  extents  of  the string using "iter_extents_nooverlap", but passing the start and length of each
       extent to the callback instead of the extent object.

        $callback->( $start, $length, %tags )

       Options may be passed in %opts to control the range of the string iterated over, in the same way  as  the
       "iter_extents" method.

   iter_substr_nooverlap
          $st->iter_substr_nooverlap( $callback, %opts )

       Iterate  extents  of the string using "iter_extents_nooverlap", but passing the substring of data instead
       of the extent object.

        $callback->( $substr, %tags )

       Options may be passed in %opts to control the range of the string iterated over, in the same way  as  the
       "iter_extents" method.

   tagnames
          @names = $st->tagnames

       Returns the set of tag names used in the string, in no particular order.

   get_tags_at
          $tags = $st->get_tags_at( $pos )

       Returns a HASH reference of all the tag values active at the given position.

   get_tag_at
          $value = $st->get_tag_at( $pos, $name )

       Returns the value of the named tag at the given position, or "undef" if the tag is not applied there.

   get_tag_extent
          $extent = $st->get_tag_extent( $pos, $name )

       If  the  named  tag  applies to the given position, returns the extent of the tag at that position. If it
       does not, "undef" is returned.  If  an  extent  is  returned  it  will  define  the  "anchor_before"  and
       "anchor_after" flags if appropriate.

   get_tag_missing_extent
          $extent = $st->get_tag_missing_extent( $pos, $name )

       If the named tag does not apply at the given position, returns the extent of the string around that
       position that does not have the tag. If the tag does apply there, "undef" is returned. If an extent is
       returned it will not define the "anchor_before" and "anchor_after" flags, as these do not make sense for
       the range in which a tag is absent.
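
       The two extent queries complement each other, as in this sketch; the string, the "bold" tag and the
       probe positions are arbitrary:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );

my $present = $st->get_tag_extent( 5, "bold" );            # inside the tag
my $missing = $st->get_tag_missing_extent( 15, "bold" );   # after the tag

printf "bold covers %d..%d\n", $present->start, $present->end;
printf "bold absent %d..%d\n", $missing->start, $missing->end;

# Each method returns undef when asked about the opposite situation
print "no tag extent at 15\n"    unless defined $st->get_tag_extent( 15, "bold" );
print "no missing extent at 5\n" unless defined $st->get_tag_missing_extent( 5, "bold" );
```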

   set_substr
          $st->set_substr( $start, $len, $newstr )

       Modifies an extent of the underlying plain string to that given. The extents of tags in the string are
       adjusted to cope with the modified region, and the adjustment in length.

       Tags entirely before the replaced extent remain unchanged.

       Tags entirely within the replaced extent are deleted.

       Tags entirely after the replaced extent are moved by the appropriate amount to ensure they still apply
       to the same characters as before.

       Tags that start before and end after the extent remain, and have their lengths suitably adjusted.

       Tags that span just the start or end of the extent, but not both, are truncated, so as to remove the part
       of the tag applied on the modified extent but preserving that applied outside.

       If $newstr is a "String::Tagged" object, then its tags will be  applied  to  $st  as  appropriate.  Edge-
       anchored  tags  in  $newstr  will not be extended through $st, though they will apply as edge-anchored if
       they now sit at the edge of the new string.
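
       As an illustration of the rule for tags lying entirely after the replaced extent, consider this sketch.
       The "noun" tag name is made up:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 7, 5, noun => 1 );   # covers "world"

# Replace "Hello" with a longer word; the tag moves along with "world"
$st->set_substr( 0, 5, "Goodbye" );

my $e = $st->get_tag_extent( 9, "noun" );
print $st->str, "\n";
printf "noun now covers %d..%d: %s\n", $e->start, $e->end, $e->plain_substr;
```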

   insert
          $st->insert( $start, $newstr )

       Insert the given string at the given position. A shortcut around "set_substr".

       If $newstr is a "String::Tagged" object, then its tags will be applied to $st as appropriate. If $start
       is 0, any before-anchored tags in $newstr will become before-anchored in $st.

   append
          $st->append( $newstr )

          $st .= $newstr

       Append to the underlying plain string. A shortcut around "set_substr".

       If $newstr is a "String::Tagged" object, then its tags will be applied to $st as appropriate. Any
       after-anchored tags in $newstr will become after-anchored in $st.

   append_tagged
          $st->append_tagged( $newstr, %tags )

       Append to the underlying plain string, and apply the given tags to the newly-inserted extent.

       Returns $st itself so that the method may be easily chained.
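
       Because the method returns $st, calls can be chained to build up a string piece by piece. A small
       sketch, with illustrative text and tag names:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "" )
   ->append_tagged( "Error: ",        bold   => 1 )
   ->append_tagged( "file not found", colour => "red" );

print "$st\n";
print "bold at 0: ",   $st->get_tag_at( 0, "bold" ),   "\n";
print "colour at 7: ", $st->get_tag_at( 7, "colour" ), "\n";
```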

   concat
          $ret = $st->concat( $other )

          $ret = $st . $other

       Returns a new "String::Tagged" containing the two strings concatenated together, preserving any tags
       present. This method overloads the normal string concatenation operator, so expressions involving
       "String::Tagged" values retain their tags.

       This method or operator tries to respect subclassing; preferring to return a new object of a subclass  if
       either argument or operand is a subclass of "String::Tagged". If they are both subclasses, it will prefer
       the type of the invocant or first operand.
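
       A brief sketch of the overloaded operator, concatenating a tagged string with a plain one. The
       variable names and the "bold" tag are illustrative:

```perl
use strict;
use warnings;
use String::Tagged;

my $label = String::Tagged->new_tagged( "Note", bold => 1 );

# The "." operator yields a new object; $label itself is not modified
my $ret = $label . ": check the logs";

print ref( $ret ), ": $ret\n";
print "bold at 0: ", $ret->get_tag_at( 0, "bold" ), "\n";
print "label is still: $label\n";
```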

   matches
          @subs = $st->matches( $regexp )

       Returns a list of substrings (as "String::Tagged" instances) for every non-overlapping match of the given
       $regexp.

       This  could  be  used,  for  example,  to  build  a formatted string from a formatted template containing
       variable expansions:

        my $template = ...
        my %vars = ...

        my $ret = String::Tagged->new;
        foreach my $m ( $template->matches( qr/\$\w+|[^$]+/ ) ) {
           if( $m =~ m/^\$(\w+)$/ ) {
              $ret->append_tagged( $vars{$1}, %{ $m->get_tags_at( 0 ) } );
           }
           else {
              $ret->append( $m );
           }
        }

       This iterates over segments of the template, replacing each variable expansion (a "$" symbol followed
       by a name) with the corresponding value from the %vars hash, while preserving all the formatting tags
       from the original template string.

   split
          @parts = $st->split( $regexp, $limit )

       Returns a list of substrings by applying the regexp to the string  content;  similar  to  the  core  perl
       "split"  function.  If $limit is supplied, the method will stop at that number of elements, returning the
       entire remainder of the input string as the final element. If the $regexp contains a capture  group  then
       the content of the first one will be added to the return list as well.
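
       A minimal sketch of splitting on a separator, assuming a simple comma-separated string. The field
       contents and the "bold" tag are illustrative:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "alpha,beta,gamma" );
$st->apply_tag( 6, 4, bold => 1 );   # covers "beta"

my @parts = $st->split( qr/,/ );

# Report each field and whether it begins with the "bold" tag
foreach my $part ( @parts ) {
   my $bold = $part->get_tag_at( 0, "bold" ) ? "bold" : "plain";
   print "$part ($bold)\n";
}
```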

   sprintf
          $ret = $st->sprintf( @args )

       Since version 0.15.

       Returns  a  new  string by using the given instance as the format string for a "from_sprintf" constructor
       call. The returned instance will be of the same class as the invocant.
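
       For example, a tagged format string might be reused as in this sketch; the format text and the "bold"
       tag are illustrative:

```perl
use strict;
use warnings;
use String::Tagged;

my $fmt = String::Tagged->new( "Total: %d items" );
$fmt->apply_tag( 0, 6, bold => 1 );   # covers "Total:"

# Tags on the format string carry over into the result
my $ret = $fmt->sprintf( 42 );

print "$ret\n";
print "bold at 0: ", $ret->get_tag_at( 0, "bold" ), "\n";
```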

   debug_sprintf
          $ret = $st->debug_sprintf

       Returns a representation of the string data and all the  tags,  suitable  for  debug  printing  or  other
       similar use. This is a format such as is given in the DESCRIPTION section above.

       The  output will consist of a number of lines, the first containing the plain underlying string, then one
       line per tag. The line shows the extent of the tag given by "[---]" markers, or a "|" in the special case
       of a tag covering only a single character. Special markings of  "<"  and  ">"  indicate  tags  which  are
       "before" or "after" anchored.

       For example:

         Hello, world
         [---]         word       => 1
        <[----------]> everywhere => 1
               |       space      => 1

Extent Objects

       These  objects  represent  a range of characters within the containing "String::Tagged" object. The range
       they represent is fixed at the time of creation. If the containing  string  is  modified  by  a  call  to
       "set_substr"  then  the effect on the extent object is not defined. These objects should be considered as
       relatively short-lived - used briefly for the purpose of  querying  the  result  of  an  operation,  then
       discarded soon after.

   $extent->string
       Returns the containing "String::Tagged" object.

   $extent->start
       Returns the start index of the extent. This is the index of the first character within the extent.

   $extent->end
       Returns  the  end  index  of  the  extent. This is the index of the first character beyond the end of the
       extent.

   $extent->anchor_before
       True if this extent begins "before" the start of the string. Only certain  methods  return  extents  with
       this flag defined.

   $extent->anchor_after
       True  if  this  extent  ends "after" the end of the string. Only certain methods return extents with this
       flag defined.

   $extent->length
       Returns the number of characters within the extent.

   $extent->substr
       Returns the substring contained by the extent.

   $extent->plain_substr
       Returns the substring of the underlying plain string buffer contained by the extent.
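
       A sketch of typical short-lived use of extent objects, copying out the values of interest inside an
       "iter_extents" callback rather than keeping the extent itself. The "word" tag and the @rows array are
       illustrative:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, word => 1 );
$st->apply_tag( 7, 5, word => 2 );

my @rows;
$st->iter_extents( sub {
   my ( $e, $name, $value ) = @_;
   # Record what is needed now; the extent object is then discarded
   push @rows, sprintf "%s=%s [%d,%d) length %d: %s",
      $name, $value, $e->start, $e->end, $e->length, $e->plain_substr;
} );

print "$_\n" for @rows;
```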

TODO

       •   There are likely variations on the rules for "set_substr" that could equally apply to  some  uses  of
           tagged strings. Consider whether the behaviour of modification is chosen per-method, per-tag, or per-
           string.

       •   Consider  how  to  implement  a  clone  from  one tag format to another which wants to merge multiple
           different source tags together into a single new one.

AUTHOR

       Paul Evans <leonerd@leonerd.org.uk>

perl v5.32.1                                       2021-09-18                                String::Tagged(3pm)