Ubuntu Manpage: binary - Library for handling binary data.

Provided by: erlang-manpages_25.3.2.8+dfsg-1ubuntu4.4_all

NAME

       binary - Library for handling binary data.

DESCRIPTION

       This  module  contains  functions  for  manipulating  byte-oriented  binaries.  Although  the majority of
       functions could be provided using bit-syntax, the functions in this library are highly optimized and  are
       expected  to  either  execute  faster or consume less memory, or both, than a counterpart written in pure
       Erlang.

       The module is provided according to Erlang Enhancement Proposal (EEP) 31.

   Note:
       The library handles byte-oriented data. For bitstrings that are not binaries (that is, bitstrings that do
       not contain whole octets of bits), a badarg exception is raised by every function in this module.

DATA TYPES

       cp()

              Opaque  data  type  representing  a  compiled  search pattern. Guaranteed to be a tuple() to allow
              programs to distinguish it from non-precompiled search patterns.

       part() = {Start :: integer() >= 0, Length :: integer()}

              A representation of a part (or range) in a binary. Start is a zero-based offset  into  a  binary()
              and  Length  is  the  length  of  that  part. As input to functions in this module, a reverse part
              specification is allowed, constructed with a negative Length, so  that  the  part  of  the  binary
              begins at Start + Length and is -Length long. This is useful for referencing the last N bytes of a
              binary  as  {size(Binary),  -N}.  The functions in this module always return part()s with positive
              Length.

EXPORTS

       at(Subject, Pos) -> byte()

              Types:

                 Subject = binary()
                 Pos = integer() >= 0

              Returns the byte at position Pos  (zero-based)  in  binary  Subject  as  an  integer.  If  Pos  >=
              byte_size(Subject), a badarg exception is raised.
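
              The following shell session is an illustrative sketch (it is not part of the original manual) of
              zero-based indexing and of the badarg error for an out-of-range position. The atom out_of_range
              is a hypothetical marker chosen only for this example.

              ```erlang
              1> binary:at(<<"erlang">>, 0).
              101
              2> binary:at(<<"erlang">>, 5).
              103
              %% Position 6 equals byte_size(<<"erlang">>), so badarg is raised.
              3> try binary:at(<<"erlang">>, 6) catch error:badarg -> out_of_range end.
              out_of_range
              ```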

       bin_to_list(Subject) -> [byte()]

              Types:

                 Subject = binary()

              Same as bin_to_list(Subject, {0,byte_size(Subject)}).

       bin_to_list(Subject, PosLen) -> [byte()]

              Types:

                 Subject = binary()
                 PosLen = part()

              Converts  Subject  to  a  list of byte()s, each representing the value of one byte. part() denotes
              which part of the binary() to convert.

              Example:

              1> binary:bin_to_list(<<"erlang">>, {1,3}).
              "rla"
              %% or [114,108,97] in list notation.

              If PosLen in any way references outside the binary, a badarg exception is raised.

       bin_to_list(Subject, Pos, Len) -> [byte()]

              Types:

                 Subject = binary()
                 Pos = integer() >= 0
                 Len = integer()

              Same as bin_to_list(Subject, {Pos, Len}).

       compile_pattern(Pattern) -> cp()

              Types:

                 Pattern = binary() | [binary()]

              Builds an internal structure representing a compilation of a search pattern, later to be  used  in
              functions  match/3,  matches/3,  split/3,  or  replace/4.  The cp() returned is guaranteed to be a
              tuple() to allow programs to distinguish it from non-precompiled search patterns.

              When a list of binaries is specified, it denotes a set of alternative binaries to search for.  For
              example, if [<<"functional">>,<<"programming">>] is specified as Pattern, this means either
              <<"functional">> or <<"programming">>. The pattern is a set of alternatives; when only a single
              binary  is  specified, the set has only one element. The order of alternatives in a pattern is not
              significant.

              The list of binaries used for search alternatives must be flat and proper.

              If Pattern is not a binary or a flat proper list of binaries with length > 0, a  badarg  exception
              is raised.
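
              As a sketch of typical use (not taken from the original manual), a pattern can be compiled once
              and then reused across several searches. The variable name CP and the subject binary are
              assumptions made for this example; the printed value of the compiled pattern is omitted because
              cp() is opaque.

              ```erlang
              1> CP = binary:compile_pattern([<<"functional">>, <<"programming">>]).
              2> is_tuple(CP).
              true
              %% "functional" starts at offset 10 and is 10 bytes long.
              3> binary:match(<<"erlang is functional">>, CP).
              {10,10}
              ```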

       copy(Subject) -> binary()

              Types:

                 Subject = binary()

              Same as copy(Subject, 1).

       copy(Subject, N) -> binary()

              Types:

                 Subject = binary()
                 N = integer() >= 0

              Creates a binary with the content of Subject duplicated N times.

              This  function always creates a new binary, even if N = 1. By using copy/1 on a binary referencing
              a larger binary, one can free up the larger binary for garbage collection.

          Note:
              By deliberately copying a single binary to avoid referencing a larger binary, one can, instead  of
              freeing  up  the  larger  binary  for  later garbage collection, create much more binary data than
              needed. Sharing binary data is usually good. Only in special cases,  when  small  parts  reference
              large binaries and the large binaries are no longer used in any process, deliberate copying can be
              a good idea.

              If N < 0, a badarg exception is raised.
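
              A brief illustrative session (not part of the original manual) showing duplication, including
              the N = 0 case:

              ```erlang
              1> binary:copy(<<"ab">>, 3).
              <<"ababab">>
              2> binary:copy(<<"ab">>, 0).
              <<>>
              ```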

       decode_unsigned(Subject) -> Unsigned

              Types:

                 Subject = binary()
                 Unsigned = integer() >= 0

              Same as decode_unsigned(Subject, big).

       decode_unsigned(Subject, Endianness) -> Unsigned

              Types:

                 Subject = binary()
                 Endianness = big | little
                 Unsigned = integer() >= 0

              Converts the binary digit representation, in big endian or little endian, of a non-negative
              integer in Subject to an Erlang integer().

              Example:

              1> binary:decode_unsigned(<<169,138,199>>,big).
              11111111
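
              For comparison, the sketch below (not part of the original manual) decodes the same value from
              its little-endian byte order and shows how Endianness changes the result for a two-byte binary:

              ```erlang
              1> binary:decode_unsigned(<<199,138,169>>, little).
              11111111
              2> binary:decode_unsigned(<<1,0>>).
              256
              3> binary:decode_unsigned(<<1,0>>, little).
              1
              ```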

       encode_unsigned(Unsigned) -> binary()

              Types:

                 Unsigned = integer() >= 0

              Same as encode_unsigned(Unsigned, big).

       encode_unsigned(Unsigned, Endianness) -> binary()

              Types:

                 Unsigned = integer() >= 0
                 Endianness = big | little

              Converts a non-negative integer to its smallest possible binary digit representation, either big
              endian or little endian.

              Example:

              1> binary:encode_unsigned(11111111, big).
              <<169,138,199>>

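              As an illustrative sketch (not part of the original manual), the little-endian encoding of the
              same integer reverses the byte order, and decoding with the same Endianness restores the value:

              ```erlang
              1> binary:encode_unsigned(11111111, little).
              <<199,138,169>>
              2> binary:encode_unsigned(256).
              <<1,0>>
              3> binary:decode_unsigned(binary:encode_unsigned(256, little), little).
              256
              ```
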
       encode_hex(Bin) -> Bin2

              Types:

                 Bin = binary()
                 Bin2 = <<_:_*16>>

              Encodes a binary into a hex encoded binary.

              Example:

              1> binary:encode_hex(<<"f">>).
              <<"66">>

       decode_hex(Bin) -> Bin2

              Types:

                 Bin = <<_:_*16>>
                 Bin2 = binary()

              Decodes a hex encoded binary into a binary.

              Example:

              1> binary:decode_hex(<<"66">>).
              <<"f">>

       first(Subject) -> byte()

              Types:

                 Subject = binary()

              Returns the first byte of binary Subject as an integer. If the size of Subject is zero,  a  badarg
              exception is raised.

       last(Subject) -> byte()

              Types:

                 Subject = binary()

              Returns  the  last  byte of binary Subject as an integer. If the size of Subject is zero, a badarg
              exception is raised.
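
              A short illustrative session (not part of the original manual) for first/1 and last/1:

              ```erlang
              1> binary:first(<<"erlang">>).
              101
              2> binary:last(<<"erlang">>).
              103
              ```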

       list_to_bin(ByteList) -> binary()

              Types:

                 ByteList = iolist()

              Works exactly as erlang:list_to_binary/1, added for completeness.
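
              Because ByteList is an iolist(), nested lists and binaries can be mixed. The input below is a
              made-up example, not taken from the original manual:

              ```erlang
              1> binary:list_to_bin([<<"er">>, "la", [$n, $g]]).
              <<"erlang">>
              ```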

       longest_common_prefix(Binaries) -> integer() >= 0

              Types:

                 Binaries = [binary()]

              Returns the length of the longest common prefix of the binaries in list Binaries.

              Example:

              1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
              2
              2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
              0

              If Binaries is not a flat list of binaries, a badarg exception is raised.

       longest_common_suffix(Binaries) -> integer() >= 0

              Types:

                 Binaries = [binary()]

              Returns the length of the longest common suffix of the binaries in list Binaries.

              Example:

              1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
              3
              2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]).
              0

              If Binaries is not a flat list of binaries, a badarg exception is raised.

       match(Subject, Pattern) -> Found | nomatch

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = part()

              Same as match(Subject, Pattern, []).

       match(Subject, Pattern, Options) -> Found | nomatch

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = part()
                 Options = [Option]
                 Option = {scope, part()}
                 part() = {Start :: integer() >= 0, Length :: integer()}

              Searches for the first occurrence of Pattern in Subject and returns the position and length.

              The function returns {Pos, Length} for the binary in Pattern, starting at the lowest  position  in
              Subject.

              Example:

              1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
              {1,4}

              Even  though  <<"cd">>  ends before <<"bcde">>, <<"bcde">> begins first and is therefore the first
              match. If two overlapping matches begin at the same position, the longest is returned.

              Summary of the options:

                {scope, {Start, Length}}:
                  Only the specified part is searched. Return values still have offsets from  the  beginning  of
                  Subject. A negative Length is allowed as described in section Data Types in this manual.

              If none of the strings in Pattern is found, the atom nomatch is returned.

              For a description of Pattern, see function compile_pattern/1.

              If  {scope, {Start,Length}} is specified in the options such that Start > size of Subject, Start +
              Length < 0 or Start + Length > size of Subject, a badarg exception is raised.
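
              The effect of option scope can be sketched as follows (an illustrative session, not part of the
              original manual). The returned offsets are relative to the beginning of Subject, and a negative
              Length selects the bytes before Start:

              ```erlang
              1> binary:match(<<"banana">>, <<"an">>).
              {1,2}
              %% Only bytes 2..5 ("nana") are searched.
              2> binary:match(<<"banana">>, <<"an">>, [{scope, {2, 4}}]).
              {3,2}
              %% Only the last two bytes ("na") are searched.
              3> binary:match(<<"banana">>, <<"an">>, [{scope, {6, -2}}]).
              nomatch
              ```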

       matches(Subject, Pattern) -> Found

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = [part()]

              Same as matches(Subject, Pattern, []).

       matches(Subject, Pattern, Options) -> Found

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Found = [part()]
                 Options = [Option]
                 Option = {scope, part()}
                 part() = {Start :: integer() >= 0, Length :: integer()}

              As match/2, but Subject is searched until exhausted  and  a  list  of  all  non-overlapping  parts
              matching Pattern is returned (in order).

              The first and longest match is preferred to a shorter one, as illustrated by the following
              example:

              1> binary:matches(<<"abcde">>,
                                [<<"bcde">>,<<"bc">>,<<"de">>],[]).
              [{1,4}]

              The result shows that <<"bcde">> is selected instead of the shorter match  <<"bc">>  (which  would
              have given rise to one more match, <<"de">>). This corresponds to the behavior of POSIX regular
              expressions (and programs like awk), but is not consistent with alternative  matches  in  re  (and
              Perl), where instead lexical ordering in the search pattern selects which string matches.

              If none of the strings in a pattern is found, an empty list is returned.

              For  a  description of Pattern, see compile_pattern/1. For a description of available options, see
              match/3.

              If {scope, {Start,Length}} is specified in the options such that Start > size of Subject, Start +
              Length < 0 or Start + Length > size of Subject, a badarg exception is raised.
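
              A further illustrative session (not part of the original manual) shows that the returned parts
              never overlap and that option scope restricts the search:

              ```erlang
              1> binary:matches(<<"banana">>, <<"an">>).
              [{1,2},{3,2}]
              %% "ana" occurs at offsets 1 and 3, but the two occurrences overlap.
              2> binary:matches(<<"banana">>, <<"ana">>).
              [{1,3}]
              3> binary:matches(<<"banana">>, <<"a">>, [{scope, {2, 4}}]).
              [{3,1},{5,1}]
              ```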

       part(Subject, PosLen) -> binary()

              Types:

                 Subject = binary()
                 PosLen = part()

              Extracts the part of binary Subject described by PosLen.

              A negative length can be used to extract bytes at the end of a binary:

              1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
              2> binary:part(Bin, {byte_size(Bin), -5}).
              <<6,7,8,9,10>>

          Note:
              part/2  and  part/3  are  also  available  in  the erlang module under the names binary_part/2 and
              binary_part/3. Those BIFs are allowed in guard tests.

              If PosLen in any way references outside the binary, a badarg exception is raised.

       part(Subject, Pos, Len) -> binary()

              Types:

                 Subject = binary()
                 Pos = integer() >= 0
                 Len = integer()

              Same as part(Subject, {Pos, Len}).
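
              A short illustrative session (not part of the original manual) for the three-argument form,
              including a negative Len:

              ```erlang
              1> binary:part(<<"erlang">>, 1, 3).
              <<"rla">>
              2> binary:part(<<"erlang">>, 6, -2).
              <<"ng">>
              ```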

       referenced_byte_size(Binary) -> integer() >= 0

              Types:

                 Binary = binary()

              If a binary references a larger binary (often described as being a subbinary), it can be useful to
              get the size of the referenced binary. This function can be used in a program to trigger  the  use
              of  copy/1.  By  copying a binary, one can dereference the original, possibly large, binary that a
              smaller binary is a reference to.

              Example:

              store(Binary, GBSet) ->
                NewBin =
                    case binary:referenced_byte_size(Binary) of
                        Large when Large > 2 * byte_size(Binary) ->
                           binary:copy(Binary);
                        _ ->
                           Binary
                    end,
                gb_sets:insert(NewBin,GBSet).

              In this example, we chose to copy the binary content before inserting it in  gb_sets:set()  if  it
              references a binary more than twice the data size we want to keep. Of course, different programs
              call for different rules about when to copy.

              Binary sharing occurs whenever binaries are taken apart. This is the fundamental reason why
              binaries are fast: decomposition can always be done with O(1) complexity. In rare circumstances,
              however, this data sharing is undesirable, which is why this function together with copy/1 can be
              useful when optimizing for memory use.

              Example of binary sharing:

              1> A = binary:copy(<<1>>, 100).
              <<1,1,1,1,1 ...
              2> byte_size(A).
              100
              3> binary:referenced_byte_size(A).
              100
              4> <<B:10/binary, C:90/binary>> = A.
              <<1,1,1,1,1 ...
              5> {byte_size(B), binary:referenced_byte_size(B)}.
              {10,10}
              6> {byte_size(C), binary:referenced_byte_size(C)}.
              {90,100}

              In the above example, the small binary B was copied while the larger binary C references binary A.

          Note:
              Binary  data  is  shared  among  processes. If another process still references the larger binary,
              copying the part this process uses only consumes more memory and  does  not  free  up  the  larger
              binary for garbage collection. Use this kind of intrusive function with extreme care and only if
              a real problem is detected.

       replace(Subject, Pattern, Replacement) -> Result

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Replacement = Result = binary()

              Same as replace(Subject, Pattern, Replacement,[]).

       replace(Subject, Pattern, Replacement, Options) -> Result

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Replacement = binary()
                 Options = [Option]
                 Option = global | {scope, part()} | {insert_replaced, InsPos}
                 InsPos = OnePos | [OnePos]
                 OnePos = integer() >= 0
                   An integer() =< byte_size(Replacement)
                 Result = binary()

              Constructs a new binary by replacing the parts in Subject matching Pattern  with  the  content  of
              Replacement.

              If the matching subpart of Subject giving rise to the replacement is to be inserted in the
              result, option {insert_replaced, InsPos} inserts the matching part into Replacement at the
              specified position (or positions) before inserting Replacement into Subject.

              Example:

              1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
              <<"a[b]cde">>
              2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
              <<"a[b]c[d]e">>
              3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
              <<"a[bb]c[dd]e">>
              4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
              <<"a[b-b]c[d-d]e">>

              If  any  position  specified  in  InsPos  >  size of the replacement binary, a badarg exception is
              raised.

              Options global and {scope, part()} work as for split/3. The return type is always a binary().

              For a description of Pattern, see compile_pattern/1.
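
              The options global and scope can be sketched as follows (an illustrative session, not part of
              the original manual). Without global, only the first match is replaced; with scope, only matches
              inside the given part are replaced:

              ```erlang
              1> binary:replace(<<"banana">>, <<"a">>, <<"o">>).
              <<"bonana">>
              2> binary:replace(<<"banana">>, <<"a">>, <<"o">>, [global]).
              <<"bonono">>
              %% Only bytes 2..5 ("nana") are searched, so the first "a" is kept.
              3> binary:replace(<<"banana">>, <<"a">>, <<"o">>, [global, {scope, {2, 4}}]).
              <<"banono">>
              ```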

       split(Subject, Pattern) -> Parts

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Parts = [binary()]

              Same as split(Subject, Pattern, []).

       split(Subject, Pattern, Options) -> Parts

              Types:

                 Subject = binary()
                 Pattern = binary() | [binary()] | cp()
                 Options = [Option]
                 Option = {scope, part()} | trim | global | trim_all
                 Parts = [binary()]

              Splits Subject into a list of binaries based on Pattern. If option global is not  specified,  only
              the first occurrence of Pattern in Subject gives rise to a split.

              The parts of Pattern found in Subject are not included in the result.

              Example:

              1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
              [<<1,255,4>>, <<2,3>>]
              2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
              [<<0,1>>,<<4>>,<<9>>]

              Summary of options:

                {scope, part()}:
                  Works as in match/3 and matches/3. Notice that this only defines the scope of the search for
                  matching strings; it does not cut the binary before splitting. The bytes before and after the
                  scope are kept in the result. See the example below.

                trim:
                  Removes trailing empty parts of the result (as does trim in re:split/3).

                trim_all:
                  Removes all empty parts of the result.

                global:
                  Repeats  the  split until Subject is exhausted. Conceptually option global makes split work on
                  the positions returned by matches/3, while it normally  works  on  the  position  returned  by
                  match/3.

              Example of the difference between a scope and taking the binary apart before splitting:

              1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
              [<<"ban">>,<<"na">>]
              2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
              [<<"n">>,<<"n">>]
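
              The difference between options trim and trim_all can be sketched in the same way (an
              illustrative session, not part of the original manual; the comma-separated input is a made-up
              example):

              ```erlang
              1> binary:split(<<"a,b,,c,,">>, <<",">>, [global]).
              [<<"a">>,<<"b">>,<<>>,<<"c">>,<<>>,<<>>]
              %% trim drops only the trailing empty parts.
              2> binary:split(<<"a,b,,c,,">>, <<",">>, [global, trim]).
              [<<"a">>,<<"b">>,<<>>,<<"c">>]
              %% trim_all drops every empty part.
              3> binary:split(<<"a,b,,c,,">>, <<",">>, [global, trim_all]).
              [<<"a">>,<<"b">>,<<"c">>]
              ```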

              The return type is always a list of binaries that are all referencing Subject. This means that the
              data  in Subject is not copied to new binaries, and that Subject cannot be garbage collected until
              the results of the split are no longer referenced.

              For a description of Pattern, see compile_pattern/1.

Ericsson AB                                      stdlib 4.3.1.3                                     binary(3erl)