Provided by: erlang-manpages_25.3.2.8+dfsg-1ubuntu4.4_all

NAME

       uri_string - URI processing functions.

DESCRIPTION

       This module contains functions for parsing and handling URIs (RFC 3986) and form-urlencoded query strings
       (HTML 5.2).

       Parsing and serializing non-UTF-8 form-urlencoded query strings is also supported (HTML 5.0).

       A  URI  is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC
       3986.

       The generic URI syntax consists of a hierarchical sequence of  components  referred  to  as  the  scheme,
       authority, path, query, and fragment:

           URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
           hier-part   = "//" authority path-abempty
                          / path-absolute
                          / path-rootless
                          / path-empty
           scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
           authority   = [ userinfo "@" ] host [ ":" port ]
           userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

           reserved    = gen-delims / sub-delims
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="

           unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The  interpretation  of  a  URI  depends  only on the characters used and not on how those characters are
       represented in a network protocol.

       The functions implemented by this module cover the following use cases:

         * Parsing URIs into their components and returning a map
           parse/1

         * Recomposing a map of URI components into a URI string
           recompose/1

         * Changing inbound binary and percent-encoding of URIs
           transcode/2

         * Transforming URIs into a normalized form
           normalize/1
           normalize/2

         * Composing form-urlencoded query strings from a list of key-value pairs
           compose_query/1
           compose_query/2

         * Dissecting form-urlencoded query strings into a list of key-value pairs
           dissect_query/1

         * Decoding percent-encoded triplets in a URI map or in a specific component of a URI
           percent_decode/1

         * Preparing and retrieving application-specific data included in URI components
           quote/1
           quote/2
           unquote/1

       There are four different encodings present during the handling of URIs:

         * Inbound binary encoding in binaries

         * Inbound percent-encoding in lists and binaries

         * Outbound binary encoding in binaries

         * Outbound percent-encoding in lists and binaries

       Functions with a uri_string() argument accept lists, binaries, and mixed lists (lists with binary
       elements) as input. All of the functions except transcode/2 expect input as lists of unicode
       codepoints, UTF-8 encoded binaries, and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the
       unicode character "ö").

       Unless otherwise specified, the return value type and encoding are the same as the input type and
       encoding. That is, binary input returns binary output, and both list input and mixed input return
       list output.

       For lists there is only percent-encoding. In binaries, however, both binary encoding and
       percent-encoding must be considered. transcode/2 provides the means to convert between the supported
       encodings: it takes a uri_string() and a list of options specifying the inbound and outbound encodings.

       RFC 3986 does not mandate any specific character encoding; the encoding is usually defined by the
       protocol or surrounding text. This library makes the same assumption: binary encoding and
       percent-encoding are handled as one configuration unit and cannot be set to different values.

       Quoting functions are intended to be used by URI-producing applications during the component
       preparation or retrieval phase, to avoid conflicts between data and characters used in URI syntax.
       Quoting functions use percent-encoding, but with different rules than, for example, recompose/1. It
       is the user's responsibility to pass only application data to the quoting functions and to use their
       output when assembling a URI component.
       Quoting functions can, for instance, be used to construct a path component with a segment containing
       a '/' character that must not collide with the '/' used as the general delimiter in the path component.
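
       As a minimal sketch of that use case, a segment can be quoted before it is joined into a path and
       handed to recompose/1. The module name uri_quote_example, the function path_with_segment/1, the
       "/files/" prefix, and the example.com host are illustrative assumptions, not part of this module:

           -module(uri_quote_example).
           -export([path_with_segment/1]).

           %% Quote one path segment so that any "/" inside the data is
           %% percent-encoded before the segment is joined into the path.
           %% Segment is assumed to be a string (a list), so "++" can be used.
           path_with_segment(Segment) ->
               Quoted = uri_string:quote(Segment),
               uri_string:recompose(#{scheme => "https",
                                      host => "example.com",
                                      path => "/files/" ++ Quoted}).

       With the segment "SomeId/04", quote/1 yields "SomeId%2F04", so the '/' in the data reaches
       recompose/1 as "%2F" rather than as a path delimiter.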

DATA TYPES

       error() = {error, atom(), term()}

              Error tuple indicating the type of error. Possible values of the second component:

                * invalid_character

                * invalid_encoding

                * invalid_input

                * invalid_map

                * invalid_percent_encoding

                * invalid_scheme

                * invalid_uri

                * invalid_utf8

                * missing_value

              The third component is a term providing additional information about the cause of the error.
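
              Because functions in this module return either a result or an error() tuple, callers can
              tell the two apart by pattern matching. The following is a sketch only; the module name
              uri_error_example, the function classify/1, and the ok and bad_uri tags are hypothetical
              names chosen for illustration:

                  -module(uri_error_example).
                  -export([classify/1]).

                  %% Wrap the result of parse/1 in a tagged tuple so that callers
                  %% can match on success and failure uniformly.
                  classify(Input) ->
                      case uri_string:parse(Input) of
                          {error, Reason, Detail} ->
                              %% Reason is one of the atoms listed above;
                              %% Detail carries the additional information.
                              {bad_uri, Reason, Detail};
                          Map when is_map(Map) ->
                              {ok, Map}
                      end.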

       uri_map() =
           #{fragment => unicode:chardata(),
             host => unicode:chardata(),
             path => unicode:chardata(),
             port => integer() >= 0 | undefined,
             query => unicode:chardata(),
             scheme => unicode:chardata(),
             userinfo => unicode:chardata()}

              Map holding the main components of a URI.

       uri_string() = iodata()

              List  of unicode codepoints, a UTF-8 encoded binary, or a mix of the two, representing an RFC 3986
              compliant URI (percent-encoded form). A URI is a sequence of characters from a very  limited  set:
              the letters of the basic Latin alphabet, digits, and a few special characters.

EXPORTS

       allowed_characters() -> [{atom(), list()}]

              This is a utility function meant to be used in the shell for printing the allowed characters in
              each major URI component, and also in the most important character sets. Note that this
              function does not replace the ABNF rules defined by the standards; these character sets are
              derived directly from those aforementioned rules. For more information, see the Uniform Resource
              Identifiers chapter in stdlib's Users Guide.

       compose_query(QueryList) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 QueryString = uri_string() | error()

              Composes  a  form-urlencoded  QueryString based on a QueryList, a list of non-percent-encoded key-
              value pairs. Form-urlencoding is defined in section 4.10.21.6 of the HTML 5.2 specification and in
              section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8 encodings.

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>
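
              The QueryList type also allows the atom true in place of a value string, which the examples
              above do not exercise. A hedged sketch follows; the module name uri_query_example and the
              function flags_query/0 are hypothetical, and the resulting string is deliberately not shown.
              The entry with true is understood here to stand for a key that has no value:

                  -module(uri_query_example).
                  -export([flags_query/0]).

                  %% "debug" carries the atom true instead of a value string;
                  %% "lang" is an ordinary key-value pair.
                  flags_query() ->
                      uri_string:compose_query([{"debug", true}, {"lang", "en"}]).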

       compose_query(QueryList, Options) -> QueryString

              Types:

                 QueryList = [{unicode:chardata(), unicode:chardata() | true}]
                 Options = [{encoding, atom()}]
                 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options parameter that controls the encoding
              ("charset") used by the encoding algorithm. There are two supported encodings: utf8 (or unicode)
              and latin1.

              Each character in the entry's name and value that cannot be expressed using the selected character
              encoding is replaced by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023)
              character, one or more ASCII digits representing the Unicode code point of the character in base
              ten, and finally a ";" (U+003B) character.

              Bytes other than 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, and 0x61 to 0x7A are
              percent-encoded (a U+0025 PERCENT SIGN character (%) followed by uppercase ASCII hex digits
              representing the hexadecimal value of the byte).

              See also the opposite operation dissect_query/1.

              Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>

       dissect_query(QueryString) -> QueryList

              Types:

                 QueryString = uri_string()
                 QueryList =
                     [{unicode:chardata(), unicode:chardata() | true}] | error()

              Dissects a urlencoded QueryString and returns a QueryList, a list of non-percent-encoded
              key-value pairs. Form-urlencoding is defined in section 4.10.21.6 of the HTML 5.2 specification
              and in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8 encodings.

              See also the opposite operation compose_query/1.

              Example:

              1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
              2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
              [{<<"foo bar">>,<<"1">>},
               {<<"city">>,<<230,157,177,228,186,172>>}]

       normalize(URI) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 NormalizedURI = uri_string() | error()

              Transforms a URI into a normalized form using Syntax-Based Normalization as defined by RFC 3986.

              This   function  implements  case  normalization,  percent-encoding  normalization,  path  segment
              normalization and scheme based normalization for HTTP(S) with basic support for FTP, SSH, SFTP and
              TFTP.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g").
              "/a/g"
              2> uri_string:normalize(<<"mid/content=5/../6">>).
              <<"mid/6">>
              3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
              "http://localhost-%C3%B6rebro/a/g"

       normalize(URI, Options) -> NormalizedURI

              Types:

                 URI = uri_string() | uri_map()
                 Options = [return_map]
                 NormalizedURI = uri_string() | uri_map() | error()

              Same as normalize/1 but with an additional Options parameter that controls whether the normalized
              URI is returned as a uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
              #{path => "/a/g"}
              2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
              #{path => <<"mid/6">>}
              3> uri_string:normalize("http://localhost:80", [return_map]).
              #{scheme => "http",path => "/",host => "localhost"}
              4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}

       parse(URIString) -> URIMap

              Types:

                 URIString = uri_string()
                 URIMap = uri_map() | error()

              Parses an RFC 3986 compliant uri_string() into a uri_map(), that holds the  parsed  components  of
              the URI. If parsing fails, an error tuple is returned.

              See also the opposite operation recompose/1.

              Example:

              1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
              2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
              #{host => <<"example.com">>,path => <<"/over/there">>,
                port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
                userinfo => <<"user">>}

       percent_decode(URI) -> Result

              Types:

                 URI = uri_string() | uri_map()
                 Result =
                     uri_string() |
                     uri_map() |
                     {error, {invalid, {atom(), {term(), term()}}}}

              Decodes all percent-encoded triplets in the input, which can be either a uri_string() or a
              uri_map(). Note that this function performs raw decoding and should be used only on already
              parsed URI components. Applying this function directly to a standard URI can effectively change it.

              If the input encoding is not UTF-8, an error tuple is returned.

              Example:

              1> uri_string:percent_decode(#{host => "localhost-%C3%B6rebro",path => [],
              1> scheme => "http"}).
              #{host => "localhost-örebro",path => [],scheme => "http"}
              2> uri_string:percent_decode(<<"%C3%B6rebro">>).
              <<"örebro"/utf8>>

          Warning:
              Using uri_string:percent_decode/1 directly on a URI is not safe. This example shows that each
              consecutive application of the function changes the resulting URI. None of these URIs refer to
              the same resource.

              3> uri_string:percent_decode(<<"http://local%252Fhost/path">>).
              <<"http://local%2Fhost/path">>
              4> uri_string:percent_decode(<<"http://local%2Fhost/path">>).
              <<"http://local/host/path">>

       quote(Data) -> QuotedData

              Types:

                 Data = QuotedData = unicode:chardata()

              Replaces characters outside the unreserved set with their percent-encoded equivalents.

              Unreserved characters defined in RFC 3986 are not quoted.

              Example:

              1> uri_string:quote("SomeId/04").
              "SomeId%2F04"
              2> uri_string:quote(<<"SomeId/04">>).
              <<"SomeId%2F04">>

          Warning:
              This function is not aware of any URI component context and should not be used on a whole URI.
              If applied more than once to the same data, it might produce unexpected results.

       quote(Data, Safe) -> QuotedData

              Types:

                 Data = unicode:chardata()
                 Safe = string()
                 QuotedData = unicode:chardata()

              Same as quote/1, but Safe allows the user to provide a list of characters to be protected from
              encoding.

              Example:

              1> uri_string:quote("SomeId/04", "/").
              "SomeId/04"
              2> uri_string:quote(<<"SomeId/04">>, "/").
              <<"SomeId/04">>

          Warning:
              This function is not aware of any URI component context and should not be used on a whole URI.
              If applied more than once to the same data, it might produce unexpected results.

       recompose(URIMap) -> URIString

              Types:

                 URIMap = uri_map()
                 URIString = uri_string() | error()

              Creates an RFC 3986 compliant URIString (percent-encoded), based on the components of  URIMap.  If
              the URIMap is invalid, an error tuple is returned.

              See also the opposite operation parse/1.

              Example:

              1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
              1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

              2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"

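              Since parse/1 and recompose/1 are opposite operations, a URI can be edited by parsing it,
              updating the map, and recomposing it. The sketch below replaces the query component with one
              built by compose_query/1. The module name uri_roundtrip_example and the function set_query/2
              are hypothetical, and the URI and the pairs are assumed to be of the same type (both lists or
              both binaries):

                  -module(uri_roundtrip_example).
                  -export([set_query/2]).

                  %% Replace the query of URI with one composed from Pairs.
                  %% Error tuples from parse/1 and compose_query/1 are passed
                  %% through unchanged.
                  set_query(URI, Pairs) ->
                      case uri_string:parse(URI) of
                          {error, _, _} = ParseError ->
                              ParseError;
                          Map ->
                              case uri_string:compose_query(Pairs) of
                                  {error, _, _} = QueryError ->
                                      QueryError;
                                  Query ->
                                      uri_string:recompose(Map#{query => Query})
                              end
                      end.
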
       resolve(RefURI, BaseURI) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 TargetURI = uri_string() | error()

              Converts a RefURI reference that might be relative to a given base URI into the parsed components
              of the reference's target, which can then be recomposed to form the target URI.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q").
              "http://localhost/abs/ol/ute"
              2> uri_string:resolve("../relative", "http://localhost/a/b/c?q").
              "http://localhost/a/relative"
              3> uri_string:resolve("http://localhost/full", "http://localhost/a/b/c?q").
              "http://localhost/full"
              4> uri_string:resolve(#{path => "path", query => "xyz"}, "http://localhost/a/b/c?q").
              "http://localhost/a/b/path?xyz"

       resolve(RefURI, BaseURI, Options) -> TargetURI

              Types:

                 RefURI = BaseURI = uri_string() | uri_map()
                 Options = [return_map]
                 TargetURI = uri_string() | uri_map() | error()

              Same as resolve/2 but with an additional Options parameter that controls whether the target URI
              is returned as a uri_map(). There is one supported option: return_map.

              Example:

              1> uri_string:resolve("/abs/ol/ute", "http://localhost/a/b/c?q", [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}
              2> uri_string:resolve(#{path => "/abs/ol/ute"}, #{scheme => "http",
              2> host => "localhost", path => "/a/b/c?q"}, [return_map]).
              #{host => "localhost",path => "/abs/ol/ute",scheme => "http"}

       transcode(URIString, Options) -> Result

              Types:

                 URIString = uri_string()
                 Options =
                     [{in_encoding, unicode:encoding()} |
                      {out_encoding, unicode:encoding()}]
                 Result = uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a list of tagged tuples specifying
              the inbound (in_encoding) and outbound (out_encoding) encodings. in_encoding and out_encoding
              specify both the binary encoding and the percent-encoding for the input and output data. Mixed
              encoding, where the binary encoding is not the same as the percent-encoding, is not supported. If
              an argument is invalid, an error tuple is returned.

              Example:

              1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
              1> [{in_encoding, utf32},{out_encoding, utf8}]).
              <<"foo%C3%B6bar"/utf8>>
              2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
              2> {out_encoding, utf8}]).
              "foo%C3%B6bar"

       unquote(QuotedData) -> Data

              Types:

                 QuotedData = Data = unicode:chardata()

              Percent-decodes characters.

              Example:

              1> uri_string:unquote("SomeId%2F04").
              "SomeId/04"
              2> uri_string:unquote(<<"SomeId%2F04">>).
              <<"SomeId/04">>

          Warning:
              This function is not aware of any URI component context and should not be used on a whole URI.
              If applied more than once to the same data, it might produce unexpected results.

Ericsson AB                                      stdlib 4.3.1.3                                 uri_string(3erl)