Provided by: trurl_0.9-1build2_amd64

NAME

       trurl - transpose URLs

SYNOPSIS

       trurl [options / URLs]

DESCRIPTION

       trurl parses, manipulates and outputs URLs and parts of URLs.

       It  uses  the RFC 3986 definition of URLs and it uses libcurl's URL parser to do so, which includes a few
       "extensions". The URL support is limited to "hierarchical" URLs, the ones that use "://" separators after
       the scheme.

       Typically you pass in one or more URLs and decide which parts of them you want output, possibly
       modifying the URLs as well.

       trurl  knows  URLs  and  every  URL  consists  of  up to ten separate and independent "components". These
       components can be extracted, removed and updated with trurl and they are referred to by their  respective
       names: scheme, user, password, options, host, port, path, query, fragment and zoneid.

OPTIONS

       Options start with one or two dashes. Many of the options require an additional value next to them.

       Any  other  argument  is  interpreted  as  a  URL argument, and is treated as if it was following a --url
       option.

       The first argument that is exactly two dashes ("--") marks the end of options; any argument after the
       end of options is interpreted as a URL argument even if it starts with a dash.

       -a, --append [component]=[data]
              Append data to a component. This can only append data to the path and the query components.

              For path, this URL encodes and appends the new segment to the path, separated with a slash.

              For  query, this URL encodes and appends the new segment to the query, separated with an ampersand
              (&). If the appended segment contains an equal sign ('=') that one will be kept verbatim and  both
              sides of the first occurrence will be URL encoded separately.
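
              The append rules above can be sketched in Python as follows. This is only an illustration, not
              trurl's code: the helper names append_path and append_query are hypothetical, and the exact set
              of characters that gets escaped is that of Python's urllib.parse.quote, which may differ from
              what trurl and libcurl escape.

```python
from urllib.parse import quote


def append_path(path: str, segment: str) -> str:
    """Append one URL-encoded segment to a path, separated by a slash."""
    encoded = quote(segment, safe="")  # assumption: escape every reserved character
    return path.rstrip("/") + "/" + encoded


def append_query(query: str, segment: str, sep: str = "&") -> str:
    """Append one segment to a query, encoding both sides of the first '='."""
    if "=" in segment:
        key, value = segment.split("=", 1)  # only the first '=' is kept verbatim
        encoded = quote(key, safe="") + "=" + quote(value, safe="")
    else:
        encoded = quote(segment, safe="")
    return encoded if not query else query + sep + encoded


print(append_path("/hello", "you"))
print(append_query("name=hello", "search=a&b"))
```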

       --accept-space
              When  set,  trurl  will  try  to  accept  spaces  as  part  of the URL and instead URL encode such
              occurrences accordingly.

              According to RFC 3986, a space cannot legally be part of a URL. This option makes a best-effort
              attempt to convert the provided string into a valid URL.

       --default-port
              When set, trurl will use the scheme's default port number  for  URLs  with  a  known  scheme,  and
              without an explicit port number.

              Note that trurl only knows default port numbers for URL schemes that are supported by libcurl.

              Since, by default, trurl removes default port numbers from URLs with a known scheme, this option
              has little effect unless one of --get, --json or --keep-port is also specified.
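
              A minimal Python sketch of the default port lookup described above. The DEFAULT_PORTS table
              and the effective_port helper are hypothetical and only cover three schemes for illustration;
              trurl itself relies on the schemes libcurl knows.

```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative table, not the list libcurl actually supports.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str, default_port: bool = False):
    """Return the port to report for a URL, or None when there is none."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port always wins
    if default_port:
        return DEFAULT_PORTS.get(parts.scheme)  # None for unknown schemes
    return None


print(effective_port("https://curl.se/we/are.html", default_port=True))
```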

       -f, --url-file [file name]
              Read URLs to work on from the given file. Use the file name "-" (a single minus) to tell trurl  to
              read the URLs from stdin.

              Each  line  needs to be a single valid URL. trurl will remove one carriage return character at the
              end of the line if present, trim off all the trailing space and tab characters, and skip all empty
              (after trimming) lines.

              The maximum line length supported in a file like this is 4094 bytes. Lines that exceed that length
              are skipped, and a warning is printed to stderr when they are encountered.
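
              The line handling described above can be sketched as follows. This is an illustration in
              Python, not trurl's implementation; the clean_lines helper is hypothetical, and whether the
              4094-byte limit counts the line terminator is an assumption made here.

```python
import sys

MAX_LINE = 4094  # maximum supported line length in bytes, per the text above


def clean_lines(raw_lines):
    """Yield usable URL strings from an iterable of raw byte lines."""
    for raw in raw_lines:
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if len(line) > MAX_LINE:  # assumption: the newline is not counted
            print("warning: skipping overly long line", file=sys.stderr)
            continue
        if line.endswith(b"\r"):
            line = line[:-1]  # remove at most one carriage return
        line = line.rstrip(b" \t")  # trim trailing spaces and tabs
        if line:  # skip lines that are empty after trimming
            yield line.decode("utf-8", errors="replace")
```

              For example, list(clean_lines(open("urllist.txt", "rb"))) would return the cleaned lines of a
              hypothetical file named urllist.txt.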

       -g, --get [format]
              Output text and URL data according to the provided format string. Components from the URL can be
              output when specified as {component} or [component], with the name of the part shown within curly
              braces or brackets. You cannot mix braces and brackets for this purpose in the same command line.

              The following component names  are  available  (case  sensitive):  url,  scheme,  user,  password,
              options, host, port, path, query, fragment and zoneid.

              {component} will expand to nothing if the given component does not have a value.

              Components  are  shown  URL decoded by default. If you instead write the component prefixed with a
              colon like "{:path}", it gets output URL encoded.

              You may also prefix components with default: and/or puny: or idn:, in any order.

              If default: is specified, like "{default:url}" or "{default:port}", and the port is not explicitly
              specified in the URL, the scheme's default port will be output if it is known.

              If puny: is specified, like "{puny:url}" or "{puny:host}", the "punycoded"  version  of  the  host
              name will be used in the output. This option is mutually exclusive with idn:.

              If  idn:  is  specified like "{idn:url}" or "{idn:host}", the International Domain Name version of
              the host name will be used in the output if  it  is  provided  as  a  correctly  encoded  punycode
              version. This option is mutually exclusive with puny:.

              If  --default-port  is  specified,  all  formats  are  expanded  as  if they used default:; and if
              --punycode is used, all formats are expanded as if they used puny:.  Also  note  that  "{url}"  is
              affected by the --keep-port option.

              Hosts  provided  as  IPv6  numerical  addresses  will  be  provided  within  square brackets. Like
              "[fe80::20c:29ff:fe9c:409b]".

              Hosts provided as IPv4 numerical addresses will be "normalized" and provided as four dot-separated
              decimal numbers when output.

              You can access specific keys in the query string using the format {query:key}. Then the  value  of
              the  first matching key will be output using a case sensitive match. When extracting a URL decoded
              query key that contains %00, such octet will be replaced with a single period '.' in the output.

              You can access specific keys in the query string and output all of their values using the format
              {query-all:key}. This looks for 'key' case sensitively and outputs all values for that key,
              space-separated.
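
              A rough Python sketch of the two query key lookups described above. The helper names are
              hypothetical, and the replacement of %00 with a period is not modeled here.

```python
from urllib.parse import unquote


def query_values(query: str, key: str, sep: str = "&"):
    """Return the decoded values of every pair whose name equals key."""
    values = []
    for pair in query.split(sep):
        name, _, value = pair.partition("=")
        if name == key:  # case sensitive comparison
            values.append(unquote(value))
    return values


def query_first(query: str, key: str) -> str:
    """Mimic {query:key}: the value of the first matching key."""
    values = query_values(query, key)
    return values[0] if values else ""


def query_all(query: str, key: str) -> str:
    """Mimic {query-all:key}: all matching values, space-separated."""
    return " ".join(query_values(query, key))


print(query_first("a=home&here=now&thisthen", "a"))
print(query_all("a=1&b=2&a=3", "a"))
```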

              The "format" string supports the following backslash sequences:

              \\ - backslash

              \t - tab

              \n - newline

              \r - carriage return

              \{ - an open curly brace that does not start a variable

              \[ - an open bracket that does not start a variable

              All other text in the format string will be shown as-is.
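
              To make the format string rules more concrete, here is a small Python sketch of an expander.
              It is an approximation under stated assumptions, not trurl's code: the expand helper is
              hypothetical, it handles only the curly brace form, only a subset of components (no url,
              options or zoneid), and none of the default:, puny:, idn: or query: prefixes. Unknown names
              are left untouched, which is a choice made for this sketch.

```python
import re
from urllib.parse import urlsplit, unquote

# Backslash sequences listed in the text above.
ESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r", "{": "{", "[": "["}


def expand(fmt: str, url: str) -> str:
    """Expand {component} references and backslash sequences in fmt."""
    p = urlsplit(url)
    parts = {
        "scheme": p.scheme,
        "user": p.username or "",
        "password": p.password or "",
        "host": p.hostname or "",
        "port": "" if p.port is None else str(p.port),
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }

    def replace(match):
        if match.group(1) is not None:  # a backslash sequence
            ch = match.group(1)
            return ESCAPES.get(ch, "\\" + ch)
        encoded, name = match.group(2), match.group(3)
        if name not in parts:
            return match.group(0)  # sketch-only choice: leave unknown names as-is
        value = parts[name]  # empty string when the component has no value
        return value if encoded else unquote(value)

    return re.sub(r"\\(.)|\{(:?)([a-z]+)\}", replace, fmt)


print(expand("{scheme} {host} {path}", "https://curl.se/we/are.html"))
```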

       -h, --help
              Show the help output.

       --iterate [component]=[item1 item2 ...]
              Set the component to multiple values and output  the  result  once  for  each  iteration.  Several
              combined  iterations  are  allowed  to  generate  combinations,  but only one --iterate option per
              component. The listed items to iterate over should be separated by single spaces.
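
              Combining several iterations amounts to taking the Cartesian product of the item lists. A
              minimal Python sketch, with a hypothetical iterate helper; the order in which the combinations
              are produced here is an assumption and may not match trurl's output order.

```python
from itertools import product


def iterate(options: dict):
    """Yield one {component: value} mapping per combination.

    options maps a component name to its space-separated item list,
    for example {"scheme": "http ftp", "port": "80 8080"}.
    """
    names = list(options)
    item_lists = [options[name].split(" ") for name in names]
    for combo in product(*item_lists):
        yield dict(zip(names, combo))


for combination in iterate({"scheme": "http ftp", "port": "80 8080"}):
    print(combination)
```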

       --json Outputs all set components of the URLs as JSON objects. All components of the URL that  have  data
              will  get  populated in the parts object using their component names. See below for details on the
              format.

       --keep-port
              By default, trurl removes default port numbers from URLs with a known  scheme  even  if  they  are
              explicitly specified in the input URL. This option makes trurl not remove them.

       --no-guess-scheme
              Disables  libcurl's  scheme guessing feature. URLs that do not contain a scheme will be treated as
              invalid URLs.

       --punycode
              Uses the "punycoded" version of the host  name,  which  is  how  International  Domain  Names  are
              converted into plain ASCII. If the host name is not using IDN, the regular ASCII name is used.

       --as-idn
              Converts  a  "punycoded"  ASCII host name to its original International Domain Name in Unicode. If
              the host name is not using punycode then the original host name is used.
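
              The two conversions can be tried out with Python's built-in "idna" codec, as sketched below.
              Note the assumption: Python's codec implements the older IDNA 2003 rules, so results for some
              names may differ from what trurl and libcurl produce. The helper names are hypothetical.

```python
def to_punycode(host: str) -> str:
    """Convert an IDN host name to its ASCII (punycode) form."""
    return host.encode("idna").decode("ascii")


def to_idn(host: str) -> str:
    """Convert a punycoded ASCII host name back to Unicode."""
    return host.encode("ascii").decode("idna")


encoded = to_punycode("räksmörgås.se")
print(encoded)
print(to_idn(encoded))
```

              Host names that are plain ASCII without punycode labels pass through both helpers unchanged.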

       --query-separator [what]
              Specify the single character used for separating query pairs. The default is "&" but at least in
              the past sometimes semicolons ";" or even colons ":" have been used for this purpose. If your URL
              uses something other than the default character, setting the right one makes sure trurl can do
              its query operations properly.

       --redirect [URL]
              Redirect the URL to this new location.  The redirection is performed on the base URL,  so,  if  no
              base URL is specified, no redirection will be performed.
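
              Resolving a redirect against a base URL follows the relative reference rules of RFC 3986. As
              an illustration only, Python's urllib.parse.urljoin performs the same kind of resolution; the
              redirect wrapper below is a hypothetical name.

```python
from urllib.parse import urljoin


def redirect(base_url: str, location: str) -> str:
    """Resolve a (possibly relative) location against the base URL."""
    return urljoin(base_url, location)


print(redirect("https://curl.se/we/are.html", "here.html"))
```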

       -s, --set [component][:]=[data]
              Set this URL component. Setting a blank string ("") will clear the component from the URL.

              The  following  components  can  be  set:  url, scheme, user, password, options, host, port, path,
              query, fragment and zoneid.

              If a simple "="-assignment is used, the data is URL encoded when applied. If  ":="  is  used,  the
              data is assumed to already be URL encoded and will be stored as-is.

              If  no URL or --url-file argument is provided, trurl will try to create a URL using the components
              provided by the --set options. If not enough components are specified, this will fail.
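
              The difference between "=" and ":=" can be sketched like this. The parse_set helper is
              hypothetical, and keeping "/" unescaped is an assumption that suits the path component; trurl's
              exact escaping per component may differ.

```python
from urllib.parse import quote


def parse_set(argument: str):
    """Split 'component=data' or 'component:=data' into (component, stored data)."""
    name, _, data = argument.partition("=")
    if name.endswith(":"):
        return name[:-1], data  # ':=' means the data is already URL encoded
    return name, quote(data, safe="/")  # '=' means URL encode when applying


print(parse_set("path=/a b"))
print(parse_set("path:=/a%20b"))
```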

       --sort-query
              The "variable=content" tuples in the query component are sorted in case insensitive alphabetical
              order. This helps make URLs identical that otherwise only differ in the order of their query
              pairs.
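
              A minimal sketch of such a sort in Python. The sort_query helper is hypothetical, and
              comparing the whole "variable=content" string (rather than only the variable name) is an
              assumption made for this illustration.

```python
def sort_query(query: str, sep: str = "&") -> str:
    """Sort the pairs of a query string in case insensitive order."""
    pairs = query.split(sep)
    # sorted() is stable, so pairs that compare equal keep their original order
    return sep.join(sorted(pairs, key=str.lower))


print(sort_query("b=a&c=b&a=c"))
```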

       --url [URL]
              Set the input URL to work with. The URL may be provided without a scheme, which then typically  is
              not  actually  a legal URL but trurl will try to figure out what is meant and guess what scheme to
              use (unless --no-guess-scheme is used).

              Providing multiple URLs will make trurl act on all URLs in a serial fashion.

              If the URL cannot be parsed for whatever reason, trurl will simply move on to  the  next  provided
              URL - unless --verify is used.

       --urlencode
              Outputs URL encoded version of components by default when using --get or --json.

       --trim [component]=[what]
              Trims data off a component. Currently this can only trim a query component.

              "what"  is  specified  as a full word or as a word prefix (using a single trailing asterisk ('*'))
              which makes trurl remove the tuples from the query string that match the instruction.

              To match a literal trailing asterisk instead of using a wildcard, escape it with  a  backslash  in
              front of it. Like "\*".
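
              The matching rules above can be sketched as follows. The trim_query helper is hypothetical,
              and matching the key names case sensitively is an assumption of this sketch.

```python
def trim_query(query: str, what: str, sep: str = "&") -> str:
    """Remove the pairs whose name matches 'what' from a query string."""
    if what.endswith("\\*"):
        target, wildcard = what[:-2] + "*", False  # escaped: a literal asterisk
    elif what.endswith("*"):
        target, wildcard = what[:-1], True  # trailing asterisk: prefix match
    else:
        target, wildcard = what, False  # full word match
    kept = []
    for pair in query.split(sep):
        name = pair.partition("=")[0]
        matched = name.startswith(target) if wildcard else name == target
        if not matched:
            kept.append(pair)
    return sep.join(kept)


print(trim_query("search=hey&utm_source=tracker", "utm_*"))
```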

       -v, --version
              Show version information and exit.

       --verify
              When a URL is provided, return an error immediately if it does not parse as a valid URL. In
              normal cases, trurl can forgive a bad URL input.

       --quiet
              Suppress (some) notes and warnings.

JSON output format

       The --json option outputs a JSON array with one or more objects. One for each URL.

       Each URL JSON object contains a number of properties, a series of key/value pairs. The exact set  depends
       on the given URL.

       url    This  key exists in every object. It is the complete URL. Affected by --default-port, --keep-port,
              and --punycode.

       parts  This key exists in every object, and contains an object with a key for each of  the  settable  URL
              components.  If  a  component is missing, it means it is not present in the URL. The parts are URL
              decoded unless --urlencode is used.

              scheme The URL scheme.

              user   The user name.

              password
                     The password.

              options
                     The options. Note that only a few URL schemes support the "options" component.

              host   The normalized host name. It might be a UTF-8 name if an IDN name was used. It can
                     also be a normalized IPv4 or IPv6 address. An IPv6 address always starts with a bracket ([)
                     and no other host names can contain such a symbol. If --punycode is used, the punycode
                     version of the host is output instead.

              port   The provided port number as a string. If the port number was not provided in the  URL,  but
                     the  scheme  is a known one, and --default-port is in use, the default port for that scheme
                     will be provided here.

              path   The path. Including the leading slash.

              query  The full query, excluding the question mark separator.

              fragment
                     The fragment, excluding the pound sign separator.

              zoneid The zone id, which can only be present in an IPv6 address. When this key is  present,  then
                     host is an IPv6 numerical address.

       params This  key  contains  an  array of query key/value objects. Each such pair is listed with "key" and
              "value" and their respective contents in the output.

              The key/values are extracted from the query, where they are separated by ampersands (&) or by
              the character the user sets with --query-separator.

              The query pairs are listed in order of appearance, left to right, but can be made alpha-sorted
              with --sort-query.

              It is only present if the URL has a query.
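
       The object layout described in this section can be approximated with the Python sketch below. It is
       an illustration under stated assumptions, not trurl's output code: the describe helper is
       hypothetical, the options and zoneid components are not handled, and no URL normalization is done.

```python
import json
from urllib.parse import urlsplit, unquote


def describe(url: str, sep: str = "&") -> dict:
    """Build a dict with the 'url', 'parts' and optional 'params' keys."""
    p = urlsplit(url)
    candidates = {
        "scheme": p.scheme,
        "user": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": None if p.port is None else str(p.port),  # port is a string
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }
    # Only components that are present in the URL appear in "parts".
    obj = {"url": url, "parts": {k: unquote(v) for k, v in candidates.items() if v}}
    if p.query:  # "params" is only present if the URL has a query
        params = []
        for pair in p.query.split(sep):
            key, _, value = pair.partition("=")
            params.append({"key": unquote(key), "value": unquote(value)})
        obj["params"] = params
    return obj


print(json.dumps([describe("https://fake.host/search?q=answers&user=me#frag")], indent=2))
```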

EXAMPLES

       Replace the host name of a URL
              $ trurl --url https://curl.se --set host=example.com
              https://example.com/

       Create a URL by setting components
              $ trurl --set host=example.com --set scheme=ftp
              ftp://example.com/

       Redirect a URL
              $ trurl --url https://curl.se/we/are.html --redirect here.html
              https://curl.se/we/here.html

       Change port number
              This also shows how trurl will remove dot-dot sequences
              $ trurl --url https://curl.se/we/../are.html --set port=8080
              https://curl.se:8080/are.html

       Extract the path from a URL
              $ trurl --url https://curl.se/we/are.html --get '{path}'
              /we/are.html

       Extract the port from a URL
              This gets the default port based on the scheme if the port is not set in the URL.
              $ trurl --url https://curl.se/we/are.html --get '{default:port}'
              443

       Append a path segment to a URL
              $ trurl --url https://curl.se/hello --append path=you
              https://curl.se/hello/you

       Append a query segment to a URL
              $ trurl --url "https://curl.se?name=hello" --append query=search=string
              https://curl.se/?name=hello&search=string

       Read URLs from stdin
              $ cat urllist.txt | trurl --url-file -
              ...

       Output JSON
              $ trurl "https://fake.host/search?q=answers&user=me#frag" --json
              [
                {
                  "url": "https://fake.host/search?q=answers&user=me#frag",
                  "parts": {
                      "scheme": "https",
                      "host": "fake.host",
                      "path": "/search",
                      "query": "q=answers&user=me",
                      "fragment": "frag"
                  },
                  "params": [
                    {
                      "key": "q",
                      "value": "answers"
                    },
                    {
                      "key": "user",
                      "value": "me"
                    }
                  ]
                }
              ]

       Remove tracking tuples from query
              $ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
              https://curl.se/?search=hey

       Show a specific query key value
              $ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
              home

       Sort the key/value pairs in the query component
              $ trurl "https://example.com?b=a&c=b&a=c" --sort-query
              https://example.com/?a=c&b=a&c=b

       Work with a query that uses a semicolon separator
              $ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
              https://curl.se/?page=5

       Accept spaces in the URL path
              $ trurl "https://curl.se/this has space/index.html" --accept-space
              https://curl.se/this%20has%20space/index.html

       Create multiple variations of a URL with different schemes
              $ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
              http://curl.se/path/index.html
              ftp://curl.se/path/index.html
              sftp://curl.se/path/index.html

WWW

       https://curl.se/trurl

SEE ALSO

       curl_url_set(3), curl_url_get(3)

trurl                                            April 27, 2023                                         trurl(1)