Provided by: pcp-export-pcp2spark_6.2.0-1.1build4_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark  [-5CGHIjLmnrRvV?]   [-4  action] [-8|-9 limit] [-a archive] [-A align] [--archive-folio folio]
       [-b|-B space-scale] [-c config] [--container container] [--daemonize] [-e derived] [-g server] [-h  host]
       [-i  instances]  [-J rank] [-K spec] [-N predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-
       scale] [-s samples] [-S starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from PCP to Apache  Spark.   Any  available
       performance  metric,  live  or  archived,  system and/or application, can be selected for exporting using
       either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port to which an
       Apache Spark worker task can connect.  The worker task pulls the configured PCP metrics from pcp2spark
       through the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for a description of the metricspec
       accepted on the pcp2spark command line.  See pmrep.conf(5) for a description of the pcp2spark.conf
       configuration file syntax.  This page describes pcp2spark specific options and configuration file
       differences from pmrep.conf(5).  pmrep(1) also lists usage examples, most of which are applicable to
       pcp2spark as well.

       Only the command line options listed on this page are supported; other options available for pmrep(1)
       are not supported.

       Options  via  environment values (see pmGetOptions(3)) override the corresponding built-in default values
       (if any).  Configuration file options override the corresponding environment variables (if any).  Command
       line options override the corresponding configuration file options (if any).
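
       The precedence order above can be pictured with a small Python sketch.  It is only an illustration:
       the option names and the non-default values are hypothetical, and pcp2spark itself does not
       necessarily resolve options this way internally.

```python
from collections import ChainMap

# Hypothetical option sources, listed from highest to lowest precedence.
# Only 127.0.0.1, 44325 and the 1 second interval are documented defaults.
command_line = {"spark_port": 50000}
config_file = {"spark_port": 44400, "interval": "5s"}
environment = {"interval": "2s"}
builtin_defaults = {"spark_server": "127.0.0.1", "spark_port": 44325, "interval": "1s"}

# ChainMap returns the value from the first mapping that defines a key.
effective = ChainMap(command_line, config_file, environment, builtin_defaults)

print(effective["spark_port"])    # taken from the command line
print(effective["interval"])      # taken from the configuration file
print(effective["spark_server"])  # falls through to the built-in default
```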

GENERAL USAGE

       A general setup involves configuring pcp2spark for the PCP metrics to export and then starting the
       pcp2spark application.  pcp2spark then listens on the given address/port and waits for a connection
       from an Apache Spark worker thread.  Once the worker thread has been started, it connects to
       pcp2spark.

       When  an  Apache  Spark  worker  thread  has connected, pcp2spark will begin streaming PCP metric data to
       Apache Spark until the worker thread completes or the connection is interrupted.  If the connection is
       interrupted or the socket is closed by the Apache Spark worker thread, pcp2spark will exit.
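
       To check that a running pcp2spark instance is streaming data before involving Apache Spark, a plain
       socket client can stand in for the worker thread.  The sketch below is not a Spark job; the function
       name is hypothetical, and it assumes the stream is newline-delimited UTF-8 text, which this page does
       not specify.

```python
import socket


def read_metric_lines(host="127.0.0.1", port=44325, max_lines=5):
    """Connect to a listening pcp2spark instance and return up to max_lines lines.

    Assumption: the stream is newline-delimited UTF-8 text.
    """
    lines = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                lines.append(line.rstrip("\n"))
                if len(lines) >= max_lines:
                    break
    # Leaving the with-blocks closes the socket; as described above,
    # pcp2spark exits once its peer has closed the connection.
    return lines
```

       Because pcp2spark exits when the client disconnects, it has to be restarted after each such test
       run.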

       For an example Apache Spark worker job which will connect to a pcp2spark instance on a given
       address/port and pull in PCP metric data, see the example provided in the PCP examples directory for
       pcp2spark (often provided by the PCP development package) or the online version at
       https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.

CONFIGURATION FILE

       pcp2spark uses a configuration file with syntax described in pmrep.conf(5).  The  following  options  are
       common  with  pmrep.conf:  version, source, speclocal, derived, header, globals, samples, interval, type,
       type_prefer,    ignore_incompat,    names_change,    instances,    live_filter,    rank,    limit_filter,
       limit_filter_force,  invert_filter,  predicate,  omit_flat,  include_labels,  precision, precision_force,
       count_scale, count_scale_force, space_scale, space_scale_force, time_scale, time_scale_force.   The  rest
       of the pmrep.conf options are recognized but ignored for compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify  the  address  on  which  pcp2spark  will  listen for connections from an Apache Spark worker
           thread.  Corresponding command line option is -g.  Defaults to 127.0.0.1.

       spark_port (integer)
           Specify the port on which pcp2spark will listen for connections.  Corresponding command  line  option
           is -p.  Defaults to 44325.
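
       As an illustration, the two options could appear in the [options] section of a pcp2spark.conf file as
       shown in the string below.  The sketch parses the fragment with Python's configparser module purely
       to show the layout and the value types; the values are the documented defaults, and the parsing code
       is not taken from pcp2spark.

```python
import configparser

# Illustrative pcp2spark.conf fragment; the values are the documented defaults.
CONF_TEXT = """
[options]
spark_server = 127.0.0.1
spark_port = 44325
"""

parser = configparser.ConfigParser()
parser.read_string(CONF_TEXT)

server = parser.get("options", "spark_server")  # string option
port = parser.getint("options", "spark_port")   # integer option
print(f"listen on {server}:{port}")
```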

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change event during sampling.  These events
            occur when a PMDA discovers new metrics sometime after starting up, and informs running client tools
            like  pcp2spark.   Valid  values  for  action are update (refresh metrics being sampled), ignore (do
            nothing - the default behaviour) and abort (exit the program if such an event occurs).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be found for  the
            tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A positive integer will include instances
            with  values  at  or  above  the limit in reporting.  A negative integer will include instances with
            values at or below the limit in reporting.  A value of  zero  performs  no  limit  filtering.   This
            option will not override possible per-metric specifications.  See also -J and -N.
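
            The filtering rule can be sketched in Python as follows.  The function name is hypothetical,
            and the sketch assumes that for a negative limit the threshold is the magnitude of the
            value, which this page does not state explicitly.

```python
def limit_filter(values, limit):
    """Return the instances that pass a -8 style limit filter.

    values maps instance name -> numeric value.
    Assumption: for a negative limit, the threshold is its magnitude.
    """
    if limit > 0:
        # Positive limit: keep values at or above the limit.
        return {inst: v for inst, v in values.items() if v >= limit}
    if limit < 0:
        # Negative limit: keep values at or below the (assumed) magnitude.
        return {inst: v for inst, v in values.items() if v <= abs(limit)}
    # Zero: no limit filtering.
    return dict(values)
```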

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance  metric  values  are  retrieved from the set of Performance Co-Pilot (PCP) archive files
            identified by the archive argument, which is a comma-separated list of names, each of which  may  be
            the base name of an archive or the name of a directory containing one or more archives.

       -A align, --align=align
            Force  the  initial  sample  to  be  aligned on the boundary of a natural time unit align.  Refer to
            PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read metric source archives from the PCP archive folio created by tools  like  pmchart(1)  or,  less
            often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale  for  space (byte) metrics, possible values include bytes, Kbytes, KB, Mbytes, MB, and so
            forth.   This  option  will   not   override   possible   per-metric   specifications.    See   also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file or directory to use.  If config is a directory, all files in it ending
            in .conf will be included.  The default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.  For
            details, see the CONFIGURATION FILE section above and pmrep.conf(5).
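
            The lookup described for -c can be sketched in Python as follows.  The function names are
            hypothetical, the sketch reads $HOME and $PCP_SYSCONF_DIR from a plain mapping rather than
            from /etc/pcp.conf, and the sorted order of files within a directory is an assumption.

```python
import os
from pathlib import Path


def candidate_config_paths(env):
    """Default config locations, in the documented search order."""
    paths = [Path("pcp2spark.conf")]  # ./pcp2spark.conf
    if env.get("HOME"):
        paths.append(Path(env["HOME"], ".pcp2spark.conf"))
        paths.append(Path(env["HOME"], "pcp", "pcp2spark.conf"))
    if env.get("PCP_SYSCONF_DIR"):
        paths.append(Path(env["PCP_SYSCONF_DIR"], "pcp2spark.conf"))
    return paths


def resolve_config(config=None, env=None):
    """Return the list of config files that would be read."""
    env = os.environ if env is None else env
    if config is not None:
        path = Path(config)
        if path.is_dir():
            # A directory includes every file in it ending in .conf.
            return sorted(path.glob("*.conf"))
        return [path]
    # No -c given: use the first default location that exists.
    for candidate in candidate_config_paths(env):
        if candidate.is_file():
            return [candidate]
    return []
```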

       --container=container
            Fetch performance metrics from the specified container, either local or remote (see -h).

       -C, --check
            Exit  before  reporting  any  values,  but  after parsing the configuration and metrics and printing
            possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with a slash (``/'') or with a  dot  (``.'')
            it will be interpreted as a PCP derived metrics configuration file, otherwise it will be interpreted
            as comma- or semicolon-separated derived metric expressions.  For a complete description of derived
            metrics   and   PCP   derived   metrics   configuration   files   see   pmLoadDerivedConfig(3)   and
            pmRegisterDerived(3).   Alternatively,  using  pmrep.conf(5)  configuration  syntax  allows defining
            derived metrics as part of metricsets.
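
            How the derived argument is interpreted can be sketched in Python as follows.  The function
            name and the derived metric names a and b are hypothetical.  The split on commas and
            semicolons is deliberately naive and would mishandle an expression that itself contains a
            comma.

```python
import re


def parse_derived(derived):
    """Classify a -e argument as a config file path or inline expressions."""
    if derived.startswith(("/", ".")):
        # A leading slash or dot marks a derived metrics configuration file.
        return ("file", [derived])
    # Otherwise treat it as comma- or semicolon-separated expressions
    # (naive split; commas inside an expression are not handled).
    expressions = [e.strip() for e in re.split(r"[,;]", derived) if e.strip()]
    return ("expressions", expressions)
```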

       -g server, --spark-server=server
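            Address on which pcp2spark listens for connections from an Apache Spark worker thread.
            Corresponds to the spark_server configuration file option.  Defaults to 127.0.0.1.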

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances.  By  default  all  instances,  present  and
            future, are reported.

            Refer to pmrep(1) for a complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is unsupported or
            they  cannot  be scaled as requested) will cause pcp2spark to terminate with an error message.  With
            this option all incompatible metrics are silently omitted from reporting.  This  may  be  especially
            useful when requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform  instance  live  filtering.  This allows capturing all named instances even if processes are
            restarted at some point (unlike without live filtering).  Performing  live  filtering  over  a  huge
            number  of  instances will add some internal overhead so a bit of user caution is advised.  See also
            -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued metrics.   A  positive  integer  will
            include  highest  valued  instances  in  reporting.   A  negative integer will include lowest valued
            instances in reporting.  A value of zero performs no ranking.  Ranking does not imply  sorting,  see
            -6.  See also -8.
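
            The ranking rule can be sketched in Python as follows.  The function name is hypothetical;
            the sketch keeps the selected instances in their original order to reflect that ranking does
            not imply sorting, which is a choice made for the sketch rather than documented behaviour.

```python
def rank_instances(values, rank):
    """Keep the N highest (rank > 0) or N lowest (rank < 0) valued instances.

    values maps instance name -> numeric value; rank == 0 keeps everything.
    """
    if rank == 0:
        return dict(values)
    # Order instance names by value, highest first for a positive rank.
    ordered = sorted(values, key=values.get, reverse=rank > 0)
    keep = set(ordered[:abs(rank)])
    # Rebuild in the original order: selection only, no sorting of output.
    return {inst: v for inst, v in values.items() if inst in keep}
```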

       -K spec, --spec-local=spec
            When  fetching  metrics  from a local context (see -L), the -K option may be used to control the DSO
            PMDAs that should be made accessible.  The  spec  argument  conforms  to  the  syntax  described  in
            pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.  See also -K.

       -m, --include-labels
            Include PCP metric labels in the output.

       -n, --invert-filter
            Perform  ranking before live filtering.  By default instance live filtering (when requested, see -j)
            happens before instance ranking (when requested, see -J).  With this option the  logic  is  inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify  a  comma-separated list of predicate filter reference metrics.  By default ranking (see -J)
            happens for each metric individually.  With predicates, ranking  is  done  only  for  the  specified
            predicate metrics.  When reporting, the rest of the metrics sharing the same instance domain (see
            PCPIntro(1)) as the predicate  will  include  only  the  highest/lowest  ranking  instances  of  the
            corresponding predicate.  Ranking does not imply sorting, see -6.

            So  for  example,  using  proc.memory.rss  (resident memory size of process) as the predicate metric
            together with proc.io.total_bytes and mem.util.used as metrics to be reported,  only  the  processes
            using the most/least (as per -J) memory will be included when reporting total bytes written by
            processes.  Since mem.util.used is a single-valued metric (thus not sharing the same instance domain
            as the process related metrics), it will be reported as usual.
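
            The example above can be sketched in Python as follows.  The function name and the sample
            values are hypothetical.  As a simplification, every set-valued metric in the input is
            treated as sharing the predicate's instance domain, and single-valued metrics are
            represented as bare numbers.

```python
def filter_by_predicate(metrics, predicate, rank):
    """Rank on the predicate metric only, then restrict the other metrics.

    metrics maps metric name -> {instance: value} for set-valued metrics,
    or metric name -> number for single-valued metrics.
    """
    pred_values = metrics[predicate]
    ordered = sorted(pred_values, key=pred_values.get, reverse=rank > 0)
    keep = set(ordered[:abs(rank)]) if rank else set(pred_values)

    result = {}
    for name, values in metrics.items():
        if isinstance(values, dict):
            # Set-valued: report only the instances selected by the predicate.
            result[name] = {i: v for i, v in values.items() if i in keep}
        else:
            # Single-valued: reported as usual.
            result[name] = values
    return result


sample = {
    "proc.memory.rss": {"1": 100, "2": 900, "3": 400},
    "proc.io.total_bytes": {"1": 10, "2": 20, "3": 30},
    "mem.util.used": 123456,
}
top_two = filter_by_predicate(sample, "proc.memory.rss", 2)
```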

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin within the time window (see -S  and  -T).
            Refer to PCPIntro(1) for a complete description of the syntax for origin.

       -p port, --spark-port=port
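            Port on which pcp2spark listens for connections.  Corresponds to the spark_port configuration
            file option.  Defaults to 44325.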

       -P precision, --precision=precision
            Use  precision  for numeric non-integer output values.  The default is to use 3 decimal places (when
            applicable).  This option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x 10^-1, count,  count  x  10,  count  x
            10^2,  and  so forth from 10^-8 to 10^7.  (These values are currently space-sensitive.)  This option
            will not override possible per-metric specifications.  See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values, do not convert cumulative counters to rates.  This  option  will  override
            possible per-metric specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved and reported.  If samples is 0 or
            -s  is not specified, pcp2spark will sample and report continuously (in real time mode) or until the
            end of the set of PCP archives (in archive mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to those records logged at  or  after
            starttime.  Refer to PCPIntro(1) for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set  the  reporting  interval  to  something other than the default 1 second.  The interval argument
            follows the syntax described in PCPIntro(1), and in the simplest form may  be  an  unsigned  integer
            (the implied units in this case are seconds).  See also the -T option.

       -T endtime, --finish=endtime
            When  reporting archived metrics, the report will be restricted to those records logged before or at
            endtime.  Refer to PCPIntro(1) for a complete description of the syntax for endtime.

            When used to define the runtime before pcp2spark will exit, if no samples is given (see -s) then the
            number of reported samples depends on interval (see -t).  If samples is given then interval will  be
            adjusted  to  allow  reporting  of samples during runtime.  In case all of -T, -s, and -t are given,
            endtime determines the actual time pcp2spark will run.

       -v, --omit-flat
            Report only set-valued metrics with instances (e.g. disk.dev.read) and omit  single-valued  ``flat''
            metrics without instances (e.g.  kernel.all.sysfork).  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns, microsec, us, millisec, ms, and so
            forth  up  to hour, hr.  This option will not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

       $PCP_SYSCONF_DIR/pmrep/*.conf
            system provided default pmrep configuration files

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory names used  by
       PCP.   On  each  installation, the file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1),  mkaf(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),   pcp2json(1),
       pcp2xlsx(1),     pcp2xml(1),    pcp2zabbix(1),    pmcd(1),    pminfo(1),    pmrep(1),    pmGetOptions(3),
       pmLoadDerivedConfig(3),  pmParseUnitsStr(3),  pmRegisterDerived(3),  pmSpecLocalPMDA(3),   LOGARCHIVE(5),
       pcp.conf(5), pmrep.conf(5) and PMNS(5).

Performance Co-Pilot                                   PCP                                          PCP2SPARK(1)