|
NAME | SYNOPSIS | DESCRIPTION | GENERAL USAGE | CONFIGURATION FILE | OPTIONS | FILES | PCP ENVIRONMENT | DEBUGGING OPTIONS | SEE ALSO | COLOPHON |
|
|
|
PCP2SPARK(1) General Commands Manual PCP2SPARK(1)
pcp2spark - pcp-to-spark metrics exporter
pcp2spark [-5CGHIjLmnrRvV?] [-4 action] [-8|-9 limit] [-a
archive] [-A align] [--archive-folio folio] [-b|-B space-scale]
[-c config] [--container container] [-D debug] [--daemonize] [-e
derived] [-g server] [-h host] [-i instances] [-J rank] [-K spec]
[-N predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q
count-scale] [-s samples] [-S starttime] [-t interval] [-T
endtime] [-y|-Y time-scale] metricspec [...]
pcp2spark is a customizable performance metrics exporter tool from
PCP to Apache Spark. Any available performance metric, live or
archived, system and/or application, can be selected for exporting
using either command line arguments or a configuration file.
pcp2spark acts as a bridge which provides a network socket stream
on a given address/port which an Apache Spark worker task can
connect to and pull the configured PCP metrics from pcp2spark
exporting them using the streaming extensions of the Apache Spark
API.
pcp2spark is a close relative of pmrep(1). Refer to pmrep(1) for
the metricspec description accepted on pcp2spark command line.
See pmrep.conf(5) for description of the pcp2spark.conf
configuration file syntax. This page describes pcp2spark specific
options and configuration file differences with pmrep.conf(5).
pmrep(1) also lists some usage examples of which most are
applicable with pcp2spark as well.
Only the command line options listed on this page are supported,
other options available for pmrep(1) are not supported.
Options via environment values (see pmGetOptions(3)) override the
corresponding built-in default values (if any). Configuration
file options override the corresponding environment variables (if
any). Command line options override the corresponding
configuration file options (if any).
A general setup for making use of pcp2spark would involve the user
configuring pcp2spark for the PCP metrics to export followed by
starting the pcp2spark application. The pcp2spark application will
then wait and listen on the given address/port for a connection
from an Apache Spark worker thread to be started. The worker
thread will then connect to pcp2spark.
When an Apache Spark worker thread has connected, pcp2spark will
begin streaming PCP metric data to Apache Spark until the worker
thread completes or the connection is interrupted. If the
connection is interrupted or the socket is closed from the Apache
Spark worker thread pcp2spark will exit.
For an example Apache Spark worker job which will connect to an
pcp2spark instance on a given address/port and pull in PCP metric
data see the example provided in the PCP examples directory for
pcp2spark (often provided by the PCP development package) or the
online version at
https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/ .
pcp2spark uses a configuration file with syntax described in
pmrep.conf(5). The following options are common with pmrep.conf:
version, source, speclocal, derived, header, globals, samples,
interval, type, type_prefer, ignore_incompat, names_change,
instances, live_filter, rank, limit_filter, limit_filter_force,
invert_filter, predicate, omit_flat, include_labels, precision,
precision_force, count_scale, count_scale_force, space_scale,
space_scale_force, time_scale, time_scale_force. The rest of the
pmrep.conf options are recognized but ignored for compatibility.
pcp2spark specific options
spark_server (string)
Specify the address on which pcp2spark will listen for
connections from an Apache Spark worker thread. Corresponding
command line option is -g. Defaults to 127.0.0.1.
spark_port (integer)
Specify the port on which pcp2spark will listen for
connections. Corresponding command line option is -p.
Defaults to 44325.
The available command line options are:
-0 precision, --precision-force=precision
Like -P but this option will override per-metric
specifications.
-4 action, --names-change=action
Specify which action to take on receiving a metric names
change event during sampling. These events occur when a PMDA
discovers new metrics sometime after starting up, and informs
running client tools like pcp2spark. Valid values for action
are update (refresh metrics being sampled), ignore (do
nothing - the default behaviour) and abort (exit the program
if such an event occurs).
-5, --ignore-unknown
Silently ignore any metric name that cannot be resolved. At
least one metric must be found for the tool to start.
-8 limit, --limit-filter=limit
Limit results to instances with values above/below limit. A
positive integer will include instances with values at or
above the limit in reporting. A negative integer will
include instances with values at or below the limit in
reporting. A value of zero performs no limit filtering.
This option will not override possible per-metric
specifications. See also -J and -N.
-9 limit, --limit-filter-force=limit
Like -8 but this option will override per-metric
specifications.
-a archive, --archive=archive
Performance metric values are retrieved from the set of
Performance Co-Pilot (PCP) archive files identified by the
archive argument, which is a comma-separated list of names,
each of which may be the base name of an archive or the name
of a directory containing one or more archives.
-A align, --align=align
Force the initial sample to be aligned on the boundary of a
natural time unit align. Refer to PCPIntro(1) for a complete
description of the syntax for align.
--archive-folio=folio
Read metric source archives from the PCP archive folio
created by tools like pmchart(1) or, less often, manually
with mkaf(1).
-b scale, --space-scale=scale
Unit/scale for space (byte) metrics, possible values include
bytes, Kbytes, KB, Mbytes, MB, and so forth. This option
will not override possible per-metric specifications. See
also pmParseUnitsStr(3).
-B scale, --space-scale-force=scale
Like -b but this option will override per-metric
specifications.
-c config, --config=config
Specify the config file or directory to use. In case config
is a directory all files in it ending .conf will be included.
The default is the first found of: ./pcp2spark.conf,
$HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
$PCP_SYSCONF_DIR/pcp2spark.conf. For details, see the above
section and pmrep.conf(5).
--container=container
Fetch performance metrics from the specified container,
either local or remote (see -h).
-C, --check
Exit before reporting any values, but after parsing the
configuration and metrics and printing possible headers.
--daemonize
Daemonize on startup.
-e derived, --derived=derived
Specify derived performance metrics. If derived starts with
a slash (``/'') or with a dot (``.'') it will be interpreted
as a PCP derived metrics configuration file, otherwise it
will be interpreted as comma- or semicolon-separated derived
metric expressions. For complete description of derived
metrics and PCP derived metrics configuration files see
pmLoadDerivedConfig(3) and pmRegisterDerived(3).
Alternatively, using pmrep.conf(5) configuration syntax
allows defining derived metrics as part of metricsets.
In case of issues with derived metrics, review the
aforementioned manual pages in detail and ensure all the
required metrics are available, especially when using
archives. Use -Dderive to see additional debug information
about parsing derived metrics.
-g server, --spark-server=server
pcp2spark local server address.
-G, --no-globals
Do not include global metrics in reporting (see
pmrep.conf(5)).
-h host, --host=host
Fetch performance metrics from pmcd(1) on host, rather than
from the default localhost.
-H, --no-header
Do not print any headers.
-i instances, --instances=instances
Retrieve and report only the specified metric instances. By
default all instances, present and future, are reported.
Refer to pmrep(1) for complete description of this option.
-I, --ignore-incompat
Ignore incompatible metrics. By default incompatible metrics
(that is, their type is unsupported or they cannot be scaled
as requested) will cause pcp2spark to terminate with an error
message. With this option all incompatible metrics are
silently omitted from reporting. This may be especially
useful when requesting non-leaf nodes of the PMNS tree for
reporting.
-j, --live-filter
Perform instance live filtering. This allows capturing all
named instances even if processes are restarted at some point
(unlike without live filtering). Performing live filtering
over a huge number of instances will add some internal
overhead so a bit of user caution is advised. See also -n.
-J rank, --rank=rank
Limit results to highest/lowest ranked instances of set-
valued metrics. A positive integer will include highest
valued instances in reporting. A negative integer will
include lowest valued instances in reporting. A value of
zero performs no ranking. Ranking does not imply sorting,
see -6. See also -8.
-K spec, --spec-local=spec
When fetching metrics from a local context (see -L), the -K
option may be used to control the DSO PMDAs that should be
made accessible. The spec argument conforms to the syntax
described in pmSpecLocalPMDA(3). More than one -K option may
be used.
-L, --local-PMDA
Use a local context to collect metrics from DSO PMDAs on the
local host without PMCD. See also -K.
-m, --include-labels
Include PCP metric labels in the output.
-n, --invert-filter
Perform ranking before live filtering. By default instance
live filtering (when requested, see -j) happens before
instance ranking (when requested, see -J). With this option
the logic is inverted and ranking happens before live
filtering.
-N predicate, --predicate=predicate
Specify a comma-separated list of predicate filter reference
metrics. By default ranking (see -J) happens for each metric
individually. With predicates, ranking is done only for the
specified predicate metrics. When reporting, rest of the
metrics sharing the same instance domain (see PCPIntro(1)) as
the predicate will include only the highest/lowest ranking
instances of the corresponding predicate. Ranking does not
imply sorting, see -6.
So for example, using proc.memory.rss (resident memory size
of process) as the predicate metric together with
proc.io.total_bytes and mem.util.used as metrics to be
reported, only the processes using most/least (as per -J)
memory will be included when reporting total bytes written by
processes. Since mem.util.used is a single-valued metric
(thus not sharing the same instance domain as the process
related metrics), it will be reported as usual.
-O origin, --origin=origin
When reporting archived metrics, start reporting at origin
within the time window (see -S and -T). Refer to PCPIntro(1)
for a complete description of the syntax for origin.
-p port, --spark-port=port
pcp2spark local port.
-P precision, --precision=precision
Use precision for numeric non-integer output values. The
default is to use 3 decimal places (when applicable). This
option will not override possible per-metric specifications.
-q scale, --count-scale=scale
Unit/scale for count metrics, possible values include count x
10^-1, count, count x 10, count x 10^2, and so forth from
10^-8 to 10^7. (These values are currently space-sensitive.)
This option will not override possible per-metric
specifications. See also pmParseUnitsStr(3).
-Q scale, --count-scale-force=scale
Like -q but this option will override per-metric
specifications.
-r, --raw
Output raw metric values, do not convert cumulative counters
to rates. This option will override possible per-metric
specifications.
-R, --raw-prefer
Like -r but this option will not override per-metric
specifications.
-s samples, --samples=samples
The samples argument defines the number of samples to be
retrieved and reported. If samples is 0 or -s is not
specified, pcp2spark will sample and report continuously (in
real time mode) or until the end of the set of PCP archives
(in archive mode). See also -T.
-S starttime, --start=starttime
When reporting archived metrics, the report will be
restricted to those records logged at or after starttime.
Refer to PCPIntro(1) for a complete description of the syntax
for starttime.
-t interval, --interval=interval
Set the reporting interval to something other than the
default 1 second. The interval argument follows the syntax
described in PCPIntro(1), and in the simplest form may be an
unsigned integer (the implied units in this case are
seconds). See also the -T option.
-T endtime, --finish=endtime
When reporting archived metrics, the report will be
restricted to those records logged before or at endtime.
Refer to PCPIntro(1) for a complete description of the syntax
for endtime.
When used to define the runtime before pcp2spark will exit,
if no samples is given (see -s) then the number of reported
samples depends on interval (see -t). If samples is given
then interval will be adjusted to allow reporting of samples
during runtime. In case all of -T, -s, and -t are given,
endtime determines the actual time pcp2spark will run.
-v, --omit-flat
Report only set-valued metrics with instances (e.g.
disk.dev.read) and omit single-valued ``flat'' metrics
without instances (e.g. kernel.all.sysfork). See -i and -I.
-V, --version
Display version number and exit.
-y scale, --time-scale=scale
Unit/scale for time metrics, possible values include nanosec,
ns, microsec, us, millisec, ms, and so forth up to hour, hr.
This option will not override possible per-metric
specifications. See also pmParseUnitsStr(3).
-Y scale, --time-scale-force=scale
Like -y but this option will override per-metric
specifications.
-?, --help
Display usage message and exit.
pcp2spark.conf
pcp2spark configuration file (see -c)
$PCP_SYSCONF_DIR/pmrep/*.conf
system provided default pmrep configuration files
Environment variables with the prefix PCP_ are used to
parameterize the file and directory names used by PCP. On each
installation, the file /etc/pcp.conf contains the local values for
these variables. The $PCP_CONF variable may be used to specify an
alternative configuration file, as described in pcp.conf(5).
For environment variables affecting PCP tools, see
pmGetOptions(3).
Of particular note, PCP_DISCRETE_ONCE can be set to ensure that
discrete metric values are reported only once, unless they change
at some point.
The -D or --debug option enables the output of additional
diagnostics on stderr to help triage problems, although the
information is sometimes cryptic and primarily intended to provide
guidance for developers rather end-users. debug is a comma
separated list of debugging options; use pmdbg(1) with the -l
option to obtain a list of the available debugging options and
their meaning.
PCPIntro(1), mkaf(1), pcp(1), pcp2elasticsearch(1),
pcp2graphite(1), pcp2influxdb(1), pcp2json(1), pcp2xlsx(1),
pcp2xml(1), pcp2zabbix(1), pmcd(1), pminfo(1), pmrep(1),
pmGetOptions(3), pmLoadDerivedConfig(3), pmParseUnitsStr(3),
pmRegisterDerived(3), pmSpecLocalPMDA(3), LOGARCHIVE(5),
pcp.conf(5), pmrep.conf(5) and PMNS(5).
This page is part of the PCP (Performance Co-Pilot) project.
Information about the project can be found at
⟨http://www.pcp.io/⟩. If you have a bug report for this manual
page, send it to [email protected]. This page was obtained from the
project's upstream Git repository
⟨https://github.com/performancecopilot/pcp.git⟩ on 2025-08-11.
(At that time, the date of the most recent commit that was found
in the repository was 2025-08-11.) If you discover any rendering
problems in this HTML version of the page, or you believe there is
a better or more up-to-date source for the page, or you have
corrections or improvements to the information in this COLOPHON
(which is not part of the original manual page), send a mail to
[email protected]
Performance Co-Pilot PCP PCP2SPARK(1)
Pages that refer to this page: pcp2elasticsearch(1), pcp2graphite(1), pcp2influxdb(1), pcp2json(1), pcp2openmetrics(1), pcp2template(1), pcp2xlsx(1), pcp2xml(1), pcp2zabbix(1), pmrep(1), pmrepconf(1)