Namhyung Kim
f123b2d84e
perf stat: Remove prefix argument in print_metric_headers()
...
It always passes a whitespace to the function, thus we can just add it to the
function body. Furthermore, it's only used in the normal output mode.
Well, actually CSV used it but it doesn't need to since we don't care about the
indentation or alignment in the CSV output.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221123180208.2068936-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-24 09:37:31 -03:00
Namhyung Kim
a7ec1dd2d7
perf stat: Use scnprintf() in prepare_interval()
...
It should not use sprintf() anymore. Let's pass the buffer size and use the
safer scnprintf() instead.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221123180208.2068936-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-24 09:37:08 -03:00
Namhyung Kim
8e55ae24c0
perf stat: Do not align time prefix in CSV output
...
We don't care about the alignment in the CSV output as it's intended for machine
processing. Let's get rid of it to make the output more compact.
Before:
# perf stat -a --summary -I 1 -x, true
0.001149309,219.20,msec,cpu-clock,219322251,100.00,219.200,CPUs utilized
0.001149309,144,,context-switches,219241902,100.00,656.935,/sec
0.001149309,38,,cpu-migrations,219173705,100.00,173.358,/sec
0.001149309,61,,page-faults,219093635,100.00,278.285,/sec
0.001149309,10679310,,cycles,218746228,100.00,0.049,GHz
0.001149309,6288296,,instructions,218589869,100.00,0.59,insn per cycle
0.001149309,1386904,,branches,218428851,100.00,6.327,M/sec
0.001149309,56863,,branch-misses,218219951,100.00,4.10,of all branches
summary,219.20,msec,cpu-clock,219322251,100.00,20.025,CPUs utilized
summary,144,,context-switches,219241902,100.00,656.935,/sec
summary,38,,cpu-migrations,219173705,100.00,173.358,/sec
summary,61,,page-faults,219093635,100.00,278.285,/sec
summary,10679310,,cycles,218746228,100.00,0.049,GHz
summary,6288296,,instructions,218589869,100.00,0.59,insn per cycle
summary,1386904,,branches,218428851,100.00,6.327,M/sec
summary,56863,,branch-misses,218219951,100.00,4.10,of all branches
After:
0.001148449,224.75,msec,cpu-clock,224870589,100.00,224.747,CPUs utilized
0.001148449,176,,context-switches,224775564,100.00,783.103,/sec
0.001148449,38,,cpu-migrations,224707428,100.00,169.079,/sec
0.001148449,61,,page-faults,224629326,100.00,271.416,/sec
0.001148449,12172071,,cycles,224266368,100.00,0.054,GHz
0.001148449,6901907,,instructions,224108764,100.00,0.57,insn per cycle
0.001148449,1515655,,branches,223946693,100.00,6.744,M/sec
0.001148449,70027,,branch-misses,223735385,100.00,4.62,of all branches
summary,224.75,msec,cpu-clock,224870589,100.00,21.066,CPUs utilized
summary,176,,context-switches,224775564,100.00,783.103,/sec
summary,38,,cpu-migrations,224707428,100.00,169.079,/sec
summary,61,,page-faults,224629326,100.00,271.416,/sec
summary,12172071,,cycles,224266368,100.00,0.054,GHz
summary,6901907,,instructions,224108764,100.00,0.57,insn per cycle
summary,1515655,,branches,223946693,100.00,6.744,M/sec
summary,70027,,branch-misses,223735385,100.00,4.62,of all branches
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221123180208.2068936-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-24 09:36:22 -03:00
Namhyung Kim
6d74ed369d
perf stat: Move summary prefix printing logic in CSV output
...
It matches to the prefix (interval timestamp), so better to have them together.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221123180208.2068936-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-24 09:35:57 -03:00
Namhyung Kim
15792642db
perf stat: Fix cgroup display in JSON output
...
It missed the 'else' keyword after checking json output mode.
Fixes: 41cb875242 ("perf stat: Split print_cgroup() function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221123180208.2068936-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-24 09:35:26 -03:00
Namhyung Kim
4dd7ff4a03
perf stat: Add print_aggr_cgroup() for --for-each-cgroup and --topdown
...
Normally, --for-each-cgroup only works with AGGR_GLOBAL. However
the --topdown on some cpu (e.g. Intel Skylake) converts it to the
AGGR_CORE internally.
To support those machines, add print_aggr_cgroup and handle the events
like in print_cgroup_events().
$ perf stat -a --for-each-cgroup system.slice,user.slice --topdown sleep 1
nmi_watchdog enabled with topdown. May give wrong results.
Disable with echo 0 > /proc/sys/kernel/nmi_watchdog
Performance counter stats for 'system wide':
retiring bad speculation frontend bound backend bound
S0-D0-C0 2 system.slice 49.0% -46.6% 31.4%
S0-D0-C1 2 system.slice 55.5% 8.0% 45.5% -9.0%
S0-D0-C2 2 system.slice 87.8% 22.1% 30.3% -40.3%
S0-D0-C3 2 system.slice 53.3% -11.9% 45.2% 13.4%
S0-D0-C0 2 user.slice 123.5% 4.0% 48.5% -75.9%
S0-D0-C1 2 user.slice 19.9% 6.5% 89.9% -16.3%
S0-D0-C2 2 user.slice 29.9% 7.9% 71.3% -9.1
S0-D0-C3 2 user.slice 28.0% 7.2% 43.3% 21.5%
1.004136937 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-20-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
67f8b7eb4e
perf stat: Support --for-each-cgroup and --metric-only
...
When we have events for each cgroup, the metric should be printed for
each cgroup separately. Add print_cgroup_counter() to handle that
situation properly.
Also change print_metric_headers() not to print duplicate headers
by checking cgroups.
$ perf stat -a --for-each-cgroup system.slice,user.slice --metric-only sleep 1
Performance counter stats for 'system wide':
GHz insn per cycle branch-misses of all branches
system.slice 3.792 0.61 3.24%
user.slice 3.661 2.32 0.37%
1.016111516 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-19-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
78670daefd
perf stat: Factor out print_metric_{begin,end}()
...
For the metric-only case, add new functions to handle the start and the
end of each metric display.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-18-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
2cf38236d9
perf stat: Factor out prefix display
...
The prefix is needed for interval mode to print timestamp at the
beginning of each line. But the it's tricky for the metric only
mode since it doesn't print every evsel and combines the metrics
into a single line.
So it needed to pass 'first' argument to print_counter_aggrdata()
to determine if the current event is being printed at first. This
makes the code hard to read.
Let's move the logic out of the function and do it in the outer
print loop. This would enable further cleanups later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-17-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
453279d573
perf stat: Move condition to print_footer()
...
Likewise, I think it'd better to have the control inside the function, and keep
the higher level function clearer.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-16-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
4c86b664f4
perf stat: Rework header display
...
There are print_header() and print_interval() to print header lines before
actual counter values. Also print_metric_headers() needs to be called for
the metric-only case.
Let's move all these logics to a single place including num_print_iv to
refresh the headers for interval mode.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
6108712c07
perf stat: Remove impossible condition
...
The print would run only if metric_only is not set, but it's already in a
block that says it's in metric_only case. And there's no place to change
the setting.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-14-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:23 -03:00
Namhyung Kim
33c4ed4799
perf stat: Cleanup interval print alignment
...
Instead of using magic values, define symbolic constants and use them.
Also add aggr_header_std[] array to simplify aggr_mode handling.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
208cbcd21b
perf stat: Factor out prepare_interval()
...
This logic does not print the time directly, but it just puts the
timestamp in the buffer as a prefix. To reduce the confusion, factor
out the code into a separate function.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
b2d9832e00
perf stat: Split print_metric_headers() function
...
The print_metric_headers() shows metric headers a little bit for each
mode. Split it out to make the code clearer.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
8d500292bd
perf stat: Align cgroup names
...
We don't know how long cgroup name is, but at least we can align short
ones like below.
$ perf stat -a --for-each-cgroup system.slice,user.slice true
Performance counter stats for 'system wide':
0.13 msec cpu-clock system.slice # 0.010 CPUs utilized
4 context-switches system.slice # 31.989 K/sec
1 cpu-migrations system.slice # 7.997 K/sec
0 page-faults system.slice # 0.000 /sec
450,673 cycles system.slice # 3.604 GHz (92.41%)
161,216 instructions system.slice # 0.36 insn per cycle (92.41%)
32,678 branches system.slice # 261.332 M/sec (92.41%)
2,628 branch-misses system.slice # 8.04% of all branches (92.41%)
14.29 msec cpu-clock user.slice # 1.163 CPUs utilized
35 context-switches user.slice # 2.449 K/sec
12 cpu-migrations user.slice # 839.691 /sec
57 page-faults user.slice # 3.989 K/sec
49,683,026 cycles user.slice # 3.477 GHz (99.38%)
110,790,266 instructions user.slice # 2.23 insn per cycle (99.38%)
24,552,255 branches user.slice # 1.718 G/sec (99.38%)
127,779 branch-misses user.slice # 0.52% of all branches (99.38%)
0.012289431 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
df46a3c92b
perf stat: Add before_metric argument
...
Unfortunately, event running time, percentage and noise data are printed
in different positions in normal output than CSV/JSON. I think it's
better to put such details in where it actually prints.
So add before_metric argument to print_noise() and print_running() and
call them twice before and after the metric.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
d6aeb861b1
perf stat: Handle bad events in abs_printout()
...
In the printout() function, it checks if the event is bad (i.e. not
counted or not supported) and print the result. But it does the same
what abs_printout() is doing. So add an argument to indicate the value
is ok or not and use the same function in both cases.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
c2019f844e
perf stat: Factor out print_counter_value() function
...
And split it for each output mode like others. I believe it makes the
code simpler and more intuitive. Now abs_printout() becomes just to
call sub-functions.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
33b2e2c2ad
perf stat: Split aggr_printout() function
...
The aggr_printout() function is to print aggr_id and count (nr).
Split it for each output mode to simplify the code.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
41cb875242
perf stat: Split print_cgroup() function
...
Likewise, split print_cgroup() for each output mode.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
def99d60df
perf stat: Split print_noise_pct() function
...
Likewise, split print_noise_pct() for each output mode. Although it's
a tiny function, more logic will be added soon so it'd be better split
it and treat it in the same way.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
31bf6aea99
perf stat: Split print_running() function
...
To make the code more obvious and hopefully simpler, factor out the
code for each output mode - stdio, CSV, JSON.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221114230227.1255976-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-16 09:51:22 -03:00
Namhyung Kim
7565f9617e
perf stat: Add missing separator in the CSV header
...
It should have a comma after 'cpus' for socket and die aggregation mode.
The output of the following command shows the issue.
$ sudo perf stat -a --per-socket -x, --metric-only -I1 true
Before:
+--- here
V
time,socket,cpusGhz,insn per cycle,branch-misses of all branches,
0.000908461,S0,8,0.950,1.65,1.21,
After:
time,socket,cpus,GHz,insn per cycle,branch-misses of all branches,
0.000683094,S0,8,0.593,2.00,0.60,
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221112032244.1077370-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 16:18:52 -03:00
Namhyung Kim
a80e0e156c
perf stat: Fix summary output in CSV with --metric-only
...
It should not print "summary" for each event when --metric-only is set.
Before:
$ sudo perf stat -a --per-socket --summary -x, --metric-only true
time,socket,cpusGhz,insn per cycle,branch-misses of all branches,
0.000709079,S0,8,0.893,2.40,0.45,
S0,8, summary, summary, summary, summary, summary,0.893, summary,2.40, summary, summary,0.45,
After:
$ sudo perf stat -a --per-socket --summary -x, --metric-only true
time,socket,cpusGHz,insn per cycle,branch-misses of all branches,
0.000882297,S0,8,0.598,1.64,0.64,
summary,S0,8,0.598,1.64,0.64,
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221112032244.1077370-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 16:18:34 -03:00
Arnaldo Carvalho de Melo
638c335a47
Merge remote-tracking branch 'torvalds/master' into perf/core
...
To pick up fixes that went thru perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 16:12:23 -03:00
Namhyung Kim
20e2e31779
perf stat: Consolidate condition to print metrics
...
The pm variable holds an appropriate function to print metrics for CSV
anf JSON already. So we can combine the if statement to simplify the
code a little bit. This also matches to the above condition for non-CSV
and non-JSON case.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
f1db5a1d1d
perf stat: Fix condition in print_interval()
...
The num_print_interval and config->interval_clear should be checked
together like other places like later in the function. Otherwise,
the --interval-clear option could print the headers for the CSV or
JSON output unnecessarily.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
1cc7642abb
perf stat: Add header for interval in JSON output
...
It missed to print a matching header line for intervals.
Before:
# perf stat -a -e cycles,instructions --metric-only -j -I 500
{"unit" : "insn per cycle"}
{"interval" : 0.500544283}{"metric-value" : "1.96"}
^C
After:
# perf stat -a -e cycles,instructions --metric-only -j -I 500
{"unit" : "sec"}{"unit" : "insn per cycle"}
{"interval" : 0.500515681}{"metric-value" : "2.31"}
^C
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
6d0a7e394e
perf stat: Do not indent headers for JSON
...
Currently --metric-only with --json indents header lines. This is not
needed for JSON.
$ perf stat -aA --metric-only -j true
{"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"}
{"cpu" : "0", {"metric-value" : "0.101"}{"metric-value" : "0.86"}{"metric-value" : "1.91"}
{"cpu" : "1", {"metric-value" : "0.102"}{"metric-value" : "0.87"}{"metric-value" : "2.02"}
{"cpu" : "2", {"metric-value" : "0.085"}{"metric-value" : "1.02"}{"metric-value" : "1.69"}
...
Note that the other lines are broken JSON, but it will be handled later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
fdc7d60824
perf stat: Fix --metric-only --json output
...
Currently it prints all metric headers for JSON output. But actually it
skips some metrics with valid_only_metric(). So the output looks like:
$ perf stat --metric-only --json true
{"unit" : "CPUs utilized", "unit" : "/sec", "unit" : "/sec", "unit" : "/sec", "unit" : "GHz", "unit" : "insn per cycle", "unit" : "/sec", "unit" : "branch-misses of all branches"}
{"metric-value" : "3.861"}{"metric-value" : "0.79"}{"metric-value" : "3.04"}
As you can see there are 8 units in the header but only 3 metric-values
are there. It should skip the unused headers as well. Also each unit
should be printed as a separate object like metric values.
With this patch:
$ perf stat --metric-only --json true
{"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"}
{"metric-value" : "4.166"}{"metric-value" : "0.73"}{"metric-value" : "2.96"}
Fixes: df936cadfb ("perf stat: Add JSON output option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Claire Jensen <cjense@google.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
f4e55f88da
perf stat: Move common code in print_metric_headers()
...
The struct perf_stat_output_ctx is set in a loop with the same values.
Move the code out of the loop and keep the loop minimal.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
81a02c6577
perf stat: Clear screen only if output file is a tty
...
The --interval-clear option makes perf stat to clear the terminal at
each interval. But it doesn't need to clear the screen when it saves
to a file.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Namhyung Kim
4ea0be1f0d
perf stat: Increase metric length to align outputs
...
When perf stat is called with very detailed events, the output doesn't
align well like below:
$ sudo perf stat -a -ddd sleep 1
Performance counter stats for 'system wide':
8,020.23 msec cpu-clock # 7.997 CPUs utilized
3,970 context-switches # 494.998 /sec
169 cpu-migrations # 21.072 /sec
586 page-faults # 73.065 /sec
649,568,060 cycles # 0.081 GHz (30.42%)
304,044,345 instructions # 0.47 insn per cycle (38.40%)
60,313,022 branches # 7.520 M/sec (38.89%)
2,766,919 branch-misses # 4.59% of all branches (39.26%)
74,422,951 L1-dcache-loads # 9.279 M/sec (39.39%)
8,025,568 L1-dcache-load-misses # 10.78% of all L1-dcache accesses (39.22%)
3,314,995 LLC-loads # 413.329 K/sec (30.83%)
1,225,619 LLC-load-misses # 36.97% of all LL-cache accesses (30.45%)
<not supported> L1-icache-loads
20,420,493 L1-icache-load-misses # 0.00% of all L1-icache accesses (30.29%)
58,017,947 dTLB-loads # 7.234 M/sec (30.37%)
704,677 dTLB-load-misses # 1.21% of all dTLB cache accesses (30.27%)
234,225 iTLB-loads # 29.204 K/sec (30.29%)
417,166 iTLB-load-misses # 178.10% of all iTLB cache accesses (30.32%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
1.002947355 seconds time elapsed
Increase the METRIC_LEN by 3 so that it can align properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221107213314.3239159-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-14 13:21:19 -03:00
Athira Rajeev
ad353b710c
perf stat: Fix printing os->prefix in CSV metrics output
...
'perf stat' with CSV output option prints an extra empty string as first
field in metrics output line. Sample output below:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,1.78,msec,cpu-clock,1785146,100.00,0.973,CPUs utilized
S0,1,26,,context-switches,1781750,100.00,0.015,M/sec
S0,1,1,,cpu-migrations,1780526,100.00,0.561,K/sec
S0,1,1,,page-faults,1779060,100.00,0.561,K/sec
S0,1,875807,,cycles,1769826,100.00,0.491,GHz
S0,1,85281,,stalled-cycles-frontend,1767512,100.00,9.74,frontend cycles idle
S0,1,576839,,stalled-cycles-backend,1766260,100.00,65.86,backend cycles idle
S0,1,288430,,instructions,1762246,100.00,0.33,insn per cycle
====> ,S0,1,,,,,,,2.00,stalled cycles per insn
The above command line uses field separator as "," via "-x," option and
per-socket option displays socket value as first field. But here the
last line for "stalled cycles per insn" has "," in the beginning.
Sample output using interval mode:
# ./perf stat -I 1000 -x, --per-socket -a -C 1 ls
0.001813453,S0,1,1.87,msec,cpu-clock,1872052,100.00,0.002,CPUs utilized
0.001813453,S0,1,2,,context-switches,1868028,100.00,1.070,K/sec
------
0.001813453,S0,1,85379,,instructions,1856754,100.00,0.32,insn per cycle
====> 0.001813453,,S0,1,,,,,,,1.34,stalled cycles per insn
Above result also has an extra CSV separator after
the timestamp. Patch addresses extra field separator
in the beginning of the metric output line.
The counter stats are displayed by function
"perf_stat__print_shadow_stats" in code
"util/stat-shadow.c". While printing the stats info
for "stalled cycles per insn", function "new_line_csv"
is used as new_line callback.
The new_line_csv function has check for "os->prefix"
and if prefix is not null, it will be printed along
with cvs separator.
Snippet from "new_line_csv":
if (os->prefix)
fprintf(os->fh, "%s%s", os->prefix, config->csv_sep);
Here os->prefix gets printed followed by ","
which is the cvs separator. The os->prefix is
used in interval mode option ( -I ), to print
time stamp on every new line. But prefix is
already set to contain CSV separator when used
in interval mode for CSV option.
Reference: Function "static void print_interval"
Snippet:
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
Also if prefix is not assigned (if not used with
-I option), it gets set to empty string.
Reference: function printout() in util/stat-display.c
Snippet:
.prefix = prefix ? prefix : "",
Since prefix already set to contain cvs_sep in interval
option, patch removes printing config->csv_sep in
new_line_csv function to avoid printing extra field.
After the patch:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,2.04,msec,cpu-clock,2045202,100.00,1.013,CPUs utilized
S0,1,2,,context-switches,2041444,100.00,979.289,/sec
S0,1,0,,cpu-migrations,2040820,100.00,0.000,/sec
S0,1,2,,page-faults,2040288,100.00,979.289,/sec
S0,1,254589,,cycles,2036066,100.00,0.125,GHz
S0,1,82481,,stalled-cycles-frontend,2032420,100.00,32.40,frontend cycles idle
S0,1,113170,,stalled-cycles-backend,2031722,100.00,44.45,backend cycles idle
S0,1,88766,,instructions,2030942,100.00,0.35,insn per cycle
S0,1,,,,,,,1.27,stalled cycles per insn
Fixes: 92a61f6412 ("perf stat: Implement CSV metrics output")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com >
Reviewed-By: Kajol Jain <kjain@linux.ibm.com >
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20221018085605.63834-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-08 17:52:18 -03:00
Namhyung Kim
84d1b20132
perf stat: Fix crash with --per-node --metric-only in CSV mode
...
The following command will get segfault due to missing aggr_header_csv
for AGGR_NODE:
$ sudo perf stat -a --per-node -x, --metric-only true
Committer testing:
Before this patch:
# perf stat -a --per-node -x, --metric-only true
Segmentation fault (core dumped)
#
After:
# gdb perf
-bash: gdb: command not found
# perf stat -a --per-node -x, --metric-only true
node,Ghz,frontend cycles idle,backend cycles idle,insn per cycle,branch-misses of all branches,
N0,32,0.335,2.10,0.65,0.69,0.03,1.92,
#
Fixes: 86895b480a ("perf stat: Add --per-node agregation support")
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: http://lore.kernel.org/lkml/20221107213314.3239159-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-11-08 17:47:33 -03:00
Namhyung Kim
cec94d6963
perf stat: Display percore events properly
...
The recent change in the perf stat broke the percore event display.
Note that the aggr counts are already processed so that the every
sibling thread in the same core will get the per-core counter values.
Check percore evsels and skip the sibling threads in the display.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221018020227.85905-20-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-27 16:37:25 -03:00
Namhyung Kim
91f85f98da
perf stat: Display event stats using aggr counts
...
Now aggr counts are ready for use. Convert the display routines to use
the aggr counts and update the shadow stat with them. It doesn't need
to aggregate counts or collect aliases anymore during the display. Get
rid of now unused struct perf_aggr_thread_value.
Note that there's a difference in the display order among the aggr mode.
For per-core/die/socket/node aggregation, it shows relevant events in
the same unit together, whereas global/thread/no aggregation it shows
the same events for different units together. So it still uses separate
codes to display them due to the ordering.
One more thing to note is that it breaks per-core event display for now.
The next patch will fix it to have identical output as of now.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221018020227.85905-19-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-27 16:37:25 -03:00
Namhyung Kim
93d5e70015
perf stat: Use evsel__is_hybrid() more
...
In the stat-display code, it needs to check if the current evsel is
hybrid but it uses perf_pmu__has_hybrid() which can return true for
non-hybrid event too. I think it's better to use evsel__is_hybrid().
Also remove a NULL check for the 'config' parameter in the
hybrid_merge() since it's called after config->no_merge check.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Acked-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20221018020227.85905-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-27 16:37:25 -03:00
Athira Rajeev
cad3b68954
perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
...
'perf stat' has options to aggregate the counts in different modes like
per socket, per core etc. The function "aggr_printout" in
util/stat-display.c which is used to print the aggregates, has a check
for cpu in case of AGGR_NONE.
This check was originally using condition : "if (id.cpu.cpu > -1)". But
this got changed after commit df936cadfb ("perf stat: Add JSON output
option"), which added option to output json format for different
aggregation modes. After this commit, the check in "aggr_printout" is
using "if (id.core > -1)".
The old code was using "id.cpu.cpu > -1" while the new code is using
"id.core > -1". But since the value printed is id.cpu.cpu, fix this
check to use cpu and not core.
Suggested-by: Ian Rogers <irogers@google.com >
Suggested-by: James Clark <james.clark@arm.com >
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Tested-by: Ian Rogers <irogers@google.com >
Cc: Disha Goel <disgoel@linux.vnet.ibm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com >
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20221006114225.66303-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-06 14:50:55 -03:00
Namhyung Kim
fa2edc07b4
perf stat: Rename to aggr_cpu_id.thread_idx
...
The aggr_cpu_id has a thread value but it's actually an index to the
thread_map. To reduce possible confusion, rename it to thread_idx.
Suggested-by: Ian Rogers <irogers@google.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20220930202110.845199-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-06 08:03:53 -03:00
Namhyung Kim
87ae87fd6c
perf stat: Use thread map index for shadow stat
...
When AGGR_THREAD is active, it aggregates the values for each thread.
Previously it used cpu map index which is invalid for AGGR_THREAD so
it had to use separate runtime stats with index 0.
But it can just use the rt_stat with thread_map_index. Rename the
first_shadow_map_idx() and make it return the thread index.
Reviewed-by: James Clark <james.clark@arm.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20220930202110.845199-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-06 08:03:53 -03:00
Namhyung Kim
66b76e30ee
perf stat: Convert perf_stat_evsel.res_stats array
...
It uses only one member, no need to have it as an array.
Reviewed-by: James Clark <james.clark@arm.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20220930202110.845199-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-10-06 08:03:53 -03:00
Claire Jensen
df936cadfb
perf stat: Add JSON output option
...
CSV output is tricky to format and column layout changes are susceptible
to breaking parsers. New JSON-formatted output has variable names to
identify fields that are consistent and informative, making the output
parseable.
CSV output example:
1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
0,,context-switches:u,1204272,100.00,0.000,/sec
0,,cpu-migrations:u,1204272,100.00,0.000,/sec
70,,page-faults:u,1204272,100.00,58.126,K/sec
JSON output example:
{"counter-value" : "3805.723968", "unit" : "msec", "event" :
"cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
: 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
{"counter-value" : "6166.000000", "unit" : "", "event" :
"context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
: 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
{"counter-value" : "466.000000", "unit" : "", "event" :
"cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
: 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
{"counter-value" : "208.000000", "unit" : "", "event" :
"page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
: 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
Also added documentation for JSON option.
There is some tidy up of CSV code including a potential memory over run
in the os.nfields set up. To facilitate this an AGGR_MAX value is added.
Committer notes:
Fixed up using PRIu64 to format u64 values, not %lu.
Committer testing:
⬢[acme@toolbox perf]$ perf stat -j sleep 1
{"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
{"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
{"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
{"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
{"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
{"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
{"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
{"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
{"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
⬢[acme@toolbox perf]$
Signed-off-by: Claire Jensen <cjense@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alyssa Ross <hi@alyssa.is >
Cc: Claire Jensen <clairej735@gmail.com >
Cc: Florian Fischer <florian.fischer@muhq.space >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Like Xu <likexu@tencent.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Sandipan Das <sandipan.das@amd.com >
Cc: Stephane Eranian <eranian@google.com >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com >
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-08-10 10:43:29 -03:00
Zhengjun Xing
9a0b36266f
perf stat: Add topdown metrics in the default perf stat on the hybrid machine
...
Topdown metrics are missed in the default perf stat on the hybrid machine,
add Topdown metrics in default perf stat for hybrid systems.
Currently, we support the perf metrics Topdown for the p-core PMU in the
perf stat default, the perf metrics Topdown support for e-core PMU will be
implemented later separately. Refactor the code adds two x86 specific
functions. Widen the size of the event name column by 7 chars, so that all
metrics after the "#" become aligned again.
The perf metrics topdown feature is supported on the cpu_core of ADL. The
dedicated perf metrics counter and the fixed counter 3 are used for the
topdown events. Adding the topdown metrics doesn't trigger multiplexing.
Before:
# ./perf stat -a true
Performance counter stats for 'system wide':
53.70 msec cpu-clock # 25.736 CPUs utilized
80 context-switches # 1.490 K/sec
24 cpu-migrations # 446.951 /sec
52 page-faults # 968.394 /sec
2,788,555 cpu_core/cycles/ # 51.931 M/sec
851,129 cpu_atom/cycles/ # 15.851 M/sec
2,974,030 cpu_core/instructions/ # 55.385 M/sec
416,919 cpu_atom/instructions/ # 7.764 M/sec
586,136 cpu_core/branches/ # 10.916 M/sec
79,872 cpu_atom/branches/ # 1.487 M/sec
14,220 cpu_core/branch-misses/ # 264.819 K/sec
7,691 cpu_atom/branch-misses/ # 143.229 K/sec
0.002086438 seconds time elapsed
After:
# ./perf stat -a true
Performance counter stats for 'system wide':
61.39 msec cpu-clock # 24.874 CPUs utilized
76 context-switches # 1.238 K/sec
24 cpu-migrations # 390.968 /sec
52 page-faults # 847.097 /sec
2,753,695 cpu_core/cycles/ # 44.859 M/sec
903,899 cpu_atom/cycles/ # 14.725 M/sec
2,927,529 cpu_core/instructions/ # 47.690 M/sec
428,498 cpu_atom/instructions/ # 6.980 M/sec
581,299 cpu_core/branches/ # 9.470 M/sec
83,409 cpu_atom/branches/ # 1.359 M/sec
13,641 cpu_core/branch-misses/ # 222.216 K/sec
8,008 cpu_atom/branch-misses/ # 130.453 K/sec
14,761,308 cpu_core/slots/ # 240.466 M/sec
3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring
1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation
5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound
4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound
646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations
1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears
3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth
1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound
0.002467839 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Acked-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-07-29 13:43:34 -03:00
Ian Rogers
0b9462d0ac
perf stat: Make use of index clearer with perf_counts
...
Try to disambiguate further when perf_counts is being accessed it is
with a cpu map index rather than a CPU.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Andrii Nakryiko <andrii@kernel.org >
Cc: Daniel Borkmann <daniel@iogearbox.net >
Cc: Dave Marchevsky <davemarchevsky@fb.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Fastabend <john.fastabend@gmail.com >
Cc: KP Singh <kpsingh@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Lv Ruyi <lv.ruyi@zte.com.cn >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Martin KaFai Lau <kafai@fb.com >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Quentin Monnet <quentin@isovalent.com >
Cc: Song Liu <songliubraving@fb.com >
Cc: Stephane Eranian <eranian@google.com >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Cc: Yonghong Song <yhs@fb.com >
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-05-23 09:54:02 -03:00
Ian Rogers
17b3867d97
Revert "perf stat: Support metrics with hybrid events"
...
This reverts commit 60344f1a9a .
Hybrid metrics place a PMU at the end of the parse string. This is also
where tool events are placed. The behavior of the parse string isn't
clear and so revert the change for now.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Florian Fischer <florian.fischer@muhq.space >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.garry@huawei.com >
Cc: Kim Phillips <kim.phillips@amd.com >
Cc: Madhavan Srinivasan <maddy@linux.ibm.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Riccardo Mancini <rickyman7@gmail.com >
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com >
Cc: Stephane Eranian <eranian@google.com >
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Link: https://lore.kernel.org/r/20220507053410.3798748-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-05-09 10:09:44 -03:00
Ian Rogers
570c44a01b
perf stat: Avoid printing cpus with no counters
...
perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Antonov <alexander.antonov@linux.intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexei Starovoitov <ast@kernel.org >
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Andrii Nakryiko <andrii@kernel.org >
Cc: Daniel Borkmann <daniel@iogearbox.net >
Cc: German Gomez <german.gomez@arm.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Fastabend <john.fastabend@gmail.com >
Cc: John Garry <john.garry@huawei.com >
Cc: KP Singh <kpsingh@kernel.org >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Martin KaFai Lau <kafai@fb.com >
Cc: Mathieu Poirier <mathieu.poirier@linaro.org >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Riccardo Mancini <rickyman7@gmail.com >
Cc: Song Liu <songliubraving@fb.com >
Cc: Stephane Eranian <eranian@google.com >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: Yonghong Song <yhs@fb.com >
Link: http://lore.kernel.org/lkml/20220503041757.2365696-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-05-03 11:19:17 -03:00
Zhengjun Xing
2c8e64514a
perf stat: Merge event counts from all hybrid PMUs
...
For hybrid events, by default stat aggregates and reports the event counts
per pmu.
# ./perf stat -e cycles -a sleep 1
Performance counter stats for 'system wide':
14,066,877,268 cpu_core/cycles/
6,814,443,147 cpu_atom/cycles/
1.002760625 seconds time elapsed
Sometimes, it's also useful to aggregate event counts from all PMUs.
Create a new option '--hybrid-merge' to enable that behavior and report
the counts without PMUs.
# ./perf stat -e cycles -a --hybrid-merge sleep 1
Performance counter stats for 'system wide':
20,732,982,512 cycles
1.002776793 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-04-22 14:23:35 -03:00
Zhengjun Xing
60344f1a9a
perf stat: Support metrics with hybrid events
...
One metric such as 'Kernel_Utilization' may be from different PMUs and
consists of different events.
For core,
Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread
For atom,
Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core
The metric group string for core is:
'{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'
The metric group string for atom is:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'
That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
is from cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
is from cpu_atom PMU. And then next, check if the events in the group are
valid on that PMU. If one event is not valid on that PMU, the associated
group would be removed internally.
In this example, cpu_clk_unhalted.thread is valid on cpu_core and
cpu_clk_unhalted.core is valid on cpu_atom. So the checks for these two
groups are passed.
Before:
# ./perf stat -M Kernel_Utilization -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }
Performance counter stats for 'system wide':
17,639,501 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 1.00 Kernel_Utilization
17,578,757 cpu_atom/CPU_CLK_UNHALTED.CORE:k/
1,005,350,226 ns duration_time
43,012,352 cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ # 0.99 Kernel_Utilization
17,608,010 cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
43,608,755 cpu_core/CPU_CLK_UNHALTED.THREAD/
17,630,838 cpu_atom/CPU_CLK_UNHALTED.THREAD/
1,005,350,226 ns duration_time
1.005350226 seconds time elapsed
After:
# ./perf stat -M Kernel_Utilization -a sleep 1
Performance counter stats for 'system wide':
17,981,895 CPU_CLK_UNHALTED.CORE [cpu_atom] # 1.00 Kernel_Utilization
17,925,405 CPU_CLK_UNHALTED.CORE:k [cpu_atom]
1,004,811,366 ns duration_time
41,246,425 CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] # 0.99 Kernel_Utilization
41,819,129 CPU_CLK_UNHALTED.THREAD [cpu_core]
1,004,811,366 ns duration_time
1.004811366 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2022-04-22 14:23:17 -03:00