Ian Rogers
1a8c2e0177
perf mem-info: Add reference count checking
...
Add reference count checking and switch 'struct mem_info' usage to use
accessor functions.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ben Gainey <ben.gainey@arm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Li Dong <lidong@vivo.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Oliver Upton <oliver.upton@linux.dev >
Cc: Paran Lee <p4ranlee@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Sun Haiyong <sunhaiyong@loongson.cn >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: Yanteng Si <siyanteng@loongson.cn >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Link: https://lore.kernel.org/r/20240507183545.1236093-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-07 18:06:44 -03:00
Ian Rogers
ad3003a65a
perf mem-info: Move mem-info out of mem-events and symbol
...
Move mem-info to its own header rather than having it split between
mem-events and symbol.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ben Gainey <ben.gainey@arm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Li Dong <lidong@vivo.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Oliver Upton <oliver.upton@linux.dev >
Cc: Paran Lee <p4ranlee@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Sun Haiyong <sunhaiyong@loongson.cn >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: Yanteng Si <siyanteng@loongson.cn >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Link: https://lore.kernel.org/r/20240507183545.1236093-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-07 18:06:44 -03:00
Ian Rogers
37862d6fdc
perf dso: Use container_of() to avoid a pointer in 'struct dso_data'
...
The dso pointer in 'struct dso_data' is necessary for reference count
checking to account for the dso_data forming a global list of open dsos
with references to the dso.
The dso pointer also allows for the indirection that reference count
checking needs. Outside of reference count checking the indirection
isn't needed and container_of() is more efficient and saves space.
The reference count won't be increased by placing items onto the global
list, matching how things were before the reference count checking
change, but we assert the dso is in dsos holding it live (and that the
set of open dsos is a subset of all dsos for the machine).
Update the DSO data tests so that they use a dsos struct to make the
invariant true.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Tiezhu Yang <yangtiezhu@loongson.cn >
Link: https://lore.kernel.org/r/20240506180104.485674-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-06 16:08:31 -03:00
Ian Rogers
ee756ef749
perf dso: Add reference count checking and accessor functions
...
Add reference count checking to struct dso; this can help with
implementing correct reference counting discipline. To avoid
RC_CHK_ACCESS everywhere, add accessor functions for the variables in
struct dso.
The majority of the change is mechanical in nature and not easy to
split up.
Committer testing:
'perf test' up to this patch shows no regressions.
But:
util/symbol.c: In function ‘dso__load_bfd_symbols’:
util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’
1683 | dso__set_adjust_symbols(dso);
| ^~~~~~~~~~~~~~~~~~~~~~~
In file included from util/symbol.c:21:
util/dso.h:268:20: note: declared here
268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val)
| ^~~~~~~~~~~~~~~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1
MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/
make[6]: *** Waiting for unfinished jobs....
This was updated:
- symbols__fixup_end(&dso->symbols, false);
- symbols__fixup_duplicate(&dso->symbols);
- dso->adjust_symbols = 1;
+ symbols__fixup_end(dso__symbols(dso), false);
+ symbols__fixup_duplicate(dso__symbols(dso));
+ dso__set_adjust_symbols(dso);
But it was not build tested with BUILD_NONDISTRO and the libbfd devel files
installed (binutils-devel on Fedora).
Add the missing argument:
symbols__fixup_end(dso__symbols(dso), false);
symbols__fixup_duplicate(dso__symbols(dso));
- dso__set_adjust_symbols(dso);
+ dso__set_adjust_symbols(dso, true);
Signed-off-by: Ian Rogers <irogers@google.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ben Gainey <ben.gainey@arm.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Chengen Du <chengen.du@canonical.com >
Cc: Colin Ian King <colin.i.king@gmail.com >
Cc: Dima Kogan <dima@secretsauce.net >
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Li Dong <lidong@vivo.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Paran Lee <p4ranlee@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <song@kernel.org >
Cc: Sun Haiyong <sunhaiyong@loongson.cn >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Cc: Tiezhu Yang <yangtiezhu@loongson.cn >
Cc: Yanteng Si <siyanteng@loongson.cn >
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com >
Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-06 15:28:49 -03:00
Ian Rogers
6debc5aa32
perf test pmu: Test all sysfs PMU event names are the same case
...
If all event names are either lower or upper case, event name probes can
avoid scanning the directory with case-insensitive comparisons; only the
lower or upper case version of the name needs to be checked for
existence.
For the majority of PMUs event names are all lower case; upper case
names are present on S390.
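The property being tested can be sketched as follows. This is a minimal Python illustration, not the C test added by the patch; the helper name `event_names_same_case` and the sample event names are hypothetical.

```python
def event_names_same_case(names):
    """Return True if every name is lower case, or every name is upper case.

    Hypothetical helper for illustration only; not part of perf.
    """
    names = list(names)
    all_lower = all(name == name.lower() for name in names)
    all_upper = all(name == name.upper() for name in names)
    return all_lower or all_upper


# Illustrative names only, not read from sysfs.
print(event_names_same_case(["cpu-cycles", "instructions"]))
print(event_names_same_case(["CPU_CYCLES", "INSTRUCTIONS"]))
print(event_names_same_case(["cpu-cycles", "INSTRUCTIONS"]))
```

When the property holds, a lookup only needs to try the lower case and upper case spellings of a name rather than compare against every directory entry.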
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Bjorn Helgaas <bhelgaas@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Link: https://lore.kernel.org/r/20240502213507.2339733-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-03 17:08:20 -03:00
Ian Rogers
18eb2ca8c1
perf test pmu: Add an eagerly loaded event test
...
Allow events/aliases to be eagerly loaded for a PMU. Factor out the
pmu_aliases_parse to allow this.
Parse a test event and check it configures the attribute as expected.
There is overlap with the parse-events tests, but this test is done with
a PMU created in a temp directory and doesn't rely on PMUs in sysfs.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Bjorn Helgaas <bhelgaas@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Link: https://lore.kernel.org/r/20240502213507.2339733-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-03 17:08:20 -03:00
Ian Rogers
aa1551f299
perf test pmu: Refactor format test and exposed test APIs
...
In tests/pmu.c, make a common utility that creates a PMU in a mkdtemp
directory and uses regular PMU parsing logic to load that PMU. Formats
must still be eagerly loaded as by default the PMU code assumes devices
are going to be in sysfs.
In util/pmu.[ch], hide perf_pmu__format_parse but add the eager argument
to perf_pmu__lookup called by perf_pmus__add_test_pmu. Later patches
will eagerly load other non-sysfs files when eager loading is enabled.
In tests/pmu.c, rather than manually constructing a list of term
arguments, just use the term parsing code from a string.
Add more comments and debug logging.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Bjorn Helgaas <bhelgaas@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Link: https://lore.kernel.org/r/20240502213507.2339733-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-03 17:08:20 -03:00
Ian Rogers
97c48ea8ff
perf test pmu-events: Make it clearer that pmu-events tests JSON events
...
Add JSON to the test name.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Bjorn Helgaas <bhelgaas@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Link: https://lore.kernel.org/r/20240502213507.2339733-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-05-03 17:08:04 -03:00
Arnaldo Carvalho de Melo
8c618b58c8
perf test: Reintroduce -p/--parallel and make -S/--sequential the default
...
We can't default to doing parallel tests as there are tests that compete
for the same resources and thus clash, for instance tests that put in
place 'perf probe' probes and clean the probes without regard to other
tests' needs, ARM64 coresight tests, Intel PT ones, etc.
So reintroduce -p/--parallel and make -S/--sequential the default.
We need to come up with infrastructure that states which tests can't run
in parallel because they need exclusive access to some resource,
something as simple as "probes" that would then prevent 'perf probe' tests
from running while another such test is running, or make the tests more
resilient. Until then we can't use parallel mode as the default.
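One possible shape for such infrastructure, purely as a sketch: tag each test with the resources it needs exclusively and only run tests together when their tags don't overlap. Nothing like this is added by this patch; `batch_tests`, the test names and the tags below are all hypothetical.

```python
def batch_tests(tests):
    """Group tests into batches whose exclusive resources don't overlap.

    `tests` maps a test name to the set of resources it needs exclusively.
    Tests inside one batch could run in parallel; batches run one after
    another. Greedy first-fit, for illustration only.
    """
    batches = []  # list of (test names, resources in use) pairs
    for name, resources in tests.items():
        for names, in_use in batches:
            if in_use.isdisjoint(resources):
                names.append(name)
                in_use.update(resources)
                break
        else:
            # No existing batch can take this test without a clash.
            batches.append(([name], set(resources)))
    return [names for names, _ in batches]


# Hypothetical tests and tags.
tests = {
    "probe add": {"probes"},
    "probe del": {"probes"},
    "annotate": set(),
    "coresight": {"coresight"},
}
for batch in batch_tests(tests):
    print(batch)
```

Here the two tests tagged "probes" never share a batch, while untagged tests fit into any batch.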
While at it, document all these options in the 'perf test' man page.
Reported-by: Adrian Hunter <adrian.hunter@intel.com >
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Reported-by: James Clark <james.clark@arm.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/lkml/Ziwm18BqIn_vc1vn@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-26 22:28:08 -03:00
Namhyung Kim
281bf8f63f
perf test: Add a new test for 'perf annotate'
...
Add a basic 'perf annotate' test:
$ ./perf test annotate -vv
76: perf annotate basic tests:
--- start ---
test child forked, pid 846989
fbcd0-fbd55 l noploop
perf does have symbol 'noploop'
Basic perf annotate test
: 0 0xfbcd0 <noploop>:
0.00 : fbcd0: pushq %rbp
0.00 : fbcd1: movq %rsp, %rbp
0.00 : fbcd4: pushq %r12
0.00 : fbcd6: pushq %rbx
0.00 : fbcd7: movl $1, %ebx
0.00 : fbcdc: subq $0x10, %rsp
0.00 : fbce0: movq %fs:0x28, %rax
0.00 : fbce9: movq %rax, -0x18(%rbp)
0.00 : fbced: xorl %eax, %eax
0.00 : fbcef: testl %edi, %edi
0.00 : fbcf1: jle 0xfbd04
0.00 : fbcf3: movq (%rsi), %rdi
0.00 : fbcf6: movl $0xa, %edx
0.00 : fbcfb: xorl %esi, %esi
0.00 : fbcfd: callq 0x41920
0.00 : fbd02: movl %eax, %ebx
0.00 : fbd04: leaq -0x7b(%rip), %r12 # fbc90 <sighandler>
0.00 : fbd0b: movl $2, %edi
0.00 : fbd10: movq %r12, %rsi
0.00 : fbd13: callq 0x40a00
0.00 : fbd18: movl $0xe, %edi
0.00 : fbd1d: movq %r12, %rsi
0.00 : fbd20: callq 0x40a00
0.00 : fbd25: movl %ebx, %edi
0.00 : fbd27: callq 0x407c0
0.10 : fbd2c: movl 0x89785e(%rip), %eax # 993590 <done>
0.00 : fbd32: testl %eax, %eax
99.90 : fbd34: je 0xfbd2c
0.00 : fbd36: movq -0x18(%rbp), %rax
0.00 : fbd3a: subq %fs:0x28, %rax
0.00 : fbd43: jne 0xfbd50
0.00 : fbd45: addq $0x10, %rsp
0.00 : fbd49: xorl %eax, %eax
0.00 : fbd4b: popq %rbx
0.00 : fbd4c: popq %r12
0.00 : fbd4e: popq %rbp
0.00 : fbd4f: retq
0.00 : fbd50: callq 0x407e0
0.00 : fbcd0: pushq %rbp
0.00 : fbcd1: movq %rsp, %rbp
0.00 : fbcd4: pushq %r12
0.00 : fbcd0: push %rbp
0.00 : fbcd1: mov %rsp,%rbp
0.00 : fbcd4: push %r12
Basic annotate test [Success]
---- end(0) ----
76: perf annotate basic tests : Ok
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20240424001231.849972-1-namhyung@kernel.org
[ Improved the error messages a bit ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-26 22:07:21 -03:00
Ian Rogers
78fae2071f
perf tests parse-events: Use "branches" rather than "cache-references"
...
Switch from "cache-references" to "branches" in test as Intel has a
sysfs event for "cache-references" and changing the priority for sysfs
over legacy causes the test to fail.
Signed-off-by: Ian Rogers <irogers@google.com >
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Tested-by: Atish Patra <atishp@rivosinc.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Beeman Strong <beeman@rivosinc.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20240416061533.921723-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-26 22:07:20 -03:00
Adrian Hunter
e0c48bf9e8
perf scripts python: Add a script to run instances of 'perf script' in parallel
...
Add a Python script to run a perf script command multiple times in
parallel, using perf script options --cpu and --time so that each job
processes a different chunk of the data.
Extend the perf script tests to also test the new script.
The script supports the use of normal 'perf script' options like
--dlfilter and --script, so that the benefit of running parallel jobs
naturally extends to them also. In addition, a command can be provided
(refer to the --pipe-to option) to pipe standard output to a custom command.
Refer to the script's own help text at the end of the patch for more
details.
The script is useful for Intel PT traces, which can be efficiently
decoded by 'perf script' when split by CPU and/or time ranges. Running
jobs in parallel can decrease the overall decoding time.
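The idea of giving each job a different chunk of the data can be sketched as below. This is not the script's code: `split_time_ranges` is a hypothetical helper, the even split is an assumption, and the only thing taken from 'perf script' is the --time=start,stop form with an optional open start or end.

```python
def split_time_ranges(start_ns, end_ns, jobs):
    """Return one '--time=start,stop' option per job.

    The inclusive range [start_ns, end_ns] is split into `jobs` contiguous,
    non-overlapping chunks. The first chunk has an open start and the last
    an open end, so samples at the very edges are not lost.
    """
    if jobs < 2:
        raise ValueError("need at least two jobs to split")
    span = end_ns - start_ns + 1
    if span < jobs:
        raise ValueError("time range too short for that many jobs")

    def fmt(ns):
        # Seconds with nine fractional digits.
        return f"{ns // 10**9}.{ns % 10**9:09d}"

    bounds = [start_ns + span * i // jobs for i in range(jobs + 1)]
    options = []
    for i in range(jobs):
        lo = "" if i == 0 else fmt(bounds[i])
        hi = "" if i == jobs - 1 else fmt(bounds[i + 1] - 1)
        options.append(f"--time={lo},{hi}")
    return options


# Four seconds of (made-up) trace split across four jobs.
for option in split_time_ranges(0, 3_999_999_999, 4):
    print(option)
```

Each option would then be appended to its own 'perf script' command line, so that the jobs can run concurrently on disjoint parts of the same perf.data file.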
Committer testing:
Ian reported that shellcheck found some issues. I installed it, as there
are no warnings about it not being available, but when it is available it
fails the build with:
TEST /tmp/build/perf-tools-next/tests/shell/script.sh.shellcheck_log
CC /tmp/build/perf-tools-next/util/header.o
In tests/shell/script.sh line 20:
rm -rf "${temp_dir}/"*
^-------------^ SC2115 (warning): Use "${var:?}" to ensure this never expands to /* .
In tests/shell/script.sh line 83:
output1_dir="${temp_dir}/output1"
^---------^ SC2034 (warning): output1_dir appears unused. Verify use (or export if used externally).
In tests/shell/script.sh line 84:
output2_dir="${temp_dir}/output2"
^---------^ SC2034 (warning): output2_dir appears unused. Verify use (or export if used externally).
In tests/shell/script.sh line 86:
python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
^-----------^ SC2154 (warning): output_dir is referenced but not assigned (did you mean 'output1_dir'?).
For more information:
https://www.shellcheck.net/wiki/SC2034 -- output1_dir appears unused. Verif...
https://www.shellcheck.net/wiki/SC2115 -- Use "${var:?}" to ensure this nev...
https://www.shellcheck.net/wiki/SC2154 -- output_dir is referenced but not ...
Did these fixes:
- rm -rf "${temp_dir}/"*
+ rm -rf "${temp_dir:?}/"*
And:
@@ -83,8 +83,8 @@ test_parallel_perf()
output1_dir="${temp_dir}/output1"
output2_dir="${temp_dir}/output2"
perf record -o "${perf_data}" --sample-cpu uname
- python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
- python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}"
+ python3 "${pp}" -o "${output1_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}"
+ python3 "${pp}" -o "${output2_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}"
After that:
root@number:~# perf test -vv "perf script tests"
97: perf script tests:
--- start ---
test child forked, pid 4084139
DB test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.032 MB /tmp/perf-test-script.T4MJDr0L6J/perf.data (7 samples) ]
<SNIP>
DB test [Success]
parallel-perf test
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.034 MB /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data (7 samples) ]
Starting: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 4 jobs: 2 completed, 2 running
Finished: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 4 jobs: 4 completed, 0 running
All jobs finished successfully
parallel-perf.py done
Starting: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 4 completed, 0 running
Starting: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 8 completed, 0 running
Starting: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 12 completed, 0 running
Starting: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 16 completed, 0 running
Starting: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 20 completed, 0 running
Starting: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 24 completed, 0 running
Starting: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Starting: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
Finished: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 27 completed, 1 running
Finished: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data
There are 28 jobs: 28 completed, 0 running
All jobs finished successfully
parallel-perf.py done
parallel-perf test [Success]
--- Cleaning up ---
---- end(0) ----
97: perf script tests : Ok
root@number:~#
Reviewed-by: Andi Kleen <ak@linux.intel.com >
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240423133248.10206-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-26 22:07:19 -03:00
Arnaldo Carvalho de Melo
7255fcc80d
perf tests shell kprobes: Add missing description as used by 'perf test' output
...
Before:
root@x1:~# perf test 76
76: SPDX-License-Identifier: GPL-2.0 : Ok
root@x1:~#
After:
root@x1:~# perf test 76
76: Add 'perf probe's, list and remove them. : Ok
root@x1:~#
Reviewed-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Veronika Molnarova <vmolnaro@redhat.com >
Link: https://lore.kernel.org/lkml/ZigRDKUGkcDqD-yW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-26 22:07:19 -03:00
James Clark
10b6ee3b59
perf test shell arm_coresight: Increase buffer size for Coresight basic tests
...
These tests record in a mode that includes kernel trace but look for
samples of a userspace process. This makes them sensitive to any kernel
compilation options that increase the amount of time spent in the
kernel. If the trace buffer is completely filled before userspace is
reached then the test will fail. Double the buffer size to fix this.
The other tests in the same file aren't sensitive to this for various
reasons, for example the iterate devices test filters by userspace trace
only. But in order to keep coverage of all the modes, increase the
buffer size rather than filtering by userspace for the basic tests.
Fixes: d1efa4a0a6 ("perf cs-etm: Add separate decode paths for timeless and per-thread modes")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com >
Signed-off-by: James Clark <james.clark@arm.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Link: https://lore.kernel.org/r/20240326113749.257250-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-18 22:22:51 -03:00
Ian Rogers
d9bd1d4264
perf test bpf-counters: Add test for BPF event modifier
...
Refactor test to better enable sharing of logic, to give an idea of
progress and introduce test functions. Add test of measuring both
cycles and cycles:b simultaneously.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Song Liu <song@kernel.org >
Cc: Thomas Richter <tmricht@linux.ibm.com >
Link: https://lore.kernel.org/r/20240416170014.985191-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-18 22:22:51 -03:00
Chaitanya S Prakash
6b718ac687
perf tools: Enable configs required for test_uprobe_from_different_cu.sh
...
Test "perf probe of function from different CU" fails due to certain
configs not being enabled. Building the kernel with
CONFIG_KPROBE_EVENTS=y and CONFIG_UPROBE_EVENTS=y fixes the issue. As
CONFIG_KPROBE_EVENTS is dependent on CONFIG_KPROBES, enable it as well.
Some platforms enable these configs as a part of their defconfig, so
this change is only required for the ones that don't do so.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org >
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com >
Cc: Anshuman Khandual <anshuman.khandual@arm.com >
Cc: James Clark <james.clark@arm.com >
Link: https://lore.kernel.org/r/20240408062230.1949882-1-ChaitanyaS.Prakash@arm.com
Link: https://lore.kernel.org/r/20240408062230.1949882-7-ChaitanyaS.Prakash@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-17 12:21:39 -03:00
James Clark
7aa8749979
perf tests: Remove dependency on lscpu
...
This check can be done with uname, which is more portable. At the same
time re-arrange it into a standard if statement so that it's more
readable.
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: James Clark <james.clark@arm.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Spoorthy S <spoorts2@in.ibm.com >
Link: https://lore.kernel.org/r/20240410103458.813656-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-12 12:02:06 -03:00
James Clark
2dade41a53
perf tests: Apply attributes to all events in object code reading test
...
PERF_PMU_CAP_EXTENDED_HW_TYPE results in multiple events being opened on
heterogeneous systems. Currently this test only sets its required
attributes on the first event. Not disabling enable_on_exec on the other
events causes the test to fail because the forked objdump processes are
sampled. No tracking event is opened, so Perf only knows about its own
mappings, causing the objdump samples to give the following error:
$ perf test -vvv "object code reading"
Reading object code for memory address: 0xffff9aaa55ec
thread__find_map failed
---- end(-1) ----
24: Object code reading : FAILED!
Fixes: 251aa04024 ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: James Clark <james.clark@arm.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Spoorthy S <spoorts2@in.ibm.com >
Link: https://lore.kernel.org/r/20240410103458.813656-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-12 12:02:05 -03:00
James Clark
256ef072b3
perf tests: Make "test data symbol" more robust on Neoverse N1
...
To prevent anyone from seeing a test failure appear as a regression and
thinking that it was caused by their code change, insert some noise into
the loop which makes it immune to sampling bias issues (errata 1694299).
The "test data symbol" test can fail with any unrelated change that
shifts the loop into an unfortunate position in the Perf binary which is
almost impossible to debug as the root cause of the test failure.
Ultimately it's caused by the referenced errata.
Fixes: 60abedb8aa ("perf test: Introduce script for data symbol testing")
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: James Clark <james.clark@arm.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Spoorthy S <spoorts2@in.ibm.com >
Link: https://lore.kernel.org/r/20240410103458.813656-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-12 12:02:05 -03:00
Yang Jihong
09d2056efe
perf evsel: Use evsel__name_is() helper
...
Code cleanup, replace strcmp(evsel__name(evsel), {NAME}) with the
evsel__name_is() helper.
No functional change.
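As an illustration of the cleanup, a minimal sketch of such a helper
follows. The demo_ names and the reduced struct are assumptions made for
this example and are not perf's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct evsel; only the name matters here. */
struct demo_evsel {
	const char *name;
};

/* Stand-in for evsel__name(): never returns NULL. */
const char *demo_evsel__name(const struct demo_evsel *evsel)
{
	return evsel->name ? evsel->name : "unknown";
}

/* One call replaces the open-coded strcmp(...) == 0 pattern. */
bool demo_evsel__name_is(const struct demo_evsel *evsel, const char *name)
{
	assert(evsel && name);
	return strcmp(demo_evsel__name(evsel), name) == 0;
}
```

Because the helper takes the name as its second argument, a misplaced
parenthesis shows up as a build error rather than a silent behavior
change.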
Committer notes:
Fix this build error:
trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
- assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__");
+ assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__"));
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: Yang Jihong <yangjihong@bytedance.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-04-03 11:48:56 -03:00
Ian Rogers
4cef0e7ae7
perf tests: Run tests in parallel by default
...
Switch from running tests sequentially to running in parallel by
default. Change the opt-in '-p' or '--parallel' flag to '-S' or
'--sequential'.
On an 8 core tigerlake an address sanitizer run time changes from:
326.54user 622.73system 6:59.91elapsed 226%CPU
to:
973.02user 583.98system 3:01.17elapsed 859%CPU
So over twice as fast, saving 4 minutes.
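The quoted figures can be checked with a few lines of arithmetic. The
helper name below is hypothetical. With the elapsed times above the
ratio comes out at roughly 2.3x and the saving at roughly 239 seconds.

```c
#include <assert.h>

/* Convert an "M:SS.ss" elapsed field, split into its parts, to seconds. */
double demo_elapsed_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
	return minutes * 60.0 + seconds;
}

/* Ratio of the sequential run time to the parallel run time. */
double demo_speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}
```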
Signed-off-by: Ian Rogers <irogers@google.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lore.kernel.org/r/20240301174711.2646944-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-03-21 13:54:40 -03:00
Ian Rogers
5f2f051a93
perf test: Read child test 10 times a second rather than 1
...
Make the perf test output smoother by timing out the poll of the child
process after 100ms rather than 1s.
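A minimal sketch of polling a child's output with a 100ms timeout is
shown below. The function and macro names are hypothetical and the code
is not taken from builtin-test.c. It only illustrates how a shorter
poll() timeout lets the caller refresh its display more often.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_POLL_TIMEOUT_MS 100

/*
 * Wait up to DEMO_POLL_TIMEOUT_MS for data on fd and read what is available.
 * Returns the number of bytes read, 0 on timeout or end of file, -1 on error.
 */
ssize_t demo_read_child_output(int fd, char *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready;

	assert(buf && len > 0);
	ready = poll(&pfd, 1, DEMO_POLL_TIMEOUT_MS);
	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0; /* timed out: the caller can update its progress output */
	if (!(pfd.revents & (POLLIN | POLLHUP)))
		return -1;
	return read(fd, buf, len);
}
```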
Signed-off-by: Ian Rogers <irogers@google.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Christian Brauner <brauner@kernel.org >
Cc: Disha Goel <disgoel@linux.ibm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <songliubraving@fb.com >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Link: https://lore.kernel.org/r/20240301074639.2260708-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-03-21 13:54:39 -03:00
Ian Rogers
e120f7091a
perf test: Use a single fd for the child process out/err
...
Switch from dumping err then out, to a single file descriptor for both
of them. This allows the err and output to be correctly interleaved in
verbose output.
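The idea can be sketched as follows, assuming a plain fork() and pipe()
rather than the libsubcmd calls perf uses. All names are hypothetical.
Both child streams are redirected to the same pipe, so the parent reads
the messages in the order they were written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child whose stdout and stderr both point at the write end of one
 * pipe, then collect everything it wrote. Returns bytes read or -1.
 */
ssize_t demo_capture_child(char *buf, size_t len)
{
	int fds[2];
	ssize_t total = 0, n;
	pid_t pid;

	assert(buf && len > 1);
	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		dup2(fds[1], STDERR_FILENO);
		close(fds[1]);
		/* dprintf() writes directly, so program order is preserved. */
		dprintf(STDERR_FILENO, "err: starting\n");
		dprintf(STDOUT_FILENO, "out: result\n");
		dprintf(STDERR_FILENO, "err: done\n");
		_exit(0);
	}
	close(fds[1]);
	while ((size_t)total < len - 1 &&
	       (n = read(fds[0], buf + total, len - 1 - total)) > 0)
		total += n;
	buf[total] = '\0';
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return total;
}
```

With two separate pipes the parent would have no way to recover the
relative order of the "err" and "out" lines.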
Fixes: b482f5f8e0 ("perf tests: Add option to run tests in parallel")
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Christian Brauner <brauner@kernel.org >
Cc: Disha Goel <disgoel@linux.ibm.com >
Cc: Ian Rogers <irogers@google.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <songliubraving@fb.com >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Link: https://lore.kernel.org/r/20240301074639.2260708-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-03-21 13:54:39 -03:00
Ian Rogers
f68c981be0
perf test: Stat output per thread of just the parent process
...
Per-thread mode requires either system-wide (-a), a pid (-p) or a tid
(-t).
The stat output tests were using system-wide mode but this is racy when
threads are starting and exiting - something that happens a lot when
running the tests in parallel (perf test -p).
Avoid the race conditions by using pid mode with the pid of the parent
process.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Christian Brauner <brauner@kernel.org >
Cc: Disha Goel <disgoel@linux.ibm.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: James Clark <james.clark@arm.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Song Liu <songliubraving@fb.com >
Cc: Tim Chen <tim.c.chen@linux.intel.com >
Cc: Yicong Yang <yangyicong@hisilicon.com >
Link: https://lore.kernel.org/r/20240301074639.2260708-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-03-21 13:54:39 -03:00
Ian Rogers
71bc3ac8e8
perf cpumap: Use perf_cpu_map__for_each_cpu when possible
...
Rather than manually iterating the CPU map, use
perf_cpu_map__for_each_cpu(). When possible tidy local variables.
Reviewed-by: James Clark <james.clark@arm.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Alexandre Ghiti <alexghiti@rivosinc.com >
Cc: Andrew Jones <ajones@ventanamicro.com >
Cc: André Almeida <andrealmeid@igalia.com >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Atish Patra <atishp@rivosinc.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Darren Hart <dvhart@infradead.org >
Cc: Davidlohr Bueso <dave@stgolabs.net >
Cc: Huacai Chen <chenhuacai@kernel.org >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: John Garry <john.g.garry@oracle.com >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Leo Yan <leo.yan@linaro.org >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: Mike Leach <mike.leach@linaro.org >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Paolo Bonzini <pbonzini@redhat.com >
Cc: Paran Lee <p4ranlee@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Ravi Bangoria <ravi.bangoria@amd.com >
Cc: Sandipan Das <sandipan.das@amd.com >
Cc: Sean Christopherson <seanjc@google.com >
Cc: Steinar H. Gunderson <sesse@google.com >
Cc: Suzuki Poulouse <suzuki.poulose@arm.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Will Deacon <will@kernel.org >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Yang Li <yang.lee@linux.alibaba.com >
Cc: Yanteng Si <siyanteng@loongson.cn >
Link: https://lore.kernel.org/r/20240202234057.2085863-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2024-03-21 10:41:28 -03:00
Colin Ian King
eb94225eb4
perf test: Fix spelling mistake "curent" -> "current"
...
There is a spelling mistake in a pr_debug message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com >
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: kernel-janitors@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240226105326.3944887-1-colin.i.king@gmail.com
2024-02-26 21:41:27 -08:00
Arnaldo Carvalho de Melo
8680999dbe
perf test: Use TEST_FAIL in the TEST_ASSERT macros instead of -1
...
Just to make things clearer, return TEST_FAIL (-1) instead of an open
coded -1.
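A minimal sketch of the pattern, with DEMO_ prefixed names that are
assumptions for this example and not the definitions in perf's test
headers:

```c
#include <stdio.h>

enum {
	DEMO_TEST_OK = 0,
	DEMO_TEST_FAIL = -1,
};

/* Return the named constant rather than a bare -1 when the check fails. */
#define DEMO_TEST_ASSERT_VAL(text, cond)				\
do {									\
	if (!(cond)) {							\
		fprintf(stderr, "FAILED %s:%d %s\n",			\
			__FILE__, __LINE__, text);			\
		return DEMO_TEST_FAIL;					\
	}								\
} while (0)

/* Example test body: fails when the sum does not match. */
int demo_test_sum(int a, int b, int expected)
{
	DEMO_TEST_ASSERT_VAL("unexpected sum", a + b == expected);
	return DEMO_TEST_OK;
}
```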
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/ZdepeMsjagbf1ufD@x1
2024-02-26 08:31:24 -08:00
Ian Rogers
b482f5f8e0
perf tests: Add option to run tests in parallel
...
By default tests are forked, add an option (-p or --parallel) so that
the forked tests are all started in parallel and then their output
gathered serially. This is opt-in as running in parallel can cause
test flakes.
Rather than fork within the code, the start_command/finish_command
from libsubcmd are used. This changes how stderr and stdout are
handled. The child stderr and stdout are always read to avoid the
child blocking. If verbose is 1 (-v) then if the test fails the child
stdout and stderr are displayed. If verbose is >1 (e.g. -vv) then
the stdout and stderr from the child are immediately displayed.
An unscientific test on my laptop shows the wall clock time for perf
test without parallel being 5 minutes 21 seconds and with parallel
(-p) being 1 minute 50 seconds.
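The start-all-then-gather-serially scheme can be sketched as below. This
is an illustration under assumptions: it uses plain fork() and waitpid()
instead of start_command/finish_command from libsubcmd, it leaves out
output capture, and every name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_MAX_TESTS 16

typedef int (*demo_test_fn)(void);

/* Two trivial example tests. */
int demo_test_pass(void) { return 0; }
int demo_test_fail(void) { return 1; }

/*
 * Start every test in its own child first, then collect the exit codes in
 * the order the tests were started. results[i] gets the exit code of
 * tests[i], or -1 if it could not be run. Returns the number of failures.
 */
int demo_run_parallel(demo_test_fn *tests, int nr, int *results)
{
	pid_t pids[DEMO_MAX_TESTS];
	int i, status, failures = 0;

	assert(nr >= 0 && nr <= DEMO_MAX_TESTS);

	/* Phase 1: start all children without waiting on any of them. */
	for (i = 0; i < nr; i++) {
		pids[i] = fork();
		if (pids[i] == 0)
			_exit(tests[i]() & 0xff);
	}

	/* Phase 2: gather results serially so reporting stays ordered. */
	for (i = 0; i < nr; i++) {
		results[i] = -1;
		if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status))
			results[i] = WEXITSTATUS(status);
		if (results[i] != 0)
			failures++;
	}
	return failures;
}
```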
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Justin Stitt <justinstitt@google.com >
Cc: Bill Wendling <morbo@google.com >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240221034155.1500118-9-irogers@google.com
2024-02-22 09:13:20 -08:00
Ian Rogers
964461ee37
perf tests: Run time generate shell test suites
...
Rather than special shell test logic, do a single pass to create an
array of test suites. Hold the shell test file name in the test suite
priv field. This makes the special shell test logic in builtin-test.c
redundant so remove it.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Justin Stitt <justinstitt@google.com >
Cc: Bill Wendling <morbo@google.com >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240221034155.1500118-8-irogers@google.com
2024-02-22 09:13:06 -08:00
Ian Rogers
f3295f5b06
perf tests: Use scandirat for shell script finding
...
Avoid filename appending buffers by using openat, faccessat and
scandirat more widely. Turn the script's path back to a file name
using readlink from /proc/<pid>/fd/<fd>.
Read the script's description using api/io.h to avoid fdopen
conversions. Whilst reading perform additional sanity checks on the
script's contents.
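A minimal, Linux-only sketch of the two techniques follows. The helper
names are hypothetical, and the sketch uses /proc/self/fd/<fd> for the
current process. It opens a file relative to a directory descriptor and
then turns the resulting descriptor back into a path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open name relative to dirfd, so no "dir/name" buffer has to be built. */
int demo_open_in_dir(int dirfd, const char *name)
{
	if (faccessat(dirfd, name, R_OK, 0) < 0)
		return -1;
	return openat(dirfd, name, O_RDONLY);
}

/*
 * Resolve an open file descriptor back to a path by reading the
 * /proc/self/fd/<fd> symlink. Returns the path length or -1.
 */
ssize_t demo_fd_to_path(int fd, char *path, size_t len)
{
	char link[64];
	ssize_t n;

	assert(path && len > 1);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, path, len - 1);
	if (n < 0)
		return -1;
	path[n] = '\0'; /* readlink() does not NUL-terminate */
	return n;
}
```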
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Justin Stitt <justinstitt@google.com >
Cc: Bill Wendling <morbo@google.com >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240221034155.1500118-7-irogers@google.com
2024-02-22 09:12:53 -08:00
Ian Rogers
d5bcade989
perf test: Rename builtin-test-list and add missed header guard
...
builtin-test-list is primarily concerned with shell script
tests. Rename the file to better reflect this and add a missed header
guard.
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Justin Stitt <justinstitt@google.com >
Cc: Bill Wendling <morbo@google.com >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240221034155.1500118-6-irogers@google.com
2024-02-22 09:12:40 -08:00
Ian Rogers
526f2ac9f6
perf tests: Avoid fork in perf_has_symbol test
...
perf test -vv Symbols is used to identify symbols within the perf
binary. Add the -F flag so that the test command doesn't fork the test
before running. This removes a little overhead.
Acked-by: Adrian Hunter <adrian.hunter@intel.com >
Signed-off-by: Ian Rogers <irogers@google.com >
Cc: James Clark <james.clark@arm.com >
Cc: Justin Stitt <justinstitt@google.com >
Cc: Bill Wendling <morbo@google.com >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Yang Jihong <yangjihong1@huawei.com >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com >
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com
2024-02-22 09:12:04 -08:00
Changbin Du
8b767db330
perf: build: introduce the libcapstone
...
Later we will use libcapstone to disassemble instructions of samples.
Signed-off-by: Changbin Du <changbin.du@huawei.com >
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com >
Cc: changbin.du@gmail.com
Cc: Thomas Richter <tmricht@linux.ibm.com >
Cc: Andi Kleen <ak@linux.intel.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240217074046.4100789-2-changbin.du@huawei.com
2024-02-20 18:06:25 -08:00
Veronika Molnarova
e7d759f31c
perf testsuite: Add test for kprobe handling
...
Test perf interface to kprobes: listing, adding and removing probes. It
is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-7-mpetlan@redhat.com
2024-02-16 11:49:47 -08:00
Veronika Molnarova
61d348f1e9
perf testsuite: Add common output checking helpers
...
As a form of validation, it is common practice to check whether the
output of a command contains expected patterns or matches a certain
regex.
Add helpers for verifying that all regexes are found in the output, that
all lines match any pattern from a set and that a certain expression is
not present in the output.
In verbose mode these helpers log mismatches for easier failure
investigation.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com
2024-02-16 11:49:36 -08:00
Veronika Molnarova
c8eb2a9ff8
perf testsuite: Add test case for perf probe
...
Add a new perf probe test case that acts as an entry element in the perf
test list. It runs multiple subtests from the directory "base_probe", which
will be added in incoming patches and can be expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com
2024-02-16 11:49:22 -08:00
Veronika Molnarova
e3425864a9
perf testsuite: Add initialization script for shell tests
...
Initialize reporting and logging functions that unify the formatting
of the test output used for shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-4-mpetlan@redhat.com
2024-02-16 11:48:58 -08:00
Veronika Molnarova
451af6a790
perf testsuite: Add common setting for shell tests
...
Add settings defining sample commands later shared by shell tests. This
adds the possibility to globally adjust the default values for the whole
testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-3-mpetlan@redhat.com
2024-02-16 11:48:40 -08:00
Veronika Molnarova
0aa8142871
perf testsuite: Add common regex patters
...
Unify perf regexes for checking testing output into a single file
to reduce duplicates and prevent errors when editing.
This will be used in upcoming patches in shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Michael Petlan <mpetlan@redhat.com >
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240215110231.15385-2-mpetlan@redhat.com
2024-02-16 11:48:18 -08:00
Adrian Hunter
6f04d664a9
perf test: Enable Symbols test to work with a current module dso
...
The test needs a struct machine and creates one for the current host,
but a side-effect is that struct machine has set up kernel maps
including module maps.
If the 'Symbols' test --dso option specifies a current kernel module,
it will already be present as a kernel dso, and a map with kmaps needs
to be used otherwise there will be a segfault - see below.
For that case, find the existing map and use that. In that case the dso
is also split by section into multiple dsos, so test those dsos as
well. That, in turn, reveals that those dsos have not had overlapping
symbols removed, so the test fails.
Example:
Before:
$ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/arch/x86/kvm/kvm-intel.ko
70: Symbols :
--- start ---
Testing /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
Segmentation fault (core dumped)
After:
$ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/arch/x86/kvm/kvm-intel.ko
70: Symbols :
--- start ---
Testing /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko
Overlapping symbols:
41d30-41fbb l vmx_init
41d30-41fbb g init_module
---- end ----
Symbols: FAILED!
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240131192416.16387-1-adrian.hunter@intel.com
2024-02-16 11:44:04 -08:00
Ian Rogers
ff0bd79980
perf maps: Hide maps internals
...
Move the struct into the C file. Add maps__equal to work around
exposing the struct for reference count checking. Add accessors for
the unwind_libunwind_ops. Move maps_list_node to its only use in
symbol.c.
Signed-off-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: James Clark <james.clark@arm.com >
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com >
Cc: Alexey Dobriyan <adobriyan@gmail.com >
Cc: Colin Ian King <colin.i.king@gmail.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Song Liu <song@kernel.org >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Artem Savkov <asavkov@redhat.com >
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
2024-02-12 12:35:41 -08:00
Ian Rogers
107ef66cb0
perf maps: Get map before returning in maps__find_by_name
...
Finding a map is done under a lock. Returning the map without a
reference count means it can be removed without notice, causing
use-after-free errors. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this. Also fix some reference counted pointer comparisons.
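The get-inside-the-lock pattern can be sketched as follows. This is a
simplified illustration, not perf's code: the demo_ types are
hypothetical, and the reference count is a plain int guarded by the same
mutex rather than an atomic refcount.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct demo_map {
	const char *name;
	int refcnt; /* protected by demo_maps.lock in this sketch */
};

struct demo_maps {
	pthread_mutex_t lock;
	struct demo_map *entries[8];
	size_t nr;
};

/* Look up by name and take a reference before the lock is dropped. */
struct demo_map *demo_maps__find_by_name(struct demo_maps *maps,
					 const char *name)
{
	struct demo_map *found = NULL;
	size_t i;

	pthread_mutex_lock(&maps->lock);
	for (i = 0; i < maps->nr; i++) {
		if (!strcmp(maps->entries[i]->name, name)) {
			found = maps->entries[i];
			found->refcnt++; /* the "get" happens inside the lock */
			break;
		}
	}
	pthread_mutex_unlock(&maps->lock);
	return found; /* the caller owns one reference and must put it */
}

/* Drop the reference obtained from demo_maps__find_by_name(). */
void demo_map__put(struct demo_maps *maps, struct demo_map *map)
{
	pthread_mutex_lock(&maps->lock);
	assert(map->refcnt > 0);
	map->refcnt--;
	pthread_mutex_unlock(&maps->lock);
}
```

If the reference were taken only after the unlock, another thread could
remove and free the map in between, which is the use-after-free this
change closes.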
Signed-off-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: James Clark <james.clark@arm.com >
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com >
Cc: Alexey Dobriyan <adobriyan@gmail.com >
Cc: Colin Ian King <colin.i.king@gmail.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Song Liu <song@kernel.org >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Artem Savkov <asavkov@redhat.com >
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240210031746.4057262-4-irogers@google.com
2024-02-12 12:35:33 -08:00
Ian Rogers
42fd623b58
perf maps: Get map before returning in maps__find
...
Finding a map is done under a lock. Returning the map without a
reference count means it can be removed without notice, causing
use-after-free errors. Grab a reference count to the map within the lock
region and return this. Fix up locations that need a map__put
following this.
Signed-off-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: James Clark <james.clark@arm.com >
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com >
Cc: Alexey Dobriyan <adobriyan@gmail.com >
Cc: Colin Ian King <colin.i.king@gmail.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Song Liu <song@kernel.org >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Artem Savkov <asavkov@redhat.com >
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240210031746.4057262-3-irogers@google.com
2024-02-12 12:35:26 -08:00
Ian Rogers
659ad3492b
perf maps: Switch from rbtree to lazily sorted array for addresses
...
Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much as the array sorting nodes by name. 1 pointer is
needed per node, but to avoid excessive resizing the backing array may
be twice the number of used elements. Meaning the memory overhead is
roughly half that of the rbtree. For a perf record with
"--no-bpf-event -g -a" of true, the memory overhead of perf inject is
reduced from 3.3MB to 3MB, so 10% or 300KB is saved.
Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property. O(log n) rb-tree
complexity is switched to O(1).
Remove slides the array, so O(log n) rb-tree complexity is degraded to
O(n).
A find may need to sort the array using qsort which is O(n*log n), but
in general the maps should be sorted and so average performance should
be O(log n) as with the rbtree.
An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.
Overall it is expected the performance after the change should be
comparable to before, but with half of the memory consumed.
To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps__merge_in splits the given mapping
inserting remaining gaps. maps__fixup_overlap_and_insert splits the
existing mappings, then adds the incoming mapping. By adding the new
mapping first, then re-inserting the existing mappings the splitting
behavior matches.
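The insert and find behavior described above can be sketched as follows.
This is a minimal illustration with hypothetical demo_ names, assuming
non-overlapping maps and an initial state with sorted set to true. It is
not the code in maps.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_map {
	uint64_t start;
	uint64_t end; /* exclusive */
};

struct demo_maps {
	struct demo_map **by_address; /* one pointer per map */
	size_t nr, allocated;
	bool sorted;
};

static int demo_map__cmp(const void *a, const void *b)
{
	const struct demo_map *ma = *(const struct demo_map * const *)a;
	const struct demo_map *mb = *(const struct demo_map * const *)b;

	return (ma->start > mb->start) - (ma->start < mb->start);
}

/* Append at the end; only note whether the sort order was broken. */
int demo_maps__insert(struct demo_maps *maps, struct demo_map *map)
{
	assert(map->start < map->end);
	if (maps->nr == maps->allocated) {
		/* Doubling avoids frequent resizes at the cost of spare slots. */
		size_t n = maps->allocated ? maps->allocated * 2 : 4;
		struct demo_map **tmp = realloc(maps->by_address,
						n * sizeof(*tmp));

		if (!tmp)
			return -1;
		maps->by_address = tmp;
		maps->allocated = n;
	}
	if (maps->nr && maps->by_address[maps->nr - 1]->start > map->start)
		maps->sorted = false;
	maps->by_address[maps->nr++] = map;
	return 0;
}

/* Sort lazily if needed, then binary search for the map containing addr. */
struct demo_map *demo_maps__find(struct demo_maps *maps, uint64_t addr)
{
	size_t lo = 0, hi = maps->nr;

	if (!maps->sorted) {
		if (maps->nr > 1)
			qsort(maps->by_address, maps->nr,
			      sizeof(*maps->by_address), demo_map__cmp);
		maps->sorted = true;
	}
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct demo_map *map = maps->by_address[mid];

		if (addr < map->start)
			hi = mid;
		else if (addr >= map->end)
			lo = mid + 1;
		else
			return map;
	}
	return NULL;
}
```

When maps arrive in address order the sorted flag stays set, so a find
is a plain binary search and the qsort cost is never paid.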
Signed-off-by: Ian Rogers <irogers@google.com >
Acked-by: Namhyung Kim <namhyung@kernel.org >
Cc: K Prateek Nayak <kprateek.nayak@amd.com >
Cc: James Clark <james.clark@arm.com >
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com >
Cc: Alexey Dobriyan <adobriyan@gmail.com >
Cc: Colin Ian King <colin.i.king@gmail.com >
Cc: Changbin Du <changbin.du@huawei.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Song Liu <song@kernel.org >
Cc: Leo Yan <leo.yan@linux.dev >
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com >
Cc: Liam Howlett <liam.howlett@oracle.com >
Cc: Artem Savkov <asavkov@redhat.com >
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240210031746.4057262-2-irogers@google.com
2024-02-12 12:35:14 -08:00
Namhyung Kim
39d14c0dd6
Merge branch 'perf-tools' into perf-tools-next
...
To get some fixes in the perf test and JSON metrics into the development
branch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
2024-02-12 12:19:21 -08:00
Yicong Yang
cbc917a1b0
perf stat: Support per-cluster aggregation
...
Some platforms have a 'cluster' topology, and CPUs in a cluster
share resources like the L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Parsing and building the cluster
topology has been supported since [1].
perf stat already supports aggregation for other topologies like
die or socket. It is useful to aggregate per-cluster to find
problems like L3T bandwidth contention.
This patch adds support for a "--per-cluster" option for per-cluster
aggregation. It also updates the docs and the related test. The output
looks like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load
S56-D0-CLS594 4 794,211,453 LLC-load
S56-D0-CLS1030 4 41,623 LLC-load
S56-D0-CLS1466 4 41,646 LLC-load
S56-D0-CLS1902 4 16,863 LLC-load
S56-D0-CLS2338 4 15,721 LLC-load
S56-D0-CLS2774 4 22,671 LLC-load
[...]
On a legacy system without clusters or cluster support, the output will
look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles
S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently gained "--per-cache" aggregation, but it is not
the same as per-cluster aggregation, although cluster CPUs may share some
cache resources. For example on my machine all clusters within a die share
the same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
[1] commit c5e22feffd ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <zhanjie9@hisilicon.com >
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com >
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
2024-02-09 14:59:53 -08:00
Yicong Yang
5f70c6c559
perf test: Skip metric w/o event name on arm64 in stat STD output linter
...
stat+std_output.sh test fails on my arm64 machine:
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args Unknown event name in TopDownL1 # 0.18 retiring
[root@localhost shell]# ./stat+std_output.sh
Checking STD output: no args [Success]
Checking STD output: system wide [Success]
Checking STD output: interval [Success]
Checking STD output: per thread Unknown event name in tmux: server-1114960 # 0.41 frontend_bound
When no args are specified, `perf stat` adds the TopdownL1 metric group
and the output looks like:
[root@localhost shell]# perf stat -- stress-ng --vm 1 --timeout 1
stress-ng: info: [3351733] setting to a 1 second run per stressor
stress-ng: info: [3351733] dispatching hogs: 1 vm
stress-ng: info: [3351733] successful run completed in 1.02s
Performance counter stats for 'stress-ng --vm 1 --timeout 1':
1,037.71 msec task-clock # 1.000 CPUs utilized
13 context-switches # 12.528 /sec
1 cpu-migrations # 0.964 /sec
67,544 page-faults # 65.090 K/sec
2,691,932,561 cycles # 2.594 GHz (74.56%)
6,571,333,653 instructions # 2.44 insn per cycle (74.92%)
521,863,142 branches # 502.901 M/sec (75.21%)
425,879 branch-misses # 0.08% of all branches (87.57%)
TopDownL1 # 0.61 retiring (87.67%)
# 0.03 frontend_bound (87.67%)
# 0.02 bad_speculation (87.67%)
# 0.34 backend_bound (74.61%)
1.038138390 seconds time elapsed
0.844849000 seconds user
0.189053000 seconds sys
Metrics in group TopDownL1 don't have an event name on arm64 but are not
listed in the $skip_metric list, where they should be. Add them
to the skip list, as was done for x86 platforms in [1].
[1] commit 4d60e83dfc ("perf test: Skip metrics w/o event name in stat STD output linter")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Cc: linuxarm@huawei.com
Cc: kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240207091222.54096-1-yangyicong@huawei.com
2024-02-08 15:59:47 -08:00
Ian Rogers
b8db070f38
perf jevents: Drop or simplify small integer values
...
Prior to this patch only the string '0' would be dropped, as the config
values default to 0. Some JSON values are hex, and '0x0' wouldn't match
the string '0'. Add a more robust is_zero test to drop these event terms.
When encoding numbers as hex, if the number is between 0 and 9
inclusive then don't add a 0x prefix.
Update test expectations for these changes.
On x86 this reduces the event/metric C string by 58,411 bytes.
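The two rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the jevents code itself: `is_zero` borrows the name used in the message, but its body and the helper `encode_small_hex` are hypothetical.

```python
def is_zero(value: str) -> bool:
    """Return True if the string spells zero in decimal or hex."""
    try:
        # Base 0 lets int() accept both '0' and '0x0'.
        return int(value, 0) == 0
    except ValueError:
        # Not a number at all, so certainly not zero.
        return False


def encode_small_hex(value: int) -> str:
    """Hex-encode a number, omitting the 0x prefix for 0 through 9."""
    # Digits 0..9 read the same in decimal and hex, so the prefix is redundant.
    return str(value) if 0 <= value <= 9 else hex(value)
```

Dropping the prefix is safe for single digits only, because from 10 upwards the decimal and hex spellings diverge.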
Signed-off-by: Ian Rogers <irogers@google.com >
Reviewed-by: Kan Liang <kan.liang@linux.intel.com >
Cc: Edward Baker <edward.baker@intel.com >
Cc: Perry Taylor <perry.taylor@intel.com >
Cc: Weilin Wang <weilin.wang@intel.com >
Cc: John Garry <john.g.garry@oracle.com >
Cc: Jing Zhang <renyu.zj@linux.alibaba.com >
Cc: Kajol Jain <kjain@linux.ibm.com >
Cc: Michael Petlan <mpetlan@redhat.com >
Cc: Veronika Molnarova <vmolnaro@redhat.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240131201429.792138-1-irogers@google.com
2024-02-02 13:09:30 -08:00
Ian Rogers
fd7b8e8fb2
perf parse-events: Print all errors
...
Prior to this patch only the first and the last errors encountered
during parsing were printed; seeing the others required enabling
verbose output. Unfortunately this can drop useful errors, in
particular on terms. This patch records and prints all errors instead
of just the first and last, and changes the underlying data structure
to a list.
Before:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
Initial error:
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
After:
```
$ perf stat -e 'slots/edge=2/' true
event syntax error: 'slots/edge=2/'
\___ Bad event or PMU
Unable to find PMU or event on a PMU of 'slots'
event syntax error: 'slots/edge=2/'
\___ value too big for format (edge), maximum is 1
event syntax error: 'slots/edge=2/'
\___ Cannot find PMU `slots'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
```
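The change in data structure can be sketched as follows. This is a minimal Python illustration of recording every error in a list rather than keeping only the first and last; it is not the perf implementation, and `ParseError`, `ErrorList`, `handle` and `render` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ParseError:
    message: str
    hint: str = ""  # optional follow-up line printed after the message


@dataclass
class ErrorList:
    errors: list = field(default_factory=list)

    def handle(self, message: str, hint: str = "") -> None:
        # Append every error; nothing is overwritten or dropped.
        self.errors.append(ParseError(message, hint))

    def render(self, event: str) -> list:
        # Produce one block of output lines per recorded error, in order.
        lines = []
        for err in self.errors:
            lines.append(f"event syntax error: '{event}'")
            lines.append(f"\\___ {err.message}")
            if err.hint:
                lines.append(err.hint)
        return lines
```

Because errors are appended rather than stored in two fixed slots, a middle error such as the `edge` range check in the "After" output survives to be printed.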
Signed-off-by: Ian Rogers <irogers@google.com >
Reviewed-by: James Clark <james.clark@arm.com >
Cc: Mark Rutland <mark.rutland@arm.com >
Cc: tchen168@asu.edu
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Michael Petlan <mpetlan@redhat.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240131134940.593788-3-irogers@google.com
2024-02-02 13:08:05 -08:00
Weilin Wang
8f95b29c73
perf test: Simplify metric value validation test final report
...
The original test report was too complicated to read and contained
information that was not really useful. This update simplifies the
report, which should greatly improve its readability.
Signed-off-by: Weilin Wang <weilin.wang@intel.com >
Reviewed-by: Ian Rogers <irogers@google.com >
Cc: Caleb Biggers <caleb.biggers@intel.com >
Cc: Perry Taylor <perry.taylor@intel.com >
Cc: Samantha Alt <samantha.alt@intel.com >
Cc: Kan Liang <kan.liang@linux.intel.com >
Signed-off-by: Namhyung Kim <namhyung@kernel.org >
Link: https://lore.kernel.org/r/20240130180907.639729-1-weilin.wang@intel.com
2024-02-01 22:16:37 -08:00