mirror of
https://github.com/raspberrypi/linux.git
synced 2025-12-07 10:29:52 +00:00
e032368e8cb15ab1f11b92f078caa9bae995b8fe
15131 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2480232c61 |
perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT
The intent of this test is to check we get a PERF_RECORD_EXIT as asked for by setting perf_event_attr.task=1. When the test was written we didn't had the "dummy" event so we went with the default event, "cycles". There were reports of this test failing sometimes, one of these reports was with a PREEMPT_RT_FULL, but I noticed it failing sometimes with an aarch64 Firefly board. In the kernel the call to perf_event_task_output(), that generates the PERF_RECORD_EXIT may fail when there is not enough memory in the ring buffer, if the ring buffer is paused, etc. So switch to using the "dummy" event to use the ring buffer just for what the test was designed for, avoiding uneeded PERF_RECORD_SAMPLEs. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZLGXmMuNRpx1ubFm@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
0e022f5bf7 |
perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in: |
||
|
|
5b10c18d1b |
perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms
libfuzzer found the following command could SEGV:
$ perf stat -e cpu/L2,L2/ true
This is because the L2 term rewrites the perf_event_attr type to
PERF_TYPE_HW_CACHE which then fails the PMU lookup for the second
legacy cache term.
The new failure is consistent with repeated hardware terms:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$ perf stat -e cpu/cycles,cycles/ true
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Initial error:
event syntax error: 'cpu/cycles,cycles/'
\___ Failed to find PMU for type 0
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Committer testing:
Before:
$ perf stat -e cpu/L2,L2/ true
Segmentation fault (core dumped)
$
After:
$ perf stat -e cpu/L2,L2/ true
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Initial error:
event syntax error: 'cpu/L2,L2/'
\___ Failed to find PMU for type 3
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
$
Fixes:
|
||
|
|
920b91d927 |
tools include UAPI: Sync linux/mount.h copy with the kernel sources
To pick the changes from:
|
||
|
|
8d40f74ebf |
perf vendor events amd: Fix large metrics
There are cases where a metric requires more events than the number of available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric counters but the "nps1_die_to_dram" metric has eight events. By default, the constituent events are placed in a group and since the events cannot be scheduled at the same time, the metric is not computed. The "all metrics" test also fails because of this. Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect the user to run perf with "--metric-no-group". E.g. $ sudo perf test -v 101 Before: 101: perf all metrics test : --- start --- test child forked, pid 37131 Testing branch_misprediction_ratio Testing all_remote_links_outbound Testing nps1_die_to_dram Metric 'nps1_die_to_dram' not printed in: Error: Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'. Testing macro_ops_dispatched Testing all_l2_cache_accesses Testing all_l2_cache_hits Testing all_l2_cache_misses Testing ic_fetch_miss_ratio Testing l2_cache_accesses_from_l2_hwpf Testing l2_cache_misses_from_l2_hwpf Testing op_cache_fetch_miss_ratio Testing l3_read_miss_latency Testing l1_itlb_misses test child finished with -1 ---- end ---- perf all metrics test: FAILED! After: 101: perf all metrics test : --- start --- test child forked, pid 43766 Testing branch_misprediction_ratio Testing all_remote_links_outbound Testing nps1_die_to_dram Testing macro_ops_dispatched Testing all_l2_cache_accesses Testing all_l2_cache_hits Testing all_l2_cache_misses Testing ic_fetch_miss_ratio Testing l2_cache_accesses_from_l2_hwpf Testing l2_cache_misses_from_l2_hwpf Testing op_cache_fetch_miss_ratio Testing l3_read_miss_latency Testing l1_itlb_misses test child finished with 0 ---- end ---- perf all metrics test: Ok Reported-by: Ayush Jain <ayush.jain3@amd.com> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1feece2780 |
perf build: Fix library not found error when using CSLIBS
-L only specifies the search path for libraries directly provided in the link line with -l. Because -lopencsd isn't specified, it's only linked because it's a dependency of -lopencsd_c_api. Dependencies like this are resolved using the default system search paths or -rpath-link=... rather than -L. This means that compilation only works if OpenCSD is installed to the system rather than provided with the CSLIBS (-L) option. This could be fixed by adding -Wl,-rpath-link=$(CSLIBS) but that is less conventional than just adding -lopencsd to the link line so that it uses -L. -lopencsd seems to have been removed in commit |
||
|
|
9350a91791 |
tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources
To pick the changes in these csets:
|
||
|
|
c66e1c68c1 |
perf probe: Read DWARF files from the correct CU
After switching from dwarf_decl_file() to die_get_decl_file(), it is not
possible to add probes for certain functions:
$ perf probe -x /usr/lib/systemd/systemd-logind match_unit_removed
A function DIE doesn't have decl_line. Maybe broken DWARF?
A function DIE doesn't have decl_line. Maybe broken DWARF?
Probe point 'match_unit_removed' not found.
Error: Failed to add events.
The problem is that die_get_decl_file() uses the wrong CU to search for
the file. elfutils commit e1db5cdc9f has some good explanation for this:
dwarf_decl_file uses dwarf_attr_integrate to get the DW_AT_decl_file
attribute. This means the attribute might come from a different DIE
in a different CU. If so, we need to use the CU associated with the
attribute, not the original DIE, to resolve the file name.
This patch uses the same source of information as elfutils: use attribute
DW_AT_decl_file and use this CU to search for the file.
Fixes:
|
||
|
|
56cbeacf14 |
perf probe: Add test for regression introduced by switch to die_get_decl_file()
This patch adds a test to validate that 'perf probe' works for binaries where DWARF info is split into multiple CUs Signed-off-by: Georg Müller <georgmueller@gmx.net> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: regressions@lists.linux.dev Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230628084551.1860532-5-georgmueller@gmx.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
c206353dfd |
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
|
||
|
|
bcd981db12 |
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
Arm has multiple PMU types for heterogeneous systems, but doesn't
currently support PERF_PMU_CAP_EXTENDED_HW_TYPE. Make the tests
support both scenarios so that they pass on Arm, and will still pass
once PERF_PMU_CAP_EXTENDED_HW_TYPE support is added.
Fixes:
|
||
|
|
808ce56e7d |
perf test: Fix event parsing test on Arm
The test looks for a PMU from sysfs with type = PERF_TYPE_RAW when
opening a raw event. Arm doesn't have a real raw PMU, only core PMUs
with unique types other than raw.
Instead of looking for a matching PMU, just test that the event type
was parsed as raw and skip the PMU search on Arm. The raw event type
test should also apply to all platforms so add it outside of the ifdef.
Fixes:
|
||
|
|
b2ad9549bf |
perf evsel amd: Fix IBS error message
AMD IBS can do per-process profiling[1] and is no longer restricted to per-cpu or systemwide only. Remove stale error message. Also, checking just exclude_kernel is not sufficient since IBS does not support any privilege filters. So include all exclude_* checks. And finally, move these checks under tools/perf/arch/x86/ from generic code. Before: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS may only be available in system-wide/per-cpu mode. Try using -a, or -C and workload affinity After: $ sudo ./perf record -e ibs_op//k -C 0 Error: AMD IBS doesn't support privilege filtering. Try again without the privilege modifiers (like 'k') at the end. [1] https://git.kernel.org/torvalds/c/30093056f7b2 Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
5f06267b6e |
perf: unwind: Fix symfs with libdw
Pass the full path including the symfs (if any) to libdw. Without this unwinding fails with errors like this when a symfs is used: unwind: failed with 'No such file or directory'" Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kernel@axis.com Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230630-perf-libdw-symfs-v2-1-469760dd4d5b@axis.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
78a175c462 |
perf symbol: Fix uninitialized return value in symbols__find_by_name()
found_idx and s aren't initialized, so if no symbol is found then the
assert at the end will index off the end of the array causing a
segfault. The function also doesn't return NULL when the symbol isn't
found even if the assert passes. Fix it by initializing the values and
only setting them when something is found.
Fixes the following test failure:
$ perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
Fixes:
|
||
|
|
2aefb4cc90 |
perf test: Test perf lock contention CSV output
To verify CSV output, just check the number of separators (",") using
the tr and wc commands like this.
grep -v "^#" ${result} | tr -d -c | wc -c
Now it expects 6 columns (and 5 separators) in the output, but it may
be changed later so count the field in the header first and compare it
to the actual output lines.
$ cat ${result}
# output: contended, total wait, max wait, avg wait, type, caller
1, 28787, 28787, 28787, spinlock, raw_spin_rq_lock_nested+0x1b
The test looks like below now:
$ sudo ./perf test -v contention
86: kernel lock contention analysis test :
--- start ---
test child forked, pid 2705822
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
Testing perf lock contention --callstack-filter with task aggregation
Testing perf lock contention CSV output
test child finished with 0
---- end ----
kernel lock contention analysis test: Ok
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
||
|
|
f6027053f8 |
perf lock contention: Add --output option
To avoid formatting failures for example in CSV output due to debug
messages, add --output option to put the result in a file.
Unfortunately the short -o option was taken by the --owner already.
$ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
$ head lock-out.txt
contended total wait max wait avg wait type caller
3 76.79 us 26.89 us 25.60 us rwlock:R ep_poll_callback+0x2d
0xffffffff9a23f4b5 _raw_read_lock_irqsave+0x45
0xffffffff99bbd4dd ep_poll_callback+0x2d
0xffffffff999029f3 __wake_up_common+0x73
0xffffffff99902b82 __wake_up_common_lock+0x82
0xffffffff99fa5b1c sock_def_readable+0x3c
0xffffffff9a11521d unix_stream_sendmsg+0x18d
0xffffffff99f9fc9c sock_sendmsg+0x5c
Suggested-by: Ian Rogers <irogers@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
||
|
|
69c5c9930d |
perf lock contention: Add -x option for CSV style output
Sometimes we want to process the output by external programs. Let's add
the -x option to specify the field separator like perf stat.
$ sudo ./perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
The first line is a comment that shows the output format. Each line is
separated by the given string ("," in this case). The time is printed
in nsec without the unit so that it can be parsed easily.
The characters can be used in the output like (":", "+" and ".") are not
allowed for the -x option.
$ ./perf lock con -x:
Cannot use the separator that is already used
Usage: perf lock contention [<options>]
-x, --field-separator <separator>
print result in CSV format with custom separator
The stacktraces are printed in the same line separated by ":". The
header is updated to show the stacktrace. Also the debug output is
added at the end as a comment.
$ sudo ./perf lock con -abv -x, -F wait_total sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
# output: total wait, type, caller, stacktrace
37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ...
21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ...
20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ...
18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ...
17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ...
12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ...
# debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0
Also note that some field (like lock symbols) can be empty.
$ sudo ./perf lock con -abl -x, -E 10 sleep 1
# output: contended, total wait, max wait, avg wait, address, symbol, type
6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock
18, 87716, 11196, 4873, ffff9dc540059000, , spinlock
2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock
3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock
3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock
4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock
3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock
1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock
4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock
1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
||
|
|
7b83d597c8 |
perf lock: Remove stale comments
The comment was for symbol_conf.sort_by_name which was deleted already. Let's get rid of the stale comments as well. Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
b30d7a77c5 |
Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r
option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
|
||
|
|
887e845f8c |
perf vendor events intel: Update tigerlake to 1.13
Updates were released in:
|
||
|
|
b5d2644da6 |
perf vendor events intel: Update skylakex to 1.31
Updates were released in:
|
||
|
|
ea3eafa08a |
perf vendor events intel: Update skylake to 57
Updates were released in:
|
||
|
|
938e4ad310 |
perf vendor events intel: Update sapphirerapids to 1.14
Updates were released in:
|
||
|
|
663655c91c |
perf vendor events intel: Update icelakex to 1.21
Updates were released in:
|
||
|
|
d1363b9454 |
perf vendor events intel: Update icelake to 1.19
Updates were released in:
|
||
|
|
0c9e39421c |
perf vendor events intel: Update cascadelakex to 1.19
Updates were released in:
|
||
|
|
dfc83cc877 |
perf vendor events intel: Update meteorlake to 1.03
1.03 events were released in:
|
||
|
|
7e74ece31a |
perf vendor events intel: Add rocketlake events/metrics
Add RocketLake events added to Intel perfmon in:
|
||
|
|
8076dc8c68 |
perf vendor metrics intel: Make transaction metrics conditional
Make the transaction metrics conditional on the cycles-tx event being present. This event may not be present when TSX extensions have been disabled. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
eadc0040a9 |
perf jevents: Support for has_event function
Support for the new has_event function in metrics. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
4a4a9bf907 |
perf expr: Add has_event function
Some events are dependent on firmware/kernel enablement. Allow such events to be detected when the metric is parsed so that the metric's event parsing doesn't fail. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Sohom Datta <sohomdatta1@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623151016.4193660-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
36cee69f57 |
perf tools: Do not remove addr_location.thread in thread__find_map()
The thread__find_map() is to find a map for a given address in the
given thread's address space. It searches maps based on the cpu mode
and fills various information in the addr_location data structure.
It might change al->maps and al->map, but not al->thread. Then I
think no reason to not set the al->thread at the beginning.
Also get rid of the duplicate 'al->map = NULL' part.
Fixes:
|
||
|
|
3a8a670eee |
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
|
||
|
|
628eaa4e87 |
perf pmus: Add placeholder core PMU
If loading a core PMU fails, legacy hardware/cache events may segv due to there being no PMU. Create a placeholder empty PMU for this case. This was discussed in: https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/ Reported-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
6aeadf7896 |
Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
Pull arm64 documentation move from Jonathan Corbet: "Move the arm64 architecture documentation under Documentation/arch/. This brings some order to the documentation directory, declutters the top-level directory, and makes the documentation organization more closely match that of the source" * tag 'docs-arm64-move' of git://git.lwn.net/linux: perf arm-spe: Fix a dangling Documentation/arm64 reference mm: Fix a dangling Documentation/arm64 reference arm64: Fix dangling references to Documentation/arm64 dt-bindings: fix dangling Documentation/arm64 reference docs: arm64: Move arm64 documentation under Documentation/arch/ |
||
|
|
a193cc7506 |
Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
|
||
|
|
ad5f604e18 |
perf test: Fix a compile error on pe-file-parsing.c
The dso__find_symbol_by_name() should be have idx pointer argument.
Found during the build-test.
$ make build-test
...
CC /tmp/tmp.6JwPK1xbWG/tests/pe-file-parsing.o
tests/pe-file-parsing.c: In function ‘run_dir’:
tests/pe-file-parsing.c:64:15: error: too few arguments to function ‘dso__find_symbol_by_name’
64 | sym = dso__find_symbol_by_name(dso, "main");
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from tests/pe-file-parsing.c:16:
/usr/local/google/home/namhyung/project/linux/tools/perf/util/symbol.h:135:16: note: declared here
135 | struct symbol *dso__find_symbol_by_name(struct dso *dso, const char *name, size_t *idx);
| ^~~~~~~~~~~~~~~~~~~~~~~~
Fixes:
|
||
|
|
78987bb02a |
perf: Replace deprecated -target with --target= for Clang
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link:
|
||
|
|
710dffc969 |
perf pmu: Correct auto_merge_stats test
The original logic was to check is_pmu_hybrid() like in the below.
It just checks the name of PMU specifically for Intel hybrid systems
which means uncore PMU events should return false.
https://lore.kernel.org/all/20230527072210.2900565-35-irogers@google.com/
The is_pmu_hybrid() was replaced by arch-agnostic way but with the
incorrect condition which was fixed for core PMUs but not uncore.
This change fixes both.
Fixes:
|
||
|
|
49a5e3edd3 |
perf tools: Add missing else to cmd_daemon subcommand condition
Namhyung reported segfault in perf daemon start command.
It's caused by extra check on argv[0] which is set to NULL by previous
__cmd_start call. Adding missing else to skip the extra check.
Fixes:
|
||
|
|
2553a5270d |
perf trace: fix MSG_SPLICE_PAGES build error
Our MPTCP CI and Stephen got this error:
In file included from builtin-trace.c:907:
trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags':
trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function)
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
The fix is similar to what was done with MSG_FASTOPEN: the new macro is
defined if it is not defined in the system headers.
Fixes:
|
||
|
|
b848b26c66 |
net: Kill MSG_SENDPAGE_NOTLAST
Now that ->sendpage() has been removed, MSG_SENDPAGE_NOTLAST can be cleaned up. Things were converted to use MSG_MORE instead, but the protocol sendpage stubs still convert MSG_SENDPAGE_NOTLAST to MSG_MORE, which is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
929ff679b6 |
perf tools: Add printing perf_event_attr config symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr config and
its symbol to improve the readability of debugging information.
Before:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11
<SNIP>
After:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2 (PERF_TYPE_TRACEPOINT)
size 136
config 0x143 (sched:sched_switch)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10005 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_BPU)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4 (PERF_TYPE_RAW)
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5 (PERF_TYPE_BREAKPOINT)
size 136
config 0
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
inherit 1
mmap 1
comm 1
freq 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 12
<SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-5-yangjihong1@huawei.com
[ fix perf import test by adding a dummy tracepoint_id__to_name() ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
||
|
|
2e4dd65d7d |
perf tools: Add printing perf_event_attr type symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr type and
its symbol to improve the readability of debugging information.
Before:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
<SNIP>
After:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2 (PERF_TYPE_TRACEPOINT)
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4 (PERF_TYPE_RAW)
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5 (PERF_TYPE_BREAKPOINT)
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
<SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-4-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
||
|
|
5492e72500 |
perf tools: Extend PRINT_ATTRf to support printing of members with a value of 0
When printing attr, members whose value is 0 will not be printed, we want to print the case where attr->type is 0(PERF_TYPE_HARDWARE), add `_a` param to PRINT_ATTRf macro to always print member when it is true No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
220f88b5a1 |
perf trace-event-info: Add tracepoint_id_to_name() helper
Add tracepoint_id_to_name() helper to search for the trace events directory by given event id and return the corresponding tracepoint. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
d82257d7f6 |
perf symbol: Remove now unused symbol_conf.sort_by_name
Previously used to specify symbol_name_rb_node was in use. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
259dce914e |
perf symbol: Remove symbol_name_rb_node
Most perf commands want to sort symbols by name and this is done via an invasive rbtree that on 64-bit systems costs 24 bytes. Sorting the symbols in a DSO by name is optional and not done by default, however, if sorting is requested the 24 bytes is allocated for every symbol. This change removes the rbtree and uses a sorted array of symbol pointers instead (costing 8 bytes per symbol). As the array is created on demand then there are further memory savings. The complexity of sorting the array and using the rbtree are the same. To support going to the next symbol, the index of the current symbol needs to be passed around as a pair with the current symbol. This requires some API changes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-3-irogers@google.com [ minimize change in symbols__sort_by_name() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
|
|
ce5b293405 |
perf dso: Sort symbols under lock
Determine if symbols are sorted, set the sorted flag and sort under the dso lock. Done in the interest of thread safety. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-2-irogers@google.com [ handle the similar code in util/probe-event.c ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |