Commit Graph

9632 Commits

Author SHA1 Message Date
Ian Rogers
3f4ac23a99 perf dsos: Switch backing storage to array from rbtree/list
DSOs were held on a list for fast iteration and in an rbtree for fast
finds.

Switch to using a lazily sorted array where iteration is just iterating
through the array and binary searches are the same complexity as
searching the rbtree.

The find may need to sort the array first which does increase the
complexity, but add operations have lower complexity and overall the
complexity should remain about the same.

The set name operations on the dso just records that the array is no
longer sorted, avoiding complexity in rebalancing the rbtree.

Tighter locking discipline is enforced to avoid the array being resorted
while long and short names or ids are changed.

The array is smaller in size, replacing 6 pointers with 2, and so even
with extra allocated space in the array, the array may be 50%
unoccupied, the memory saving should be at least 2x.

Committer testing:

On a previous version of this patchset we were getting a lot of warnings
about deleting a DSO still on a list, now it is ok:

  root@x1:~# perf probe -l
  root@x1:~# perf probe finish_task_switch
  Added new event:
    probe:finish_task_switch (on finish_task_switch)

  You can now use it in all perf tools, such as:

  	perf record -e probe:finish_task_switch -aR sleep 1

  root@x1:~# perf probe -l
    probe:finish_task_switch (on finish_task_switch@kernel/sched/core.c)
  root@x1:~# perf trace -e probe:finish_task_switch/max-stack=8/ --max-events=1
       0.000 migration/0/19 probe:finish_task_switch(__probe_ip: -1894408688)
                                         finish_task_switch.isra.0 ([kernel.kallsyms])
                                         __schedule ([kernel.kallsyms])
                                         schedule ([kernel.kallsyms])
                                         smpboot_thread_fn ([kernel.kallsyms])
                                         kthread ([kernel.kallsyms])
                                         ret_from_fork ([kernel.kallsyms])
                                         ret_from_fork_asm ([kernel.kallsyms])
  root@x1:~#
  root@x1:~# perf probe -d probe:*
  Removed event: probe:finish_task_switch
  root@x1:~# perf probe -l
  root@x1:~#

I also ran the full 'perf test' suite after applying this one, no
regressions.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240504213803.218974-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-06 09:13:11 -03:00
Ian Rogers
7b6dd7a923 perf pmu: Assume sysfs events are always the same case
Perf event names aren't case sensitive. For sysfs events the entire
directory of events is read then iterated comparing names in a case
insensitive way, most often to see if an event is present.

Consider:

  $ perf stat -e inst_retired.any true

The event inst_retired.any may be present in any PMU, so every PMU's
sysfs events are loaded and then searched with strcasecmp to see if
any match. This event is only present on the cpu PMU as a JSON event
so a lot of events were loaded from sysfs unnecessarily just to prove
an event didn't exist there.

This change avoids loading all the events by assuming sysfs event
names are always either lower or uppercase. It uses file exists and
only loads the events when the desired event is present.

For the example above, the number of openat calls measured by 'perf
trace' on a tigerlake laptop goes from 325 down to 255. The reduction
will be larger for machines with many PMUs, particularly replicated
uncore PMUs.

Ensure pmu_aliases_parse() is called before all uses of the aliases
list, but remove some "pmu->sysfs_aliases_loaded" tests as they are now
part of the function.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240502213507.2339733-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-03 17:08:20 -03:00
Ian Rogers
18eb2ca8c1 perf test pmu: Add an eagerly loaded event test
Allow events/aliases to be eagerly loaded for a PMU. Factor out the
pmu_aliases_parse to allow this.

Parse a test event and check it configures the attribute as expected.

There is overlap with the parse-events tests, but this test is done with
a PMU created in a temp directory and doesn't rely on PMUs in sysfs.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240502213507.2339733-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-03 17:08:20 -03:00
Ian Rogers
aa1551f299 perf test pmu: Refactor format test and exposed test APIs
In tests/pmu.c, make a common utility that creates a PMU in a mkdtemp
directory and uses regular PMU parsing logic to load that PMU. Formats
must still be eagerly loaded as by default the PMU code assumes devices
are going to be in sysfs.

In util/pmu.[ch], hide perf_pmu__format_parse but add the eager argument
to perf_pmu__lookup called by perf_pmus__add_test_pmu. Later patches
will eagerly load other non-sysfs files when eager loading is enabled.

In tests/pmu.c, rather than manually constructing a list of term
arguments, just use the term parsing code from a string.

Add more comments and debug logging.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20240502213507.2339733-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-03 17:08:20 -03:00
Namhyung Kim
3cdd98b42d perf maps: Remove check_invariants() from maps__lock()
I found that the debug build was a slowed down a lot by the maps lock
code since it checks the invariants whenever it gets the pointer to the
lock.  This means it checks twice the invariants before and after the
access.

Instead, let's move the checking code within the lock area but after any
modification and remove it from the read paths.  This would remove (more
than) half of the maps lock overhead.

The time for perf report with a huge data file (200k+ of MMAP2 events).

  Non-debug     Before      After
  ---------   --------   --------
     2m 43s     6m 45s     4m 21s

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 16:35:47 -03:00
Namhyung Kim
b7d4aacfc8 perf annotate-data: Check kind of stack variables
I sometimes see ("unknown type") in the result and it was because it
didn't check the type of stack variables properly during the instruction
tracking.  The stack can carry constant values (without type info) and
if the target instruction is accessing the stack location, it resulted
in the "unknown type".

Maybe we could pick one of integer types for the constant, but it
doesn't really mean anything useful.  Let's just drop the stack slot if
it doesn't have a valid type info.

Here's an example how it got the unknown type.
Note that 0xffffff48 = -0xb8.
  -----------------------------------------------------------
  find data type for 0xffffff48(reg6) at ...
  CU for ...
  frame base: cfa=0 fbreg=6
  scope: [2/2] (die:11cb97f)
  bb: [37 - 3a]
  var [37] reg15 type='int' size=0x4 (die:0x1180633)
  bb: [40 - 4b]
  mov [40] imm=0x1 -> reg13
  var [45] reg8 type='sigset_t*' size=0x8 (die:0x11a39ee)
  mov [45] imm=0x1 -> reg2                     <---  here reg2 has a constant
  bb: [215 - 237]
  mov [218] reg2 -> -0xb8(stack) constant      <---  and save it to the stack
  mov [225] reg13 -> -0xc4(stack) constant
  call [22f] find_task_by_vgpid
  call [22f] return -> reg0 type='struct task_struct*' size=0x8 (die:0x11881e8)
  bb: [5c8 - 5cf]
  bb: [2fb - 302]
  mov [2fb] -0xc4(stack) -> reg13 constant
  bb: [13b - 14d]
  mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c)
  bb: [153 - 153]
  chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg    <--- access here
  found by insn track: 0xffffff48(reg6) type-offset=0
   type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 11:06:23 -03:00
Namhyung Kim
af89e8f2bd perf annotate-data: Handle multi regs in find_data_type_block()
The instruction tracking should be the same for the both registers.

Just do it once and compare the result with multi regs as with the
previous patches.

Then we don't need to call find_data_type_block() separately for each
reg.

Let's remove the 'reg' argument from the relevant functions.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 11:05:10 -03:00
Namhyung Kim
eba1f853ed perf annotate-data: Check memory access with two registers
The following instruction pattern is used to access a global variable.

  mov     $0x231c0, %rax
  movsql  %edi, %rcx
  mov     -0x7dc94ae0(,%rcx,8), %rcx
  cmpl    $0x0, 0xa60(%rcx,%rax,1)     <<<--- here

The first instruction set the address of the per-cpu variable (here, it
is 'runqueues' of type 'struct rq').  The second instruction seems like
a cpu number of the per-cpu base.  The third instruction get the base
offset of per-cpu area for that cpu.  The last instruction compares the
value of the per-cpu variable at the offset of 0xa60.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 10:54:31 -03:00
Namhyung Kim
4449c9047d perf annotate-data: Handle direct global variable access
Like per-cpu base offset array, sometimes it accesses the global
variable directly using the offset.  Allow this type of instructions as
long as it finds a global variable for the address.

  movslq  %edi, %rcx
  mov     -0x7dc94ae0(,%rcx,8), %rcx   <<<--- here

As %rcx has a valid type (i.e. array index) from the first instruction,
it will be checked by the first case in check_matching_type().  But as
it's not a pointer type, the match will fail.  But in this case, it
should check if it accesses the kernel global array variable.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 10:51:23 -03:00
Namhyung Kim
c1da8411e4 perf annotate-data: Collect global variables in advance
Currently it looks up global variables from the current CU using address
and name.  But it sometimes fails to find a variable as the variable can
come from a different CU - but it's still strange it failed to find a
declaration for some reason.

Anyway, it can collect all global variables from all CU once and then
lookup them later on.  This slightly improves the success rate of my
test data set.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 10:47:52 -03:00
Namhyung Kim
d7b60803a7 perf dwarf-aux: Add die_collect_global_vars()
This function is to search all global variables in the CU.  We want to
have the list of global variables at once and match them later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240502060011.1838090-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02 10:45:30 -03:00
Adrian Hunter
e101a05f79 perf intel-pt: Fix unassigned instruction op (discovered by MemorySanitizer)
MemorySanitizer discovered instances where the instruction op value was
not assigned.:

  WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5581c00a76b3 in intel_pt_sample_flags tools/perf/util/intel-pt.c:1527:17
  Uninitialized value was stored to memory at
    #0 0x5581c005ddf8 in intel_pt_walk_insn tools/perf/util/intel-pt-decoder/intel-pt-decoder.c:1256:25

The op value is used to set branch flags for branch instructions
encountered when walking the code, so fix by setting op to
INTEL_PT_OP_OTHER in other cases.

Fixes: 4c761d805b ("perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state type")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Closes: https://lore.kernel.org/linux-perf-users/20240320162619.1272015-1-irogers@google.com/
Link: https://lore.kernel.org/r/20240326083223.10883-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:13:10 -03:00
Namhyung Kim
8f3ec810bb perf annotate: Update DSO binary type when trying build-id
dso__disassemble_filename() tries to get the filename for objdump (or
capstone) using build-id.  But I found sometimes it didn't disassemble
some functions.

It turned out that those functions belong to a DSO which has no binary
type set.  It seems it sets the binary type for some special files only
- like kernel (kallsyms or kcore) or BPF images.  And there's a logic to
skip dso with DSO_BINARY_TYPE__NOT_FOUND.

As it's checked the build-id cache link, it should set the binary type
as DSO_BINARY_TYPE__BUILD_ID_CACHE.

Fixes: 873a83731f ("perf annotate: Skip DSOs not found")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240425005157.1104789-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:12:55 -03:00
Namhyung Kim
f35847de2a perf annotate: Fallback disassemble to objdump when capstone fails
I found some cases that capstone failed to disassemble.  Probably my
capstone is an old version but anyway there's a chance it can fail.  And
then it silently stopped in the middle.  In my case, it didn't
understand "RDPKRU" instruction.

Let's check if the capstone disassemble reached the end of the function
and fallback to objdump if not.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240425005157.1104789-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:21 -03:00
Namhyung Kim
47557db99a perf annotate-data: Check if 'struct annotation_source' was allocated on 'perf report' TUI
As it removed the sample accounting for code when no symbol sort key is
given for 'perf report' TUI, it might not have allocated the
'struct annotated_source' yet.  Let's check if it's NULL first.

Fixes: 6cdd977ec2 ("perf report: Do not collect sample histogram unnecessarily")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240424230015.1054013-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:21 -03:00
Ian Rogers
bb65ff7810 perf parse-events: Tidy the setting of the default event name
Add comments. Pass ownership of the event name to save on a strdup.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:21 -03:00
Ian Rogers
afd876bbdc perf parse-events: Minor grouping tidy up
Add comments. Ensure leader->group_name is freed before overwriting
it.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-16-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
4a20e79365 perf parse-event: Constify event_symbol arrays
Moves 352 bytes from .data to .data.rel.ro.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
e30a7912f4 perf parse-events: Improvements to modifier parsing
Use a struct/bitmap rather than a copied string from lexer.

In lexer give improved error message when too many precise flags are
given or repeated modifiers.

Before:

  $ perf stat -e 'cycles:kuk' true
  event syntax error: 'cycles:kuk'
                              \___ Bad modifier
  ...
  $ perf stat -e 'cycles:pppp' true
  event syntax error: 'cycles:pppp'
                              \___ Bad modifier
  ...
  $ perf stat -e '{instructions:p,cycles:pp}:pp' -a true
  event syntax error: '..cycles:pp}:pp'
                                    \___ Bad modifier
  ...

After:

  $ perf stat -e 'cycles:kuk' true
  event syntax error: 'cycles:kuk'
                                \___ Duplicate modifier 'k' (kernel)
  ...
  $ perf stat -e 'cycles:pppp' true
  event syntax error: 'cycles:pppp'
                                 \___ Maximum precise value is 3
  ...
  $ perf stat -e '{instructions:p,cycles:pp}:pp' true
  event syntax error: '..cycles:pp}:pp'
                                    \___ Maximum combined precise value is 3, adding precision to "cycles:pp"
  ...

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
e18601d80c perf parse-events: Inline parse_events_evlist_error
Inline parse_events_evlist_error that is only used in
parse_events_error. Modify parse_events_error to not report a parser
error unless errors haven't already been reported. Make it clearer
that the latter case only happens for unrecognized input.

Before:

  $ perf stat -e 'cycles/period=99999999999999999999/' true
  event syntax error: 'cycles/period=99999999999999999999/'
                                    \___ parser error

  event syntax error: '..les/period=99999999999999999999/'
                                    \___ Bad base 10 number "99999999999999999999"
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events
  $ perf stat -e 'cycles:xyz' true
  event syntax error: 'cycles:xyz'
                             \___ parser error
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events

After:

  $ perf stat -e 'cycles/period=99999999999999999999/xyz' true
  event syntax error: '..les/period=99999999999999999999/xyz'
                                    \___ Bad base 10 number "99999999999999999999"
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events
  $ perf stat -e 'cycles:xyz' true
  event syntax error: 'cycles:xyz'
                             \___ Unrecognized input
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
ba5c371edf perf parse-events: Improve error message for bad numbers
Use the error handler from the parse_state to give a more informative
error message.

Before:

  $ perf stat -e 'cycles/period=99999999999999999999/' true
  event syntax error: 'cycles/period=99999999999999999999/'
                                    \___ parser error
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events

After:

  $ perf stat -e 'cycles/period=99999999999999999999/' true
  event syntax error: 'cycles/period=99999999999999999999/'
                                    \___ parser error

  event syntax error: '..les/period=99999999999999999999/'
                                    \___ Bad base 10 number "99999999999999999999"
  Run 'perf list' for a list of valid events

   Usage: perf stat [<options>] [<command>]

      -e, --event <event>   event selector. use 'perf list' to list available events

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
4e5484b4bf perf parse-events: Inline parse_events_update_lists
The helper function just wraps a splice and free. Making the free
inline removes a comment, so then it just wraps a splice which we can
make inline too.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
617824a7f0 perf parse-events: Prefer sysfs/JSON hardware events over legacy
It was requested that RISC-V be able to add events to the perf tool so
the PMU driver didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/

This change makes the priority of events specified without a PMU the
same as those specified with a PMU, namely sysfs and JSON events are
checked first before using the legacy encoding.

The hw_term is made more generic as a hardware_event that encodes a
pair of string and int value, allowing parse_events_multi_pmu_add to
fall back on a known encoding when the sysfs/JSON adding fails for
core events. As this covers PE_VALUE_SYM_HW, that token is removed and
related code simplified.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
5ccc4edfc2 perf parse-events: Constify parse_events_add_numeric
Allow the term list to be const so that other functions can pass const
term lists. Add const as necessary to called functions.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
9d0dba2398 perf parse-events: Handle PE_TERM_HW in name_or_raw
Avoid duplicate logic for name_or_raw and PE_TERM_HW by having a rule
to turn PE_TERM_HW into a name_or_raw.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
62593394f6 perf parse-events: Legacy cache names on all PMUs and lower priority
Prior behavior is to not look for legacy cache names in sysfs/JSON and
to create events on all core PMUs. New behavior is to look for
sysfs/JSON events first on all PMUs, for core PMUs add a legacy event
if the sysfs/JSON event isn't present.

This is done so that there is consistency with how event names in
terms are handled and their prioritization of sysfs/JSON over
legacy. It may make sense to use a legacy cache event name as an event
name on a non-core PMU so we should allow it.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
f91fa2ae63 perf pmu: Refactor perf_pmu__match()
Move all implementation to pmu code. Don't allocate a fnmatch wildcard
pattern, matching ignoring the suffix already handles this, and only
use fnmatch if the given PMU name has a '*' in it.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
90b2c210a5 perf parse-events: Avoid copying an empty list
In parse_events_add_pmu, delay copying the list of terms until it is
known the list contains terms.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
63dfcde977 perf parse-events: Directly pass PMU to parse_events_add_pmu()
Avoid passing the name of a PMU then finding it again, just directly
pass the PMU. parse_events_multi_pmu_add_or_add_pmu() is the only version
that needs to find a PMU, so move the find there. Remove the error
message as parse_events_multi_pmu_add_or_add_pmu will given an error at
the end when a name isn't either a PMU name or event name. Without the
error message being created the location in the input parameter (loc)
can be removed.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Ian Rogers
8b734eaa98 perf parse-events: Factor out '<event_or_pmu>/.../' parsing
Factor out the case of an event or PMU name followed by a slash based
term list. This is with a view to sharing the code with new legacy
hardware parsing. Use early return to reduce indentation in the code.
Make parse_events_add_pmu static now it doesn't need sharing with
parse-events.y.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240416061533.921723-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26 22:07:20 -03:00
Jakub Kicinski
2bd87951de Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/ti/icssg/icssg_prueth.c

net/mac80211/chan.c
  89884459a0 ("wifi: mac80211: fix idle calculation with multi-link")
  87f5500285 ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()")
https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/

net/unix/garbage.c
  1971d13ffa ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().")
  4090fa373f ("af_unix: Replace garbage collection algorithm.")

drivers/net/ethernet/ti/icssg/icssg_prueth.c
drivers/net/ethernet/ti/icssg/icssg_common.c
  4dcd0e83ea ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()")
  e2dc7bfd67 ("net: ti: icssg-prueth: Move common functions into a separate file")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25 12:41:37 -07:00
Arnaldo Carvalho de Melo
173b0b5b0e Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes sent via perf-tools, by Namhyung Kim.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-22 13:35:18 -03:00
Dima Kogan
c15ed44429 perf probe-event: Better error message for a too-long probe name
This is a common failure mode when probing userspace C++ code (where the
mangling adds significant length to the symbol names).

Prior to this patch, only a very generic error message is produced,
making the user guess at what the issue is.

Signed-off-by: Dima Kogan <dima@secretsauce.net>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240416045533.162692-3-dima@secretsauce.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18 22:22:51 -03:00
Dima Kogan
a529bec023 perf probe-event: Un-hardcode sizeof(buf)
In several places we had

  char buf[64];
  ...
  snprintf(buf, 64, ...);

This patch changes it to

  char buf[64];
  ...
  snprintf(buf, sizeof(buf), ...);

so the "64" is only stated once.

Signed-off-by: Dima Kogan <dima@secretsauce.net>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240416045533.162692-2-dima@secretsauce.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18 22:22:51 -03:00
Weilin Wang
03f2357017 perf stat: Add new field in stat_config to enable hardware aware grouping
Hardware counter and event information could be used to help creating event
groups that better utilize hardware counters and improve multiplexing.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240412210756.309828-2-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18 22:22:51 -03:00
Chen Pei
b828a23a75 perf genelf: Fix compiling with libelf on rv32
When cross-compiling perf with libelf, the following error occurred:

	In file included from tests/genelf.c:14:
	tests/../util/genelf.h:50:2: error: #error "unsupported architecture"
	50 | #error "unsupported architecture"
		|  ^~~~~
	tests/../util/genelf.h:59:5: warning: "GEN_ELF_CLASS" is not defined, evaluates to 0 [-Wundef]
	59 | #if GEN_ELF_CLASS == ELFCLASS64

Fix this by adding GEN-ELF-ARCH and GEN-ELF-CLASS definitions for rv32.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20240415095532.4930-1-cp0613@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18 22:22:51 -03:00
Namhyung Kim
7043dc5286 perf report: Add weight[123] output fields
Add weight1, weight2 and weight3 fields to -F/--fields and their aliases
like 'ins_lat', 'p_stage_cyc' and 'retire_lat'.  Note that they are in
the sort keys too but the difference is that output fields will sum up
the weight values and display the average.

In the sort key, users can see the distribution of weight value and I
think it's confusing we have local vs. global weight for the same weight.

For example, I experiment with mem-loads events to get the weights.  On
my laptop, it seems only weight1 field is supported.

  $ perf mem record -- perf test -w noploop

Let's look at the noploop function only.  It has 7 samples.

  $ perf script -F event,ip,sym,weight | grep noploop
  # event                         weight     ip           sym
  cpu/mem-loads,ldlat=30/P:           43     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           48     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight
  cpu/mem-loads,ldlat=30/P:           59     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           33     55b3c122bffc noploop
  cpu/mem-loads,ldlat=30/P:           38     55b3c122bffc noploop    <--- same weight

When you use the 'weight' sort key, it'd show entries with a separate
weight value separately.  Also note that the first entry has 3 samples
with weight value 38, so they are displayed together and the weight
value is the sum of 3 samples (114 = 38 * 3).

  $ perf report -n -s +weight | grep -e Weight -e noploop
  # Overhead  Samples  Command   Shared Object   Symbol         Weight
       0.53%        3     perf   perf            [.] noploop    114
       0.18%        1     perf   perf            [.] noploop    59
       0.18%        1     perf   perf            [.] noploop    48
       0.18%        1     perf   perf            [.] noploop    43
       0.18%        1     perf   perf            [.] noploop    33

If you use 'local_weight' sort key, you can see the actual weight.

  $ perf report -n -s +local_weight | grep -e Weight -e noploop
  # Overhead  Samples  Command   Shared Object   Symbol         Local Weight
       0.53%        3     perf   perf            [.] noploop    38
       0.18%        1     perf   perf            [.] noploop    59
       0.18%        1     perf   perf            [.] noploop    48
       0.18%        1     perf   perf            [.] noploop    43
       0.18%        1     perf   perf            [.] noploop    33

But when you use the -F/--field option instead, you can see the average
weight for the while noploop function (as it won't group samples by
weight value and use the default 'comm,dso,sym' sort keys).

  $ perf report -n -F +weight | grep -e Weight -e noploop
  Warning:
  --fields weight shows the average value unlike in the --sort key.
  # Overhead  Samples  Weight1  Command  Shared Object  Symbol
       1.23%        7     42.4  perf     perf           [.] noploop

The weight1 field shows the average value:
  (38 * 3 + 59 + 48 + 43 + 33) / 7 = 42.4

Also it'd show the warning that 'weight' field has the average value.
Using 'weight1' can remove the warning.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240411181718.2367948-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17 12:21:39 -03:00
Namhyung Kim
6fcf1e6525 perf hist: Add weight fields to hist entry stats
Like period and sample numbers, it'd be better to track weight values
and display them in the output rather than having them as sort keys.

This patch just adds a few more fields to save the weights in a hist
entry.  It'll be displayed as new output fields in the later patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240411181718.2367948-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17 12:21:39 -03:00
Namhyung Kim
0993d72467 perf hist: Move histogram related code to hist.h
It's strange that sort.h has the definition of struct hist_entry.  As
sort.h already includes hist.h, let's move the data structure to hist.h.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240411181718.2367948-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17 12:21:39 -03:00
Namhyung Kim
a5a00497b9 perf annotate-data: Handle RSP if it's not the FB register
In some cases, the stack pointer on x86 (rsp = reg7) is used to point
variables on stack but it's not the frame base register.  Then it
should handle the register like normal registers (IOW not to access
the other stack variables using offset calculation) but it should not
assume it would have a pointer.

Before:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "zc" failed (die: 0x7b9580a)
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)

After:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  found "zc" in scope=3/3 (die: 0x7b957fc) type_offset=0x3c
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4)

Note that the type-offset was properly calculated to 0x3c as the
variable starts at 0x40.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
0519fadbbe perf dwarf-aux: Check variable address range properly
In match_var_offset(), it just checked the end address of the variable
with the given offset because it assumed the register holds a pointer
to the data type and the offset starts from the base.

But I found some cases that the stack pointer (rsp = reg7) register is
used to pointer a stack variable while the frame base is maintained by a
different register (rbp = reg6).  In that case, it cannot simply use the
stack pointer as it cannot guarantee that it points to the frame base.
So it needs to check both boundaries of the variable location.

Before:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "tss" failed (die: 0x7b95801)
   variable location: base reg7, offset=0x110
   type='struct scm_timestamping_internal' size=0x30 (die:0x7b8c126)

So the current code just checks register number for the non-PC and
non-FB registers and assuming it has offset 0.  But this variable has
offset 0x110 so it should not match to this.

After:
  -----------------------------------------------------------
  find data type for 0x7c(reg7) at tcp_getsockopt+0xb62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  no pointer or no type
  check variable "zc" failed (die: 0x7b9580a)
   variable location: base=reg7, offset=0x40
   type='struct tcp_zerocopy_receive' size=0x40 (die:7b947f4)

Now it find the correct variable "zc".  It was located at reg7 + 0x40
and the size if 0x40 which means it should cover [0x40, 0x80).  And the
access was for reg7 + 0x7c so it found the right one.  But it still
failed to use the variable and it would be handled in the next patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
645af3fb62 perf dwarf-aux: Check pointer offset when checking variables
In match_var_offset(), it checks the offset range with the target type
only for non-pointer types.  But it also needs to check the pointer
types with the target type.

This is because there can be more than one pointer variable located in
the same register.  Let's look at the following example.  It's looking
up a variable for reg3 at tcp_get_info+0x62.  It found "sk" variable but
it wasn't the right one since it accesses beyond the target type (struct
'sock' in this case) size.

  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  offset: 1980 is bigger than size: 760
  check variable "sk" failed (die: 0x7b92b2c)
   variable location: reg3
   type='struct sock' size=0x2f8 (die:0x7b63c3a)

Actually there was another variable "tp" in the function and it's
located at the same (reg3) because it's just type-casted like below.

  void tcp_get_info(struct sock *sk, struct tcp_info *info)
  {
      const struct tcp_sock *tp = tcp_sk(sk);
      ...

The 'struct tcp_sock' contains the 'struct sock' at offset 0 so it can
just use the same address as a pointer to tcp_sock.  That means it
should match variables correctly by checking the offset and size.
Actually it cannot distinguish if the offset was smaller than the size
of the original struct sock.  But I think it's fine as they are the same
at that part.

So let's check the target type size and retry if it doesn't match.
Now it succeeded to find the correct variable.

  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  found "tp" in scope=1/1 (die: 0x7b92b16) type_offset=0x7bc
   variable location: reg3
   type='struct tcp_sock' size=0xa68 (die:0x7b81380)

Fixes: bc10db8eb8 ("perf annotate-data: Support stack variables")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:55 -03:00
Namhyung Kim
2bc3cf575a perf annotate-data: Improve debug message with location info
To verify it found the correct variable, let's add the location
expression to the debug message.

  $ perf --debug type-profile annotate --data-type
  ...
  -----------------------------------------------------------
  find data type for 0xaf0(reg15) at schedule+0xeb
  CU for kernel/sched/core.c (die:0x1180523)
  frame base: cfa=0 fbreg=6
  found "rq" in scope=3/4 (die: 0x11b6a00) type_offset=0xaf0
   variable location: reg15
   type='struct rq' size=0xfc0 (die:0x11892e2)
  -----------------------------------------------------------
  find data type for 0x7bc(reg3) at tcp_get_info+0x62
  CU for net/ipv4/tcp.c (die:0x7b5f516)
  frame base: cfa=0 fbreg=6
  offset: 1980 is bigger than size: 760
  check variable "sk" failed (die: 0x7b92b2c)
   variable location: reg3
   type='struct sock' size=0x2f8 (die:0x7b63c3a)
  -----------------------------------------------------------
  ...

The first case is fine.  It looked up a data type in r15 with offset of
0xaf0 at schedule+0xeb.  It found the CU die and the frame base info and
the variable "rq" was found in the scope 3/4.  Its location is the r15
register and the type size is 0xfc0 which includes 0xaf0.

But the second case is not good.  It looked up a data type in rbx (reg3)
with offset 0x7bc.  It found a CU and the frame base which is good so
far.  And it also found a variable "sk" but the access offset is bigger
than the type size (1980 vs. 760 or 0x7bc vs. 0x2f8).  The variable has
the right location (reg3) but I need to figure out why it accesses
beyond what it's supposed to.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240412183310.2518474-2-namhyung@kernel.org
[ Fix the build on 32-bit by casting Dwarf_Word to (long) in pr_debug_location() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16 10:46:08 -03:00
Ian Rogers
988052f4bf perf bench uprobe: Add uretprobe variant of uprobe benchmarks
Name benchmarks with _ret at the end to avoid creating a new set of
benchmarks.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240406040911.1603801-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
61ff60aab7 perf util: Add shellcheck to generate-cmdlist.sh
Add shellcheck to generate-cmdlist.sh to avoid basic shell script
mistakes.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20240409023216.2342032-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 17:54:02 -03:00
Ian Rogers
0ffc8fca5c perf dsos: Switch more loops to dsos__for_each_dso()
Switch loops within dsos.c, add a version that isn't locked. Switch
some unlocked loops to hold the read lock.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
1d6eff9305 perf dso: Move dso functions out of dsos.c
Move dso and dso_id functions to dso.c to match the struct declarations.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
73f3fea2e1 perf dsos: Introduce dsos__for_each_dso()
To better abstract the dsos internals, introduce dsos__for_each_dso that
does a callback on each dso.

This also means the read lock can be correctly held.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
f649ed80f3 perf dsos: Tidy reference counting and locking
Move more functionality into dsos.c generally from machine.c, renaming
functions to match their new usage.

The find function is made to always "get" before returning a dso.

Reduce the scope of locks in vdso to match the locking paradigm.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00
Ian Rogers
83acca9f90 perf dsos: Attempt to better abstract DSOs internals
Move functions from machine and build-id to dsos. Pass 'struct dsos'
rather than internal state.

Rename some functions to better represent which data structure they
operate on.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240410064214.2755936-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12 12:04:13 -03:00