Pull networking updates from Jakub Kicinski:
"Core
----
- Defer freeing TCP skbs to the BH handler, whenever possible, or at
least perform the freeing outside of the socket lock section to
decrease cross-CPU allocator work and improve latency.
- Add netdevice refcount tracking to locate sources of netdevice and
net namespace refcount leaks.
- Make Tx watchdog less intrusive - avoid pausing Tx and restarting
all queues from a single CPU removing latency spikes.
- Various small optimizations throughout the stack from Eric Dumazet.
- Make netdev->dev_addr[] constant, force modifications to go via
appropriate helpers to allow us to keep addresses in ordered data
structures.
- Replace unix_table_lock with per-hash locks, improving performance
of bind() calls.
- Extend skb drop tracepoint with a drop reason.
- Allow SO_MARK and SO_PRIORITY setsockopt under CAP_NET_RAW.
BPF
---
- New helpers:
- bpf_find_vma(), find and inspect VMAs for profiling use cases
- bpf_loop(), runtime-bounded loop helper trading some execution
time for much faster (if at all converging) verification
- bpf_strncmp(), improve performance, avoid compiler flakiness
- bpf_get_func_arg(), bpf_get_func_ret(), bpf_get_func_arg_cnt()
for tracing programs, all inlined by the verifier
- Support BPF relocations (CO-RE) in the kernel loader.
- Further the support for BTF_TYPE_TAG annotations.
- Allow access to local storage in sleepable helpers.
- Convert verifier argument types to a composable form with different
attributes which can be shared across types (ro, maybe-null).
- Prepare libbpf for upcoming v1.0 release by cleaning up APIs,
creating new, extensible ones where missing and deprecating those
to be removed.
Protocols
---------
- WiFi (mac80211/cfg80211):
- notify user space about long "come back in N" AP responses,
allow it to react to such temporary rejections
- allow non-standard VHT MCS 10/11 rates
- use coarse time in airtime fairness code to save CPU cycles
- Bluetooth:
- rework of HCI command execution serialization to use a common
queue and work struct, and improve handling errors reported in
the middle of a batch of commands
- rework HCI event handling to use skb_pull_data, avoiding packet
parsing pitfalls
- support AOSP Bluetooth Quality Report
- SMC:
- support net namespaces, following the RDMA model
- improve connection establishment latency by pre-clearing buffers
- introduce TCP ULP for automatic redirection to SMC
- Multi-Path TCP:
- support ioctls: SIOCINQ, OUTQ, and OUTQNSD
- support socket options: IP_TOS, IP_FREEBIND, IP_TRANSPARENT,
IPV6_FREEBIND, and IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
- support cmsgs: TCP_INQ
- improvements in the data scheduler (assigning data to subflows)
- support fastclose option (quick shutdown of the full MPTCP
connection, similar to TCP RST in regular TCP)
- MCTP (Management Component Transport) over serial, as defined by
DMTF spec DSP0253 - "MCTP Serial Transport Binding".
Driver API
----------
- Support timestamping on bond interfaces in active/passive mode.
- Introduce generic phylink link mode validation for drivers which
don't have any quirks and where MAC capability bits fully express
what's supported. Allow PCS layer to participate in the validation.
Convert a number of drivers.
- Add support to set/get size of buffers on the Rx rings and size of
the tx copybreak buffer via ethtool.
- Support offloading TC actions as first-class citizens rather than
only as attributes of filters, improve sharing and device resource
utilization.
- WiFi (mac80211/cfg80211):
- support forwarding offload (ndo_fill_forward_path)
- support for background radar detection hardware
- SA Query Procedures offload on the AP side
New hardware / drivers
----------------------
- tsnep - FPGA based TSN endpoint Ethernet MAC used in PLCs with
real-time requirements for isochronous communication with protocols
like OPC UA Pub/Sub.
- Qualcomm BAM-DMUX WWAN - driver for data channels of modems
integrated into many older Qualcomm SoCs, e.g. MSM8916 or MSM8974
(qcom_bam_dmux).
- Microchip LAN966x multi-port Gigabit AVB/TSN Ethernet Switch driver
with support for bridging, VLANs and multicast forwarding
(lan966x).
- iwlmei driver for co-operating between Intel's WiFi driver and
Intel's Active Management Technology (AMT) devices.
- mse102x - Vertexcom MSE102x Homeplug GreenPHY chips
- Bluetooth:
- MediaTek MT7921 SDIO devices
- Foxconn MT7922A
- Realtek RTL8852AE
Drivers
-------
- Significantly improve performance in the datapaths of: lan78xx,
ax88179_178a, lantiq_xrx200, bnxt.
- Intel Ethernet NICs:
- igb: support PTP/time PEROUT and EXTTS SDP functions on
82580/i354/i350 adapters
- ixgbevf: new PF -> VF mailbox API which avoids the risk of
mailbox corruption with ESXi
- iavf: support configuration of VLAN features of finer
granularity, stacked tags and filtering
- ice: PTP support for new E822 devices with sub-ns precision
- ice: support firmware activation without reboot
- Mellanox Ethernet NICs (mlx5):
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- support TC forwarding when tunnel encap and decap happen between
two ports of the same NIC
- dynamically size and allow disabling various features to save
resources for running in embedded / SmartNIC scenarios
- Broadcom Ethernet NICs (bnxt):
- use page frag allocator to improve Rx performance
- expose control over IRQ coalescing mode (CQE vs EQE) via ethtool
- Other Ethernet NICs:
- amd-xgbe: add Ryzen 6000 (Yellow Carp) Ethernet support
- Microsoft cloud/virtual NIC (mana):
- add XDP support (PASS, DROP, TX)
- Mellanox Ethernet switches (mlxsw):
- initial support for Spectrum-4 ASICs
- VxLAN with IPv6 underlay
- Marvell Ethernet switches (prestera):
- support flower flow templates
- add basic IP forwarding support
- NXP embedded Ethernet switches (ocelot & felix):
- support Per-Stream Filtering and Policing (PSFP)
- enable cut-through forwarding between ports by default
- support FDMA to improve packet Rx/Tx to CPU
- Other embedded switches:
- hellcreek: improve trapping management (STP and PTP) packets
- qca8k: support link aggregation and port mirroring
- Qualcomm 802.11ax WiFi (ath11k):
- qca6390, wcn6855: enable 802.11 power save mode in station mode
- BSS color change support
- WCN6855 hw2.1 support
- 11d scan offload support
- scan MAC address randomization support
- full monitor mode, only supported on QCN9074
- qca6390/wcn6855: report signal and tx bitrate
- qca6390: rfkill support
- qca6390/wcn6855: regdb.bin support
- Intel WiFi (iwlwifi):
- support SAR GEO Offset Mapping (SGOM) and Time-Aware-SAR (TAS)
in cooperation with the BIOS
- support for Optimized Connectivity Experience (OCE) scan
- support firmware API version 68
- lots of preparatory work for the upcoming Bz device family
- MediaTek WiFi (mt76):
- Specific Absorption Rate (SAR) support
- mt7921: 160 MHz channel support
- RealTek WiFi (rtw88):
- Specific Absorption Rate (SAR) support
- scan offload
- Other WiFi NICs
- ath10k: support fetching (pre-)calibration data from nvmem
- brcmfmac: configure keep-alive packet on suspend
- wcn36xx: beacon filter support"
* tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2048 commits)
tcp: tcp_send_challenge_ack delete useless param `skb`
net/qla3xxx: Remove useless DMA-32 fallback configuration
rocker: Remove useless DMA-32 fallback configuration
hinic: Remove useless DMA-32 fallback configuration
lan743x: Remove useless DMA-32 fallback configuration
net: enetc: Remove useless DMA-32 fallback configuration
cxgb4vf: Remove useless DMA-32 fallback configuration
cxgb4: Remove useless DMA-32 fallback configuration
cxgb3: Remove useless DMA-32 fallback configuration
bnx2x: Remove useless DMA-32 fallback configuration
et131x: Remove useless DMA-32 fallback configuration
be2net: Remove useless DMA-32 fallback configuration
vmxnet3: Remove useless DMA-32 fallback configuration
bna: Simplify DMA setting
net: alteon: Simplify DMA setting
myri10ge: Simplify DMA setting
qlcnic: Simplify DMA setting
net: allwinner: Fix print format
page_pool: remove spinlock in page_pool_refill_alloc_cache()
amt: fix wrong return type of amt_send_membership_update()
...
This reverts commit 08efcb4a63.
This breaks the build as it will prefer using libbpf-devel header files,
even when not using LIBBPF_DYNAMIC=1, breaking the build.
This was detected on OpenSuSE Tumbleweed with libtraceevent-devel 1.3.0,
as described by Jiri Slaby:
=======================================================================
It breaks build with LIBTRACEEVENT_DYNAMIC and version 1.3.0:
> util/debug.c: In function ‘perf_debug_option’:
> util/debug.c:243:17: error: implicit declaration of function
‘tep_set_loglevel’ [-Werror=implicit-function-declaration]
> 243 | tep_set_loglevel(TEP_LOG_INFO);
> | ^~~~~~~~~~~~~~~~
> util/debug.c:243:34: error: ‘TEP_LOG_INFO’ undeclared (first use in this
function); did you mean ‘TEP_PRINT_INFO’?
> 243 | tep_set_loglevel(TEP_LOG_INFO);
> | ^~~~~~~~~~~~
> | TEP_PRINT_INFO
> util/debug.c:243:34: note: each undeclared identifier is reported only once
for each function it appears in
> util/debug.c:245:34: error: ‘TEP_LOG_DEBUG’ undeclared (first use in this
function)
> 245 | tep_set_loglevel(TEP_LOG_DEBUG);
> | ^~~~~~~~~~~~~
> util/debug.c:247:34: error: ‘TEP_LOG_ALL’ undeclared (first use in this
function)
> 247 | tep_set_loglevel(TEP_LOG_ALL);
> | ^~~~~~~~~~~
It is because the gcc's command line looks like:
gcc
...
-I/home/abuild/rpmbuild/BUILD/tools/lib/
...
-DLIBTRACEEVENT_VERSION=65790
...
=======================================================================
The proper way to fix this is more involved and so not suitable for this
late in the 5.16-rc stage.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/lkml/bc2b0786-8965-1bcd-2316-9d9bb37b9c31@kernel.org
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/YddGjjmlMZzxUZbN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the following command is executed several times, a coredump file is
generated.
$ timeout -k 9 5 perf top -e task-clock
*******
*******
*******
0.01% [kernel] [k] __do_softirq
0.01% libpthread-2.28.so [.] __pthread_mutex_lock
0.01% [kernel] [k] __ll_sc_atomic64_sub_return
double free or corruption (!prev) perf top --sort comm,dso
timeout: the monitored command dumped core
When we terminate "perf top" using sending signal method,
SLsmg_reset_smg() called. SLsmg_reset_smg() resets the SLsmg screen
management routines by freeing all memory allocated while it was active.
However SLsmg_reinit_smg() maybe be called by another thread.
SLsmg_reinit_smg() will free the same memory accessed by
SLsmg_reset_smg(), thus it results in a double free.
SLsmg_reinit_smg() is called already protected by ui__lock, so we fix
the problem by adding pthread_mutex_trylock of ui__lock when calling
SLsmg_reset_smg().
Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: wuxu.wu@huawei.com
Link: http://lore.kernel.org/lkml/a91e3943-7ddc-f5c0-a7f5-360f073c20e6@huawei.com
Signed-off-by: Hewenliang <hewenliang4@huawei.com>
Signed-off-by: yaowenbin <yaowenbin1@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 0e0ae87422 ("perf list: Display hybrid PMU events with cpu
type") changes the event list for uncore PMUs or arm64 heterogeneous CPU
systems, such that duplicate aliases are incorrectly listed per PMU
(which they should not be), like:
# perf list
...
unc_cbo_cache_lookup.any_es
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in E or S-state]
unc_cbo_cache_lookup.any_es
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in E or S-state]
unc_cbo_cache_lookup.any_i
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in I-state]
unc_cbo_cache_lookup.any_i
[Unit: uncore_cbox L3 Lookup any request that access cache and found
line in I-state]
...
Notice how the events are listed twice.
The named commit changed how we remove duplicate events, in that events
for different PMUs are not treated as duplicates. I suppose this is to
handle how "Each hybrid pmu event has been assigned with a pmu name".
Fix PMU alias listing by restoring behaviour to remove duplicates for
non-hybrid PMUs.
Fixes: 0e0ae87422 ("perf list: Display hybrid PMU events with cpu type")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1640103090-140490-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-12-30
The following pull-request contains BPF updates for your *net-next* tree.
We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).
The main changes are:
1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.
2) Beautify and de-verbose verifier logs, from Christy.
3) Composable verifier types, from Hao.
4) bpf_strncmp helper, from Hou.
5) bpf.h header dependency cleanup, from Jakub.
6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.
7) Sleepable local storage, from KP.
8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
commit 077cdda764 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
commit 31108d142f ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
commit 4390c6edc0 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/
net/smc/smc_wr.c
commit 49dc9013e3 ("net/smc: Use the bitmap API when applicable")
commit 349d43127d ("net/smc: fix kernel panic caused by race of smc_sock")
bitmap_zero()/memset() is removed by the fix
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parser did not take ':' into account.
Example:
Before:
$ perf record -e intel_pt//u uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.026 MB perf.data ]
$ perf inject -i perf.data --vm-time-correlation="dry-run 123"
$ perf inject -i perf.data --vm-time-correlation="dry-run 123:456"
Failed to parse VM Time Correlation options
0x620 [0x98]: failed to process type: 70 [Invalid argument]
$
After:
$ perf inject -i perf.data --vm-time-correlation="dry-run 123:456"
$
Fixes: e3ff42bdeb ("perf intel-pt: Parse VM Time Correlation options and set up decoding")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211215080636.149562-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fixed commit attempts to get the output file descriptor even if the
file was never opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
__GI___fileno (fp=0x0) at fileno.c:35
35 fileno.c: No such file or directory.
(gdb) bt
#0 __GI___fileno (fp=0x0) at fileno.c:35
#1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
#2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
#3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
#4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
#5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
(gdb)
Fixes: 0ae0389362 ("perf tools: Pass a fd to perf_file_header__read_pipe()")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The fixed commit attempts to close inject.output even if it was never
opened e.g.
$ perf record uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
$ perf inject -i perf.data --vm-time-correlation=dry-run
Segmentation fault (core dumped)
$ gdb --quiet perf
Reading symbols from perf...
(gdb) r inject -i perf.data --vm-time-correlation=dry-run
Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
48 iofclose.c: No such file or directory.
(gdb) bt
#0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
#1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376
#2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085
#3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313
#4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539
(gdb)
Fixes: 02e6246f53 ("perf inject: Close inject.output on exit")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An overflow (OVF packet) is treated as an error because it represents a
loss of trace data, but there is no loss of synchronization, so the packet
state should be INTEL_PT_STATE_IN_SYNC not INTEL_PT_STATE_ERR_RESYNC.
To support that, some additional variables must be reset, and the FUP
packet that may follow OVF is treated as an FUP event.
Fixes: f4aa081949 ("perf tools: Add Intel PT decoder")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v5.15+
Link: https://lore.kernel.org/r/20211210162303.2288710-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andrii Nakryiko says:
====================
bpf-next 2021-12-10 v2
We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).
The main changes are:
1) Various samples fixes, from Alexander Lobakin.
2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.
3) A batch of new unified APIs for libbpf, logging improvements, version
querying, etc. Also a batch of old deprecations for old APIs and various
bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.
4) BPF documentation reorganization and improvements, from Christoph Hellwig
and Dave Tucker.
5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
libbpf, from Hengqi Chen.
6) Verifier log fixes, from Hou Tao.
7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.
8) Extend branch record capturing to all platforms that support it,
from Kajol Jain.
9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.
10) bpftool doc-generating script improvements, from Quentin Monnet.
11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.
12) Deprecation warning fix for perf/bpf_counter, from Song Liu.
13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
from Tiezhu Yang.
14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.
15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
and others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
libbpf: Add "bool skipped" to struct bpf_map
libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
selftests/bpf: Add test for libbpf's custom log_buf behavior
selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
libbpf: Deprecate bpf_object__load_xattr()
libbpf: Add per-program log buffer setter and getter
libbpf: Preserve kernel error code and remove kprobe prog type guessing
libbpf: Improve logging around BPF program loading
libbpf: Allow passing user log setting through bpf_object_open_opts
libbpf: Allow passing preallocated log_buf when loading BTF into kernel
libbpf: Add OPTS-based bpf_btf_load() API
libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
samples/bpf: Remove unneeded variable
bpf: Remove redundant assignment to pointer t
selftests/bpf: Fix a compilation warning
perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
samples: bpf: Fix 'unknown warning group' build warning on Clang
samples: bpf: Fix xdp_sample_user.o linking with Clang
...
====================
Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bpf_create_map is deprecated. Replace it with bpf_map_create. Also add a
__weak bpf_map_create() so that when older version of libbpf is linked as
a shared library, it falls back to bpf_create_map().
Fixes: 992c422541 ("libbpf: Unify low-level map creation APIs w/ new bpf_map_create()")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211207232340.2561471-1-song@kernel.org
Arnaldo reported that building all his containers with BUILD_BPF_SKEL=1
to then make this the default he found problems in some distros where
the system linux/bpf.h file was being used and lacked this:
util/bpf_skel/bperf_leader.bpf.c:13:20: error: use of undeclared identifier 'BPF_F_PRESERVE_ELEMS'
__uint(map_flags, BPF_F_PRESERVE_ELEMS);
So use instead the vmlinux.h file generated by bpftool from BTF info.
This fixed these as well, getting the build back working on debian:11,
debian:experimental and ubuntu:21.10:
In file included from In file included from util/bpf_skel/bperf_leader.bpf.cutil/bpf_skel/bpf_prog_profiler.bpf.c::33:
:
In file included from In file included from /usr/include/linux/bpf.h/usr/include/linux/bpf.h::1111:
:
/usr/include/linux/types.h/usr/include/linux/types.h::55::1010:: In file included from util/bpf_skel/bperf_follower.bpf.c:3fatal errorfatal error:
: : In file included from /usr/include/linux/bpf.h:'asm/types.h' file not found11'asm/types.h' file not found:
/usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found
#include <asm/types.h>#include <asm/types.h>
^~~~~~~~~~~~~ ^~~~~~~~~~~~~
#include <asm/types.h>
^~~~~~~~~~~~~
1 error generated.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/CF175681-8101-43D1-ABDB-449E644BE986@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some platforms do not have CPU die support, for example s390.
Commit
Cc: Ian Rogers <irogers@google.com>
Fixes: fdf1e29b61 ("perf expr: Add metric literals for topology.")
fails on s390:
# perf test -Fv 7
...
# FAILED tests/expr.c:173 #num_dies >= #num_packages
---- end ----
Simple expression parser: FAILED!
#
Investigating this issue leads to these functions:
build_cpu_topology()
+--> has_die_topology(void)
{
struct utsname uts;
if (uname(&uts) < 0)
return false;
if (strncmp(uts.machine, "x86_64", 6))
return false;
....
}
which always returns false on s390. The caller build_cpu_topology()
checks has_die_topology() return value. On false the the struct
cpu_topology::die_cpu_list is not contructed and has zero entries. This
leads to the failing comparison: #num_dies >= #num_packages. s390 of
course has a positive number of packages.
Fix this and check if the function build_cpu_topology() did build up
a die_cpus_list. The number of entries in this list should be larger
than 0. If the number of list element is zero, the die_cpus_list has
not been created and the check in function test__expr():
TEST_ASSERT_VAL("#num_dies >= #num_packages", \
num_dies >= num_packages)
always fails.
Output after:
# perf test -Fv 7
7: Simple expression parser :
--- start ---
division by zero
syntax error
---- end ----
Simple expression parser: Ok
#
Fixes: fdf1e29b61 ("perf expr: Add metric literals for topology.")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20211129112339.3003036-1-tmricht@linux.ibm.com
[ Added comment in the added 'if (num_dies)' line about architectures not having die topology ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 66dfdff03d ("perf tools: Add Python 3 support") we don't use
the tools/build/feature/test-libpython-version.c version in any Makefile
feature check:
$ find tools/ -type f | xargs grep feature-libpython-version
$
The only place where this was used was removed in 66dfdff03d:
- ifneq ($(feature-libpython-version), 1)
- $(warning Python 3 is not yet supported; please set)
- $(warning PYTHON and/or PYTHON_CONFIG appropriately.)
- $(warning If you also have Python 2 installed, then)
- $(warning try something like:)
- $(warning $(and ,))
- $(warning $(and ,) make PYTHON=python2)
- $(warning $(and ,))
- $(warning Otherwise, disable Python support entirely:)
- $(warning $(and ,))
- $(warning $(and ,) make NO_LIBPYTHON=1)
- $(warning $(and ,))
- $(error $(and ,))
- else
- LDFLAGS += $(PYTHON_EMBED_LDFLAGS)
- EXTLIBS += $(PYTHON_EMBED_LIBADD)
- LANG_BINDINGS += $(obj-perf)python/perf.so
- $(call detected,CONFIG_LIBPYTHON)
- endif
And nowadays we either build with PYTHON=python3 or just install the
python3 devel packages and perf will build against it.
But the leftover feature-libpython-version check made the fast path
feature detection to break in all cases except when python2 devel files
were installed:
$ rpm -qa | grep python.*devel
python3-devel-3.9.7-1.fc34.x86_64
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ;
$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory '/var/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j32' parallel build
HOSTCC /tmp/build/perf/fixdep.o
<SNIP>
$ cat /tmp/build/perf/feature/test-all.make.output
In file included from test-all.c:18:
test-libpython-version.c:5:10: error: #error
5 | #error
| ^~~~~
$ ldd ~/bin/perf | grep python
libpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x00007fda6dbcf000)
$
As python3 is the norm these days, fix this by just removing the unused
feature-libpython-version feature check, making the test-all fast path
to work with the common case.
With this:
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ;
$ make -C tools/perf O=/tmp/build/perf install-bin |& head
make: Entering directory '/var/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j32' parallel build
HOSTCC /tmp/build/perf/fixdep.o
HOSTLD /tmp/build/perf/fixdep-in.o
LINK /tmp/build/perf/fixdep
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
$ ldd ~/bin/perf | grep python
libpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x00007f58800b0000)
$ cat /tmp/build/perf/feature/test-all.make.output
$
Reviewed-by: James Clark <james.clark@arm.com>
Fixes: 66dfdff03d ("perf tools: Add Python 3 support")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jaroslav Škarvada <jskarvad@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/YaYmeeC6CS2b8OSz@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The space allowed for new attributes can be too small if existing header
information is large. That can happen, for example, if there are very
many CPUs, due to having an event ID per CPU per event being stored in the
header information.
Fix by adding the existing header.data_offset. Also increase the extra
space allowed to 8KiB and align to a 4KiB boundary for neatness.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20211125071457.2066863-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This: This reverts commit 92723ea0f1.
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRR Ok
yep, it seems the perf bench is broken so the counts won't correlated if
I revert this one:
92723ea0f1 perf bench: Fix two memory leaks detected with ASan
it works for me again.. it seems to break -t option
[root@dell-r440-01 perf]# ./perf bench sched messaging -g 1 -l 100 -t
# Running 'sched/messaging' benchmark:
RRRperf: CLIENT: ready write: Bad file descriptor
Rperf: SENDER: write: Bad file descriptor
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/lkml/YZev7KClb%2Fud43Lc@krava/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Libbpf development version was bumped to 0.7 in c93faaaf2f
("libbpf: Deprecate bpf_prog_load_xattr() API"), activating a bunch of
previously scheduled deprecations. Most APIs are pretty straightforward
to replace with newer APIs, but perf has a complicated mixed setup with
libbpf used both as static and shared configurations, which makes it
non-trivial to migrate the APIs.
Further, bpf_program__set_prep() needs more involved refactoring, which
will require help from Arnaldo and/or Jiri.
So for now, mute deprecation warnings and work on migrating perf off of
deprecated APIs separately with the input from owners of the perf tool.
Fixes: c93faaaf2f ("libbpf: Deprecate bpf_prog_load_xattr() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211203004640.2455717-1-andrii@kernel.org
Commit 10269a2ca2 ("perf test sample-parsing: Add endian test for
struct branch_flags") broke the test case 27 (Sample parsing) on s390 on
linux-next tree:
# perf test -Fv 27
27: Sample parsing
--- start ---
parsing failed for sample_type 0x800
---- end ----
Sample parsing: FAILED!
#
The cause of the failure is a wrong #define BS_EXPECTED_BE statement in
above commit. Correct this define and the test case runs fine.
Output After:
# perf test -Fv 27
27: Sample parsing :
--- start ---
---- end ----
Sample parsing: Ok
#
Fixes: 10269a2ca2 ("perf test sample-parsing: Add endian test for struct branch_flags")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
CC: Sven Schnelle <svens@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/54077e81-503e-3405-6cb0-6541eb5532cc@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses. And perf tool has 'weight'
and 'local_weight' sort keys to display the info.
But it's somewhat confusing what it shows exactly. In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.
For example:
$ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M
$ perf report --stdio -n -s +local_weight
...
#
# Overhead Samples Command Shared Object Symbol Local Weight
# ........ ....... ....... ................ ......................... ............
#
21.23% 313 dd [kernel.vmlinux] [k] lockref_get_not_zero 32
12.43% 183 dd [kernel.vmlinux] [k] lockref_get_not_zero 35
11.97% 159 dd [kernel.vmlinux] [k] lockref_get_not_zero 36
10.40% 141 dd [kernel.vmlinux] [k] lockref_put_return 32
7.63% 113 dd [kernel.vmlinux] [k] lockref_get_not_zero 33
6.37% 92 dd [kernel.vmlinux] [k] lockref_get_not_zero 34
6.15% 90 dd [kernel.vmlinux] [k] lockref_put_return 33
...
So let's look at the 'lockref_get_not_zero' symbols. The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016. But it's not the case:
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ ......
#
1.36% 4 dd [kernel.vmlinux] 36 144
0.47% 4 dd [kernel.vmlinux] 37 148
0.42% 4 dd [kernel.vmlinux] 32 128
0.40% 4 dd [kernel.vmlinux] 34 136
0.35% 4 dd [kernel.vmlinux] 36 144
0.34% 4 dd [kernel.vmlinux] 35 140
0.30% 4 dd [kernel.vmlinux] 36 144
0.30% 4 dd [kernel.vmlinux] 34 136
0.30% 4 dd [kernel.vmlinux] 32 128
0.30% 4 dd [kernel.vmlinux] 32 128
...
With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight'). I don't think this is
what we want.
I found this because of the way it aggregates the 'weight' value. Since
it's not a period, we should not add them in the he->stat. Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.
After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.
Let's keep the weight and display it differently. For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.
With this change, I can see the expected numbers.
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ .....
#
21.23% 313 dd [kernel.vmlinux] 32 10016
12.43% 183 dd [kernel.vmlinux] 35 6405
11.97% 159 dd [kernel.vmlinux] 36 5724
7.63% 113 dd [kernel.vmlinux] 33 3729
6.37% 92 dd [kernel.vmlinux] 34 3128
4.17% 59 dd [kernel.vmlinux] 37 2183
0.08% 1 dd [kernel.vmlinux] 269 269
0.08% 1 dd [kernel.vmlinux] 38 38
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>