Alexei reported crash by running test_progs -j on system
with 32 cpus.
It turned out the kprobe_multi bench test that attaches all
ftrace-able functions will race with bpf_dispatcher_update,
that calls bpf_arch_text_poke on bpf_dispatcher_xdp_func,
which is ftrace-able function.
Ftrace is not aware of this update so this will cause
ftrace_bug with:
WARNING: CPU: 6 PID: 1985 at
arch/x86/kernel/ftrace.c:94 ftrace_verify_code+0x27/0x50
...
ftrace_replace_code+0xa3/0x170
ftrace_modify_all_code+0xbd/0x150
ftrace_startup_enable+0x3f/0x50
ftrace_startup+0x98/0xf0
register_ftrace_function+0x20/0x60
register_fprobe_ips+0xbb/0xd0
bpf_kprobe_multi_link_attach+0x179/0x430
__sys_bpf+0x18a1/0x2440
...
------------[ ftrace bug ]------------
ftrace failed to modify
[<ffffffff818d9380>] bpf_dispatcher_xdp_func+0x0/0x10
actual: ffffffe9:7b:ffffff9c:77:1e
Setting ftrace call site to call ftrace function
It looks like we need some way to hide some functions
from ftrace, but meanwhile we workaround this by skipping
bpf_dispatcher_xdp_func from kprobe_multi bench test.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220714082316.479181-1-jolsa@kernel.org
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions. This fixes the following warnings from coccicheck:
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:407:9-10: WARNING:
return of 0/1 in function 'decap_v4' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:389:9-10: WARNING:
return of 0/1 in function 'decap_v6' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:290:9-10: WARNING:
return of 0/1 in function 'encap_v6' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:264:9-10: WARNING:
return of 0/1 in function 'parse_tcp' with return type bool
tools/testing/selftests/bpf/progs/test_xdp_noinline.c:242:9-10: WARNING:
return of 0/1 in function 'parse_udp' with return type bool
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220714015647.25074-1-xiaolinkui@kylinos.cn
add subtest verifying BPF ksym iter behaviour. The BPF ksym
iter program shows an example of dumping a format different to
/proc/kallsyms. It adds KIND and MAX_SIZE fields which represent the
kind of symbol (core kernel, module, ftrace, bpf, or kprobe) and
the maximum size the symbol can be. The latter is calculated from
the difference between current symbol value and the next symbol
value.
The key benefit for this iterator will likely be supporting in-kernel
data-gathering rather than dumping symbol details to userspace and
parsing the results.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1657629105-7812-3-git-send-email-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-07-09
We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).
The main changes are:
1) Add new way for performing BTF type queries to BPF, from Daniel Müller.
2) Add inlining of calls to bpf_loop() helper when its function callback is
statically known, from Eduard Zingerman.
3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.
4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
hooks, from Stanislav Fomichev.
5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.
6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.
7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
selftests, from Magnus Karlsson & Maciej Fijalkowski.
8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.
9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.
10) Sockmap optimizations around throughput of UDP transmissions which have been
improved by 61%, from Cong Wang.
11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.
12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.
13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.
14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
macro, from James Hilliard.
15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.
16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
bpf: Correctly propagate errors up from bpf_core_composites_match
libbpf: Disable SEC pragma macro on GCC
bpf: Check attach_func_proto more carefully in check_return_code
selftests/bpf: Add test involving restrict type qualifier
bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
MAINTAINERS: Add entry for AF_XDP selftests files
selftests, xsk: Rename AF_XDP testing app
bpf, docs: Remove deprecated xsk libbpf APIs description
selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
libbpf, riscv: Use a0 for RC register
libbpf: Remove unnecessary usdt_rel_ip assignments
selftests/bpf: Fix few more compiler warnings
selftests/bpf: Fix bogus uninitialized variable warning
bpftool: Remove zlib feature test from Makefile
libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
selftests/bpf: Add type match test against kernel's task_struct
selftests/bpf: Add nested type to type based tests
...
====================
Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member
BPF_F_CURRENT_NETNS are not exposed. This commit allows building the
xdp_synproxy selftest in such cases. Note that nf_conntrack must be
loaded before running the test if it's compiled as a module.
This commit also allows this selftest to be successfully compiled when
CONFIG_NF_CONNTRACK is disabled.
One unused local variable of type struct bpf_ct_opts is also removed.
Fixes: fb5cd0ce70 ("selftests/bpf: Add selftests for raw syncookie helpers")
Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
Syzkaller reports the following crash:
RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline]
RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline]
RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610
With the following reproducer:
bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80)
Because we don't enforce expected_attach_type for XDP programs,
we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP'
part in check_return_code and follow up with testing
`prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto`
is NULL.
Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that
attach ..." condition. Also, don't skip return code check for
LSM/STRUCT_OPS.
The above actually brings an issue with existing selftest which
tries to return EPERM from void inet_csk_clone. Fix the
test (and move called_socket_clone to make sure it's not
incremented in case of an error) and add a new one to explicitly
verify this condition.
Fixes: 69fd337a97 ("bpf: per-cgroup lsm flavor")
Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
Recently, xsk part of libbpf was moved to selftests/bpf directory and
lives on its own because there is an AF_XDP testing application that
needs it called xdpxceiver. That name makes it a bit hard to indicate
who maintains it as there are other XDP samples in there, whereas this
one is strictly about AF_XDP.
Do s/xdpxceiver/xskxceiver so that it will be easier to figure out who
maintains it. A follow-up patch will correct MAINTAINERS file.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220707111613.49031-2-maciej.fijalkowski@intel.com
Commit 13bbbfbea7 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
added the bpf_dynptr_write() and bpf_dynptr_read() APIs.
However, it will be needed for some dynptr types to pass in flags as
well (e.g. when writing to a skb, the user may like to invalidate the
hash or recompute the checksum).
This patch adds a "u64 flags" arg to the bpf_dynptr_read() and
bpf_dynptr_write() APIs before their UAPI signature freezes where
we then cannot change them anymore with a 5.19.x released kernel.
Fixes: 13bbbfbea7 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
This benchmark measures grace period latency and kthread cpu usage of
RCU Tasks Trace when many processes are creating/deleting BPF
local_storage. Intent here is to quantify improvement on these metrics
after Paul's recent RCU Tasks patches [0].
Specifically, fork 15k tasks which call a bpf prog that creates/destroys
task local_storage and sleep in a loop, resulting in many
call_rcu_tasks_trace calls.
To determine grace period latency, trace time elapsed between
rcu_tasks_trace_pregp_step and rcu_tasks_trace_postgp; for cpu usage
look at rcu_task_trace_kthread's stime in /proc/PID/stat.
On my virtualized test environment (Skylake, 8 cpus) benchmark results
demonstrate significant improvement:
BEFORE Paul's patches:
SUMMARY tasks_trace grace period latency avg 22298.551 us stddev 1302.165 us
SUMMARY ticks per tasks_trace grace period avg 2.291 stddev 0.324
AFTER Paul's patches:
SUMMARY tasks_trace grace period latency avg 16969.197 us stddev 2525.053 us
SUMMARY ticks per tasks_trace grace period avg 1.146 stddev 0.178
Note that since these patches are not in bpf-next benchmarking was done
by cherry-picking this patch onto rcu tree.
[0] https://lore.kernel.org/rcu/20220620225402.GA3842369@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220705190018.3239050-1-davemarchevsky@fb.com
When compiling with -O2, GCC detects few problems with selftests/bpf, so
fix all of them. Two are real issues (uninitialized err and nums
out-of-bounds access), but two other uninitialized variables warnings
are due to GCC not being able to prove that variables are indeed
initialized under conditions under which they are used.
Fix all 4 cases, though.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220705224818.4026623-3-andrii@kernel.org
When compiling selftests/bpf in optimized mode (-O2), GCC erroneously
complains about uninitialized token variable:
In file included from network_helpers.c:22:
network_helpers.c: In function ‘open_netns’:
test_progs.h:355:22: error: ‘token’ may be used uninitialized [-Werror=maybe-uninitialized]
355 | int ___err = libbpf_get_error(___res); \
| ^~~~~~~~~~~~~~~~~~~~~~~~
network_helpers.c:440:14: note: in expansion of macro ‘ASSERT_OK_PTR’
440 | if (!ASSERT_OK_PTR(token, "malloc token"))
| ^~~~~~~~~~~~~
In file included from /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:21,
from bpf_util.h:9,
from network_helpers.c:20:
/data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17: note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
113 | LIBBPF_API long libbpf_get_error(const void *ptr);
| ^~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:522: /data/users/andriin/linux/tools/testing/selftests/bpf/network_helpers.o] Error 1
This is completely bogus becuase libbpf_get_error() doesn't dereference
pointer, but the only easy way to silence this is to allocate initialized
memory with calloc().
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20220705224818.4026623-2-andrii@kernel.org
This change extends the type based tests with another struct type (in
addition to a_struct) to check relocations against: a_complex_struct.
This type is nested more deeply to provide additional coverage of
certain paths in the type match logic.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220628160127.607834-10-deso@posteo.net
This change adds another type-based self-test that specifically aims to
test some more characteristics of the TYPE_MATCH logic. Specifically, it
covers a few more potential differences between types, such as different
orders, enum variant values, and integer signedness.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220628160127.607834-9-deso@posteo.net
Now that we have type-match logic in both libbpf and the kernel, this
change adjusts the existing BPF self tests to check this functionality.
Specifically, we extend the existing type-based tests to check the
previously introduced bpf_core_type_matches macro.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220628160127.607834-8-deso@posteo.net
Daniel Borkmann says:
====================
pull-request: bpf 2022-07-02
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 6 files changed, 193 insertions(+), 86 deletions(-).
The main changes are:
1) Fix clearing of page contiguity when unmapping XSK pool, from Ivan Malov.
2) Two verifier fixes around bounds data propagation, from Daniel Borkmann.
3) Fix fprobe sample module's parameter descriptions, from Masami Hiramatsu.
4) General BPF maintainer entry revamp to better scale patch reviews.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, selftests: Add verifier test case for jmp32's jeq/jne
bpf, selftests: Add verifier test case for imm=0,umin=0,umax=1 scalar
bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals
bpf: Fix incorrect verifier simulation around jmp32's jeq/jne
xsk: Clear page contiguity bit when unmapping pool
bpf, docs: Better scale maintenance of BPF subsystem
fprobe, samples: Add module parameter descriptions
====================
Link: https://lore.kernel.org/r/20220701230121.10354-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test case to trigger the verifier's incorrect conclusion in the
case of jmp32's jeq/jne. Also here, make use of dead code elimination,
so that we can see the verifier bailing out on unfixed kernels.
Before:
# ./test_verifier 724
#724/p jeq32/jne32: bounds checking FAIL
Failed to load prog 'Permission denied'!
R4 !read_ok
verification time 8 usec
stack depth 0
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
After:
# ./test_verifier 724
#724/p jeq32/jne32: bounds checking OK
Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220701124727.11153-4-daniel@iogearbox.net
Add a test case to trigger the constant scalar issue which leaves the
register in scalar(imm=0,umin=0,umax=1,var_off=(0x0; 0x0)) state. Make
use of dead code elimination, so that we can see the verifier bailing
out on unfixed kernels. For the condition, we use jle given it checks
on umax bound.
Before:
# ./test_verifier 743
#743/p jump & dead code elimination FAIL
Failed to load prog 'Permission denied'!
R4 !read_ok
verification time 11 usec
stack depth 0
processed 13 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
After:
# ./test_verifier 743
#743/p jump & dead code elimination OK
Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220701124727.11153-3-daniel@iogearbox.net
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
9c5de246c1 ("net: sparx5: mdb add/del handle non-sparx5 devices")
fbb89d02e3 ("net: sparx5: Allow mdb entries to both CPU and ports")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, xsk_socket__delete frees BPF resources regardless of ctx
refcount. Xdpxceiver has a test to verify whether underlying BPF
resources would not be wiped out after closing XSK socket that was
bound to interface with other active sockets. From library's xsk part
perspective it also means that the internal xsk context is shared and
its refcount is bumped accordingly.
After a switch to loading XDP prog based on previously opened XSK
socket, mentioned xdpxceiver test fails with:
not ok 16 [xdpxceiver.c:swap_xsk_resources:1334]: ERROR: 9/"Bad file descriptor
which means that in swap_xsk_resources(), xsk_socket__delete() released
xskmap which in turn caused a failure of xsk_socket__update_xskmap().
To fix this, when deleting socket, decrement ctx refcount before
releasing BPF resources and do so only when refcount dropped to 0 which
means there are no more active sockets for this ctx so BPF resources can
be freed safely.
Fixes: 2f6324a393 ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-5-maciej.fijalkowski@intel.com
Currently, xsk_setup_xdp_prog() uses anonymous xsk_socket struct which
means that during xsk_create_bpf_link() call, xsk->config.xdp_flags is
always 0. This in turn means that from xdpxceiver it is impossible to
use xdpgeneric attachment, so since commit 3b22523bca ("selftests,
xsk: Fix bpf_res cleanup test") we were not testing SKB mode at all.
To fix this, introduce a function, called xsk_setup_xdp_prog_xsk(), that
will load XDP prog based on the existing xsk_socket, so that xsk
context's refcount is correctly bumped and flags from application side
are respected. Use this from xdpxceiver side so we get coverage of
generic and native XDP program attach points.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-3-maciej.fijalkowski@intel.com
Currently bpf_link probe is done for each call of xsk_socket__create().
For cases where xsk context was previously created and current socket
creation uses it, has_bpf_link will be overwritten, where it has already
been initialized.
Optimize this by moving the query to the xsk_create_ctx() so that when
xsk_get_ctx() finds a ctx then no further bpf_link probes are needed.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220629143458.934337-2-maciej.fijalkowski@intel.com
Now that bpftool is able to produce a list of known program, map, attach
types, let's use as much of this as we can in the bash completion file,
so that we don't have to expand the list each time a new type is added
to the kernel.
Also update the relevant test script to remove some checks that are no
longer needed.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20220629203637.138944-3-quentin@isovalent.com
Libbpf 1.0 stops support legacy-style BPF map definitions. Selftests has
been migrated away from using legacy BPF map definitions except for two
selftests, to make sure that legacy functionality still worked in
pre-1.0 libbpf. Now it's time to let those tests go as libbpf 1.0 is
imminent.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-14-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Remove deprecated xsk APIs from libbpf. But given we have selftests
relying on this, move those files (with minimal adjustments to make them
compilable) under selftests/bpf.
We also remove all the removed APIs from libbpf.map, while overall
keeping version inheritance chain, as most APIs are backwards
compatible so there is no need to reassign them as LIBBPF_1.0.0 versions.
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This test verifies that bpf_loop() inlining works as expected when
address of `env->prog` is updated. This address is updated upon BPF
program reallocation.
Reallocation is handled by bpf_prog_realloc(), which reuses old memory
if page boundary is not crossed. The value of `len` in the test is
chosen to cross this boundary on bpf_loop() patching.
Verify that the use-after-free bug in inline_bpf_loop() reported by
Dan Carpenter is fixed.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220624020613.548108-3-eddyz87@gmail.com
Add a benchmarks to demonstrate the performance cliff for local_storage
get as the number of local_storage maps increases beyond current
local_storage implementation's cache size.
"sequential get" and "interleaved get" benchmarks are added, both of
which do many bpf_task_storage_get calls on sets of task local_storage
maps of various counts, while considering a single specific map to be
'important' and counting task_storage_gets to the important map
separately in addition to normal 'hits' count of all gets. Goal here is
to mimic scenario where a particular program using one map - the
important one - is running on a system where many other local_storage
maps exist and are accessed often.
While "sequential get" benchmark does bpf_task_storage_get for map 0, 1,
..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4
bpf_task_storage_gets for the important map for every 10 map gets. This
is meant to highlight performance differences when important map is
accessed far more frequently than non-important maps.
A "hashmap control" benchmark is also included for easy comparison of
standard bpf hashmap lookup vs local_storage get. The benchmark is
similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
instead of local storage. Only one inner map is created - a hashmap
meant to hold tid -> data mapping for all tasks. Size of the hashmap is
hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these
keys which are actually fetched as part of the benchmark is
configurable.
Addition of this benchmark is inspired by conversation with Alexei in a
previous patchset's thread [0], which highlighted the need for such a
benchmark to motivate and validate improvements to local_storage
implementation. My approach in that series focused on improving
performance for explicitly-marked 'important' maps and was rejected
with feedback to make more generally-applicable improvements while
avoiding explicitly marking maps as important. Thus the benchmark
reports both general and important-map-focused metrics, so effect of
future work on both is clear.
Regarding the benchmark results. On a powerful system (Skylake, 20
cores, 256gb ram):
Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
Local Storage
=============
num_maps: 1
local_storage cache sequential get: hits throughput: 47.298 ± 0.180 M ops/s, hits latency: 21.142 ns/op, important_hits throughput: 47.298 ± 0.180 M ops/s
local_storage cache interleaved get: hits throughput: 55.277 ± 0.888 M ops/s, hits latency: 18.091 ns/op, important_hits throughput: 55.277 ± 0.888 M ops/s
num_maps: 10
local_storage cache sequential get: hits throughput: 40.240 ± 0.802 M ops/s, hits latency: 24.851 ns/op, important_hits throughput: 4.024 ± 0.080 M ops/s
local_storage cache interleaved get: hits throughput: 48.701 ± 0.722 M ops/s, hits latency: 20.533 ns/op, important_hits throughput: 17.393 ± 0.258 M ops/s
num_maps: 16
local_storage cache sequential get: hits throughput: 44.515 ± 0.708 M ops/s, hits latency: 22.464 ns/op, important_hits throughput: 2.782 ± 0.044 M ops/s
local_storage cache interleaved get: hits throughput: 49.553 ± 2.260 M ops/s, hits latency: 20.181 ns/op, important_hits throughput: 15.767 ± 0.719 M ops/s
num_maps: 17
local_storage cache sequential get: hits throughput: 38.778 ± 0.302 M ops/s, hits latency: 25.788 ns/op, important_hits throughput: 2.284 ± 0.018 M ops/s
local_storage cache interleaved get: hits throughput: 43.848 ± 1.023 M ops/s, hits latency: 22.806 ns/op, important_hits throughput: 13.349 ± 0.311 M ops/s
num_maps: 24
local_storage cache sequential get: hits throughput: 19.317 ± 0.568 M ops/s, hits latency: 51.769 ns/op, important_hits throughput: 0.806 ± 0.024 M ops/s
local_storage cache interleaved get: hits throughput: 24.397 ± 0.272 M ops/s, hits latency: 40.989 ns/op, important_hits throughput: 6.863 ± 0.077 M ops/s
num_maps: 32
local_storage cache sequential get: hits throughput: 13.333 ± 0.135 M ops/s, hits latency: 75.000 ns/op, important_hits throughput: 0.417 ± 0.004 M ops/s
local_storage cache interleaved get: hits throughput: 16.898 ± 0.383 M ops/s, hits latency: 59.178 ns/op, important_hits throughput: 4.717 ± 0.107 M ops/s
num_maps: 100
local_storage cache sequential get: hits throughput: 6.360 ± 0.107 M ops/s, hits latency: 157.233 ns/op, important_hits throughput: 0.064 ± 0.001 M ops/s
local_storage cache interleaved get: hits throughput: 7.303 ± 0.362 M ops/s, hits latency: 136.930 ns/op, important_hits throughput: 1.907 ± 0.094 M ops/s
num_maps: 1000
local_storage cache sequential get: hits throughput: 0.452 ± 0.010 M ops/s, hits latency: 2214.022 ns/op, important_hits throughput: 0.000 ± 0.000 M ops/s
local_storage cache interleaved get: hits throughput: 0.542 ± 0.007 M ops/s, hits latency: 1843.341 ns/op, important_hits throughput: 0.136 ± 0.002 M ops/s
Looking at the "sequential get" results, it's clear that as the
number of task local_storage maps grows beyond the current cache size
(16), there's a significant reduction in hits throughput. Note that
current local_storage implementation assigns a cache_idx to maps as they
are created. Since "sequential get" is creating maps 0..n in order and
then doing bpf_task_storage_get calls in the same order, the benchmark
is effectively ensuring that a map will not be in cache when the program
tries to access it.
For "interleaved get" results, important-map hits throughput is greatly
increased as the important map is more likely to be in cache by virtue
of being accessed far more frequently. Throughput still reduces as #
maps increases, though.
To get a sense of the overhead of the benchmark program, I
commented out bpf_task_storage_get/bpf_map_lookup_elem in
local_storage_bench.c and ran the benchmark on the same host as the
'real' run. Results:
Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 54.288 ± 0.655 M ops/s, hits latency: 18.420 ns/op, important_hits throughput: 54.288 ± 0.655 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 52.913 ± 0.519 M ops/s, hits latency: 18.899 ns/op, important_hits throughput: 52.913 ± 0.519 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 53.480 ± 1.235 M ops/s, hits latency: 18.699 ns/op, important_hits throughput: 53.480 ± 1.235 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 54.982 ± 1.902 M ops/s, hits latency: 18.188 ns/op, important_hits throughput: 54.982 ± 1.902 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 50.858 ± 0.707 M ops/s, hits latency: 19.662 ns/op, important_hits throughput: 50.858 ± 0.707 M ops/s
Local Storage
=============
num_maps: 1
local_storage cache sequential get: hits throughput: 110.990 ± 4.828 M ops/s, hits latency: 9.010 ns/op, important_hits throughput: 110.990 ± 4.828 M ops/s
local_storage cache interleaved get: hits throughput: 161.057 ± 4.090 M ops/s, hits latency: 6.209 ns/op, important_hits throughput: 161.057 ± 4.090 M ops/s
num_maps: 10
local_storage cache sequential get: hits throughput: 112.930 ± 1.079 M ops/s, hits latency: 8.855 ns/op, important_hits throughput: 11.293 ± 0.108 M ops/s
local_storage cache interleaved get: hits throughput: 115.841 ± 2.088 M ops/s, hits latency: 8.633 ns/op, important_hits throughput: 41.372 ± 0.746 M ops/s
num_maps: 16
local_storage cache sequential get: hits throughput: 115.653 ± 0.416 M ops/s, hits latency: 8.647 ns/op, important_hits throughput: 7.228 ± 0.026 M ops/s
local_storage cache interleaved get: hits throughput: 138.717 ± 1.649 M ops/s, hits latency: 7.209 ns/op, important_hits throughput: 44.137 ± 0.525 M ops/s
num_maps: 17
local_storage cache sequential get: hits throughput: 112.020 ± 1.649 M ops/s, hits latency: 8.927 ns/op, important_hits throughput: 6.598 ± 0.097 M ops/s
local_storage cache interleaved get: hits throughput: 128.089 ± 1.960 M ops/s, hits latency: 7.807 ns/op, important_hits throughput: 38.995 ± 0.597 M ops/s
num_maps: 24
local_storage cache sequential get: hits throughput: 92.447 ± 5.170 M ops/s, hits latency: 10.817 ns/op, important_hits throughput: 3.855 ± 0.216 M ops/s
local_storage cache interleaved get: hits throughput: 128.844 ± 2.808 M ops/s, hits latency: 7.761 ns/op, important_hits throughput: 36.245 ± 0.790 M ops/s
num_maps: 32
local_storage cache sequential get: hits throughput: 102.042 ± 1.462 M ops/s, hits latency: 9.800 ns/op, important_hits throughput: 3.194 ± 0.046 M ops/s
local_storage cache interleaved get: hits throughput: 126.577 ± 1.818 M ops/s, hits latency: 7.900 ns/op, important_hits throughput: 35.332 ± 0.507 M ops/s
num_maps: 100
local_storage cache sequential get: hits throughput: 111.327 ± 1.401 M ops/s, hits latency: 8.983 ns/op, important_hits throughput: 1.113 ± 0.014 M ops/s
local_storage cache interleaved get: hits throughput: 131.327 ± 1.339 M ops/s, hits latency: 7.615 ns/op, important_hits throughput: 34.302 ± 0.350 M ops/s
num_maps: 1000
local_storage cache sequential get: hits throughput: 101.978 ± 0.563 M ops/s, hits latency: 9.806 ns/op, important_hits throughput: 0.102 ± 0.001 M ops/s
local_storage cache interleaved get: hits throughput: 141.084 ± 1.098 M ops/s, hits latency: 7.088 ns/op, important_hits throughput: 35.430 ± 0.276 M ops/s
Adjusting for overhead, latency numbers for "hashmap control" and
"sequential get" are:
hashmap_control_1k: ~53.8ns
hashmap_control_10k: ~124.2ns
hashmap_control_100k: ~206.5ns
sequential_get_1: ~12.1ns
sequential_get_10: ~16.0ns
sequential_get_16: ~13.8ns
sequential_get_17: ~16.8ns
sequential_get_24: ~40.9ns
sequential_get_32: ~65.2ns
sequential_get_100: ~148.2ns
sequential_get_1000: ~2204ns
Clearly demonstrating a cliff.
In the discussion for v1 of this patch, Alexei noted that local_storage
was 2.5x faster than a large hashmap when initially implemented [1]. The
benchmark results show that local_storage is 5-10x faster: a
long-running BPF application putting some pid-specific info into a
hashmap for each pid it sees will probably see on the order of 10-100k
pids. Bench numbers for hashmaps of this size are ~10x slower than
sequential_get_16, but as the number of local_storage maps grows far
past local_storage cache size the performance advantage shrinks and
eventually reverses.
When running the benchmarks it may be necessary to bump 'open files'
ulimit for a successful run.
[0]: https://lore.kernel.org/all/20220420002143.1096548-1-davemarchevsky@fb.com
[1]: https://lore.kernel.org/bpf/20220511173305.ftldpn23m4ski3d3@MBP-98dd607d3435.dhcp.thefacebook.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20220620222554.270578-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Two new test BPF programs for test_prog selftests checking bpf_loop
behavior. Both are corner cases for bpf_loop inlinig transformation:
- check that bpf_loop behaves correctly when callback function is not
a compile time constant
- check that local function variables are not affected by allocating
additional stack storage for registers spilled by loop inlining
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A number of test cases for BPF selftests test_verifier to check how
bpf_loop inline transformation rewrites the BPF program. The following
cases are covered:
- happy path
- no-rewrite when flags is non-zero
- no-rewrite when callback is non-constant
- subprogno in insn_aux is updated correctly when dead sub-programs
are removed
- check that correct stack offsets are assigned for spilling of R6-R8
registers
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BTF and func_info specification for test_verifier tests follows
the same notation as in prog_tests/btf.c tests. E.g.:
...
.func_info = { { 0, 6 }, { 8, 7 } },
.func_info_cnt = 2,
.btf_strings = "\0int\0",
.btf_types = {
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
BTF_PTR_ENC(1),
},
...
The BTF specification is loaded only when specified.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allows to specify expected and unexpected instruction sequences in
test_verifier test cases. The instructions are requested from kernel
after BPF program loading, thus allowing to check some of the
transformations applied by BPF verifier.
- `expected_insn` field specifies a sequence of instructions expected
to be found in the program;
- `unexpected_insn` field specifies a sequence of instructions that
are not expected to be found in the program;
- `INSN_OFF_MASK` and `INSN_IMM_MASK` values could be used to mask
`off` and `imm` fields.
- `SKIP_INSNS` could be used to specify that some instructions in the
(un)expected pattern are not important (behavior similar to usage of
`\t` in `errstr` field).
The intended usage is as follows:
{
"inline simple bpf_loop call",
.insns = {
/* main */
BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1),
BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_2,
BPF_PSEUDO_FUNC, 0, 6),
...
BPF_EXIT_INSN(),
/* callback */
BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 1),
BPF_EXIT_INSN(),
},
.expected_insns = {
BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1),
SKIP_INSNS(),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_CALL, 8, 1)
},
.unexpected_insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0,
INSN_OFF_MASK, INSN_IMM_MASK),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.result = ACCEPT,
.runs = 0,
},
Here it is expected that move of 1 to register 1 would remain in place
and helper function call instruction would be replaced by a relative
call instruction.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20220620235344.569325-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-06-17
We've added 72 non-merge commits during the last 15 day(s) which contain
a total of 92 files changed, 4582 insertions(+), 834 deletions(-).
The main changes are:
1) Add 64 bit enum value support to BTF, from Yonghong Song.
2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.
3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.
4) Fix libbpf's internal USDT address translation logic for shared libraries as
well as uprobe's symbol file offset calculation, from Andrii Nakryiko.
5) Extend libbpf to provide an API for textual representation of the various
map/prog/attach/link types and use it in bpftool, from Daniel Müller.
6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
core seen in 32 bit when storing BPF function addresses, from Pu Lehui.
7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
for 'long' types, from Douglas Raillard.
8) Fix bpftool to readd setting rlimit since probing for memcg-based accounting
has been unreliable and caused a regression on COS, from Quentin Monnet.
9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
detachment, from Tadeusz Struk.
10) Fix bpftool build bootstrapping during cross compilation which was pointing
to the wrong AR process, from Shahab Vahedi.
11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.
12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
is no free element. Also add a benchmark as reproducer, from Feng Zhou.
13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.
14) Various minor cleanup and improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
bpf: Fix bpf_skc_lookup comment wrt. return type
bpf: Fix non-static bpf_func_proto struct definitions
selftests/bpf: Don't force lld on non-x86 architectures
selftests/bpf: Add selftests for raw syncookie helpers in TC mode
bpf: Allow the new syncookie helpers to work with SKBs
selftests/bpf: Add selftests for raw syncookie helpers
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Allow helpers to accept pointers with a fixed size
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
selftests/bpf: add tests for sleepable (uk)probes
libbpf: add support for sleepable uprobe programs
bpf: allow sleepable uprobe programs to attach
bpf: implement sleepable uprobes by chaining gps
bpf: move bpf_prog to bpf.h
libbpf: Fix internal USDT address translation logic for shared libraries
samples/bpf: Check detach prog exist or not in xdp_fwd
selftests/bpf: Avoid skipping certain subtests
selftests/bpf: Fix test_varlen verification failure with latest llvm
bpftool: Do not check return value from libbpf_set_strict_mode()
Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
...
====================
Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>