Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
This commit is contained in:
Linus Torvalds
2020-10-15 18:42:13 -07:00
2301 changed files with 130033 additions and 50954 deletions

View File

@@ -50,4 +50,5 @@ xdp_rxq_info
xdp_sample_pkts
xdp_tx_iptunnel
xdpsock
xsk_fwd
testfile.img

View File

@@ -48,6 +48,7 @@ tprogs-y += syscall_tp
tprogs-y += cpustat
tprogs-y += xdp_adjust_tail
tprogs-y += xdpsock
tprogs-y += xsk_fwd
tprogs-y += xdp_fwd
tprogs-y += task_fd_query
tprogs-y += xdp_sample_pkts
@@ -71,12 +72,12 @@ tracex4-objs := tracex4_user.o
tracex5-objs := tracex5_user.o $(TRACE_HELPERS)
tracex6-objs := tracex6_user.o
tracex7-objs := tracex7_user.o
test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
lathist-objs := bpf_load.o lathist_user.o
offwaketime-objs := bpf_load.o offwaketime_user.o $(TRACE_HELPERS)
spintest-objs := bpf_load.o spintest_user.o $(TRACE_HELPERS)
map_perf_test-objs := bpf_load.o map_perf_test_user.o
test_probe_write_user-objs := test_probe_write_user_user.o
trace_output-objs := trace_output_user.o $(TRACE_HELPERS)
lathist-objs := lathist_user.o
offwaketime-objs := offwaketime_user.o $(TRACE_HELPERS)
spintest-objs := spintest_user.o $(TRACE_HELPERS)
map_perf_test-objs := map_perf_test_user.o
test_overhead-objs := bpf_load.o test_overhead_user.o
test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
test_cgrp2_attach-objs := test_cgrp2_attach.o
@@ -86,7 +87,7 @@ xdp1-objs := xdp1_user.o
# reuse xdp1 source intentionally
xdp2-objs := xdp1_user.o
xdp_router_ipv4-objs := xdp_router_ipv4_user.o
test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
test_current_task_under_cgroup-objs := $(CGROUP_HELPERS) \
test_current_task_under_cgroup_user.o
trace_event-objs := trace_event_user.o $(TRACE_HELPERS)
sampleip-objs := sampleip_user.o $(TRACE_HELPERS)
@@ -97,13 +98,14 @@ test_map_in_map-objs := test_map_in_map_user.o
per_socket_stats_example-objs := cookie_uid_helper_example.o
xdp_redirect-objs := xdp_redirect_user.o
xdp_redirect_map-objs := xdp_redirect_map_user.o
xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
xdp_redirect_cpu-objs := xdp_redirect_cpu_user.o
xdp_monitor-objs := xdp_monitor_user.o
xdp_rxq_info-objs := xdp_rxq_info_user.o
syscall_tp-objs := bpf_load.o syscall_tp_user.o
cpustat-objs := bpf_load.o cpustat_user.o
syscall_tp-objs := syscall_tp_user.o
cpustat-objs := cpustat_user.o
xdp_adjust_tail-objs := xdp_adjust_tail_user.o
xdpsock-objs := xdpsock_user.o
xsk_fwd-objs := xsk_fwd.o
xdp_fwd-objs := xdp_fwd_user.o
task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
@@ -203,11 +205,14 @@ TPROGLDLIBS_trace_output += -lrt
TPROGLDLIBS_map_perf_test += -lrt
TPROGLDLIBS_test_overhead += -lrt
TPROGLDLIBS_xdpsock += -pthread
TPROGLDLIBS_xsk_fwd += -pthread
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
LLC ?= llc
CLANG ?= clang
OPT ?= opt
LLVM_DIS ?= llvm-dis
LLVM_OBJCOPY ?= llvm-objcopy
BTF_PAHOLE ?= pahole
@@ -300,6 +305,11 @@ $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
# But, there is no easy way to fix it, so just exclude it since it is
# useless for BPF samples.
# below we use long chain of commands, clang | opt | llvm-dis | llc,
# to generate final object file. 'clang' compiles the source into IR
# with native target, e.g., x64, arm64, etc. 'opt' does bpf CORE IR builtin
# processing (llvm12) and IR optimizations. 'llvm-dis' converts
# 'opt' output to IR, and finally 'llc' generates bpf byte code.
$(obj)/%.o: $(src)/%.c
@echo " CLANG-bpf " $@
$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(BPF_EXTRA_CFLAGS) \
@@ -311,7 +321,9 @@ $(obj)/%.o: $(src)/%.c
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
-I$(srctree)/samples/bpf/ -include asm_goto_workaround.h \
-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
-O2 -emit-llvm -Xclang -disable-llvm-passes -c $< -o - | \
$(OPT) -O2 -mtriple=bpf-pc-linux | $(LLVM_DIS) | \
$(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
ifeq ($(DWARF2BTF),y)
$(BTF_PAHOLE) -J $@
endif

View File

@@ -51,28 +51,28 @@ static int cpu_opps[] = { 208000, 432000, 729000, 960000, 1200000 };
#define MAP_OFF_PSTATE_IDX 3
#define MAP_OFF_NUM 4
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = MAX_CPU * MAP_OFF_NUM,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, MAX_CPU * MAP_OFF_NUM);
} my_map SEC(".maps");
/* cstate_duration records duration time for every idle state per CPU */
struct bpf_map_def SEC("maps") cstate_duration = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = MAX_CPU * MAX_CSTATE_ENTRIES,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, MAX_CPU * MAX_CSTATE_ENTRIES);
} cstate_duration SEC(".maps");
/* pstate_duration records duration time for every operating point per CPU */
struct bpf_map_def SEC("maps") pstate_duration = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = MAX_CPU * MAX_PSTATE_ENTRIES,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, MAX_CPU * MAX_PSTATE_ENTRIES);
} pstate_duration SEC(".maps");
/*
* The trace events for cpu_idle and cpu_frequency are taken from:

View File

@@ -9,7 +9,6 @@
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <linux/bpf.h>
#include <locale.h>
#include <sys/types.h>
#include <sys/stat.h>
@@ -18,7 +17,9 @@
#include <sys/wait.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
#include <bpf/libbpf.h>
static int cstate_map_fd, pstate_map_fd;
#define MAX_CPU 8
#define MAX_PSTATE_ENTRIES 5
@@ -181,21 +182,50 @@ static void int_exit(int sig)
{
cpu_stat_inject_cpu_idle_event();
cpu_stat_inject_cpu_frequency_event();
cpu_stat_update(map_fd[1], map_fd[2]);
cpu_stat_update(cstate_map_fd, pstate_map_fd);
cpu_stat_print();
exit(0);
}
int main(int argc, char **argv)
{
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_object *obj;
char filename[256];
int ret;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (!prog) {
printf("finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
cstate_map_fd = bpf_object__find_map_fd_by_name(obj, "cstate_duration");
pstate_map_fd = bpf_object__find_map_fd_by_name(obj, "pstate_duration");
if (cstate_map_fd < 0 || pstate_map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
ret = cpu_stat_inject_cpu_idle_event();
@@ -210,10 +240,13 @@ int main(int argc, char **argv)
signal(SIGTERM, int_exit);
while (1) {
cpu_stat_update(map_fd[1], map_fd[2]);
cpu_stat_update(cstate_map_fd, pstate_map_fd);
cpu_stat_print();
sleep(5);
}
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return 0;
}

View File

@@ -40,6 +40,7 @@
#include <errno.h>
#include <fcntl.h>
#include <linux/unistd.h>
#include <linux/compiler.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>
@@ -483,7 +484,7 @@ int main(int argc, char **argv)
"Option -%c requires an argument.\n\n",
optopt);
case 'h':
fallthrough;
__fallthrough;
default:
Usage();
return 0;

View File

@@ -18,12 +18,12 @@
* trace_preempt_[on|off] tracepoints hooks is not supported.
*/
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u64),
.max_entries = MAX_CPU,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, u64);
__uint(max_entries, MAX_CPU);
} my_map SEC(".maps");
SEC("kprobe/trace_preempt_off")
int bpf_prog1(struct pt_regs *ctx)
@@ -61,12 +61,12 @@ static unsigned int log2l(unsigned long v)
return log2(v);
}
struct bpf_map_def SEC("maps") my_lat = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(long),
.max_entries = MAX_CPU * MAX_ENTRIES,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, long);
__uint(max_entries, MAX_CPU * MAX_ENTRIES);
} my_lat SEC(".maps");
SEC("kprobe/trace_preempt_on")
int bpf_prog2(struct pt_regs *ctx)

View File

@@ -6,9 +6,8 @@
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
#define MAX_ENTRIES 20
#define MAX_CPU 4
@@ -81,20 +80,51 @@ static void get_data(int fd)
int main(int argc, char **argv)
{
struct bpf_link *links[2];
struct bpf_program *prog;
struct bpf_object *obj;
char filename[256];
int map_fd, i = 0;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd = bpf_object__find_map_fd_by_name(obj, "my_lat");
if (map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
bpf_object__for_each_program(prog, obj) {
links[i] = bpf_program__attach(prog);
if (libbpf_get_error(links[i])) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
links[i] = NULL;
goto cleanup;
}
i++;
}
while (1) {
get_data(map_fd[1]);
get_data(map_fd);
print_hist();
sleep(5);
}
cleanup:
for (i--; i >= 0; i--)
bpf_link__destroy(links[i]);
bpf_object__close(obj);
return 0;
}

View File

@@ -28,38 +28,38 @@ struct key_t {
u32 tret;
};
struct bpf_map_def SEC("maps") counts = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct key_t),
.value_size = sizeof(u64),
.max_entries = 10000,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, struct key_t);
__type(value, u64);
__uint(max_entries, 10000);
} counts SEC(".maps");
struct bpf_map_def SEC("maps") start = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = 10000,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, u32);
__type(value, u64);
__uint(max_entries, 10000);
} start SEC(".maps");
struct wokeby_t {
char name[TASK_COMM_LEN];
u32 ret;
};
struct bpf_map_def SEC("maps") wokeby = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(u32),
.value_size = sizeof(struct wokeby_t),
.max_entries = 10000,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, u32);
__type(value, struct wokeby_t);
__uint(max_entries, 10000);
} wokeby SEC(".maps");
struct bpf_map_def SEC("maps") stackmap = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.key_size = sizeof(u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(u64),
.max_entries = 10000,
};
struct {
__uint(type, BPF_MAP_TYPE_STACK_TRACE);
__uint(key_size, sizeof(u32));
__uint(value_size, PERF_MAX_STACK_DEPTH * sizeof(u64));
__uint(max_entries, 10000);
} stackmap SEC(".maps");
#define STACKID_FLAGS (0 | BPF_F_FAST_STACK_CMP)

View File

@@ -5,19 +5,19 @@
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
#include <string.h>
#include <linux/perf_event.h>
#include <errno.h>
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include "bpf_load.h"
#include <bpf/bpf.h>
#include "trace_helpers.h"
#define PRINT_RAW_ADDR 0
/* counts, stackmap */
static int map_fd[2];
static void print_ksym(__u64 addr)
{
struct ksym *sym;
@@ -52,14 +52,14 @@ static void print_stack(struct key_t *key, __u64 count)
int i;
printf("%s;", key->target);
if (bpf_map_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
if (bpf_map_lookup_elem(map_fd[1], &key->tret, ip) != 0) {
printf("---;");
} else {
for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
print_ksym(ip[i]);
}
printf("-;");
if (bpf_map_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
if (bpf_map_lookup_elem(map_fd[1], &key->wret, ip) != 0) {
printf("---;");
} else {
for (i = 0; i < PERF_MAX_STACK_DEPTH; i++)
@@ -96,23 +96,54 @@ static void int_exit(int sig)
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
struct bpf_object *obj = NULL;
struct bpf_link *links[2];
struct bpf_program *prog;
int delay = 1, i = 0;
char filename[256];
int delay = 1;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
setrlimit(RLIMIT_MEMLOCK, &r);
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
perror("setrlimit(RLIMIT_MEMLOCK)");
return 1;
}
if (load_kallsyms()) {
printf("failed to process /proc/kallsyms\n");
return 2;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
obj = NULL;
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd[0] = bpf_object__find_map_fd_by_name(obj, "counts");
map_fd[1] = bpf_object__find_map_fd_by_name(obj, "stackmap");
if (map_fd[0] < 0 || map_fd[1] < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
bpf_object__for_each_program(prog, obj) {
links[i] = bpf_program__attach(prog);
if (libbpf_get_error(links[i])) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
links[i] = NULL;
goto cleanup;
}
i++;
}
if (argc > 1)
@@ -120,5 +151,10 @@ int main(int argc, char **argv)
sleep(delay);
print_stacks(map_fd[0]);
cleanup:
for (i--; i >= 0; i--)
bpf_link__destroy(links[i]);
bpf_object__close(obj);
return 0;
}

View File

@@ -31,28 +31,30 @@ struct {
#define PARSE_IP 3
#define PARSE_IPV6 4
/* protocol dispatch routine.
* It tail-calls next BPF program depending on eth proto
* Note, we could have used:
* bpf_tail_call(skb, &jmp_table, proto);
* but it would need large prog_array
/* Protocol dispatch routine. It tail-calls next BPF program depending
* on eth proto. Note, we could have used ...
*
* bpf_tail_call(skb, &jmp_table, proto);
*
* ... but it would need large prog_array and cannot be optimised given
* the map key is not static.
*/
static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto)
{
switch (proto) {
case ETH_P_8021Q:
case ETH_P_8021AD:
bpf_tail_call(skb, &jmp_table, PARSE_VLAN);
bpf_tail_call_static(skb, &jmp_table, PARSE_VLAN);
break;
case ETH_P_MPLS_UC:
case ETH_P_MPLS_MC:
bpf_tail_call(skb, &jmp_table, PARSE_MPLS);
bpf_tail_call_static(skb, &jmp_table, PARSE_MPLS);
break;
case ETH_P_IP:
bpf_tail_call(skb, &jmp_table, PARSE_IP);
bpf_tail_call_static(skb, &jmp_table, PARSE_IP);
break;
case ETH_P_IPV6:
bpf_tail_call(skb, &jmp_table, PARSE_IPV6);
bpf_tail_call_static(skb, &jmp_table, PARSE_IPV6);
break;
}
}

View File

@@ -29,8 +29,8 @@ int main(int argc, char **argv)
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
struct bpf_program *prog;
struct bpf_object *obj;
const char *section;
char filename[256];
const char *title;
FILE *f;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
@@ -58,8 +58,8 @@ int main(int argc, char **argv)
bpf_object__for_each_program(prog, obj) {
fd = bpf_program__fd(prog);
title = bpf_program__title(prog, false);
if (sscanf(title, "socket/%d", &key) != 1) {
section = bpf_program__section_name(prog);
if (sscanf(section, "socket/%d", &key) != 1) {
fprintf(stderr, "ERROR: finding prog failed\n");
goto cleanup;
}

View File

@@ -12,25 +12,25 @@
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(long),
.max_entries = 1024,
};
struct bpf_map_def SEC("maps") my_map2 = {
.type = BPF_MAP_TYPE_PERCPU_HASH,
.key_size = sizeof(long),
.value_size = sizeof(long),
.max_entries = 1024,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, long);
__type(value, long);
__uint(max_entries, 1024);
} my_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
__uint(key_size, sizeof(long));
__uint(value_size, sizeof(long));
__uint(max_entries, 1024);
} my_map2 SEC(".maps");
struct bpf_map_def SEC("maps") stackmap = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.key_size = sizeof(u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(u64),
.max_entries = 10000,
};
struct {
__uint(type, BPF_MAP_TYPE_STACK_TRACE);
__uint(key_size, sizeof(u32));
__uint(value_size, PERF_MAX_STACK_DEPTH * sizeof(u64));
__uint(max_entries, 10000);
} stackmap SEC(".maps");
#define PROG(foo) \
int foo(struct pt_regs *ctx) \

View File

@@ -1,40 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <string.h>
#include <assert.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include "bpf_load.h"
#include <bpf/bpf.h>
#include "trace_helpers.h"
int main(int ac, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
char filename[256], symbol[256];
struct bpf_object *obj = NULL;
struct bpf_link *links[20];
long key, next_key, value;
char filename[256];
struct bpf_program *prog;
int map_fd, i, j = 0;
const char *section;
struct ksym *sym;
int i;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
setrlimit(RLIMIT_MEMLOCK, &r);
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
perror("setrlimit(RLIMIT_MEMLOCK)");
return 1;
}
if (load_kallsyms()) {
printf("failed to process /proc/kallsyms\n");
return 2;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
obj = NULL;
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
if (map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
bpf_object__for_each_program(prog, obj) {
section = bpf_program__section_name(prog);
if (sscanf(section, "kprobe/%s", symbol) != 1)
continue;
/* Attach prog only when symbol exists */
if (ksym_get_addr(symbol)) {
links[j] = bpf_program__attach(prog);
if (libbpf_get_error(links[j])) {
fprintf(stderr, "bpf_program__attach failed\n");
links[j] = NULL;
goto cleanup;
}
j++;
}
}
for (i = 0; i < 5; i++) {
key = 0;
printf("kprobing funcs:");
while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
bpf_map_lookup_elem(map_fd[0], &next_key, &value);
while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) {
bpf_map_lookup_elem(map_fd, &next_key, &value);
assert(next_key == value);
sym = ksym_search(value);
key = next_key;
@@ -48,10 +85,15 @@ int main(int ac, char **argv)
if (key)
printf("\n");
key = 0;
while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0)
bpf_map_delete_elem(map_fd[0], &next_key);
while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0)
bpf_map_delete_elem(map_fd, &next_key);
sleep(1);
}
cleanup:
for (j--; j >= 0; j--)
bpf_link__destroy(links[j]);
bpf_object__close(obj);
return 0;
}

View File

@@ -18,19 +18,19 @@ struct syscalls_exit_open_args {
long ret;
};
struct bpf_map_def SEC("maps") enter_open_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} enter_open_map SEC(".maps");
struct bpf_map_def SEC("maps") exit_open_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} exit_open_map SEC(".maps");
static __always_inline void count(void *map)
{

View File

@@ -5,16 +5,12 @@
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
#include <string.h>
#include <linux/perf_event.h>
#include <errno.h>
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
/* This program verifies bpf attachment to tracepoint sys_enter_* and sys_exit_*.
* This requires kernel CONFIG_FTRACE_SYSCALLS to be set.
@@ -49,16 +45,44 @@ static void verify_map(int map_id)
static int test(char *filename, int num_progs)
{
int i, fd, map0_fds[num_progs], map1_fds[num_progs];
int map0_fds[num_progs], map1_fds[num_progs], fd, i, j = 0;
struct bpf_link *links[num_progs * 4];
struct bpf_object *objs[num_progs];
struct bpf_program *prog;
for (i = 0; i < num_progs; i++) {
if (load_bpf_file(filename)) {
fprintf(stderr, "%s", bpf_log_buf);
return 1;
objs[i] = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(objs[i])) {
fprintf(stderr, "opening BPF object file failed\n");
objs[i] = NULL;
goto cleanup;
}
printf("prog #%d: map ids %d %d\n", i, map_fd[0], map_fd[1]);
map0_fds[i] = map_fd[0];
map1_fds[i] = map_fd[1];
/* load BPF program */
if (bpf_object__load(objs[i])) {
fprintf(stderr, "loading BPF object file failed\n");
goto cleanup;
}
map0_fds[i] = bpf_object__find_map_fd_by_name(objs[i],
"enter_open_map");
map1_fds[i] = bpf_object__find_map_fd_by_name(objs[i],
"exit_open_map");
if (map0_fds[i] < 0 || map1_fds[i] < 0) {
fprintf(stderr, "finding a map in obj file failed\n");
goto cleanup;
}
bpf_object__for_each_program(prog, objs[i]) {
links[j] = bpf_program__attach(prog);
if (libbpf_get_error(links[j])) {
fprintf(stderr, "bpf_program__attach failed\n");
links[j] = NULL;
goto cleanup;
}
j++;
}
printf("prog #%d: map ids %d %d\n", i, map0_fds[i], map1_fds[i]);
}
/* current load_bpf_file has perf_event_open default pid = -1
@@ -80,6 +104,12 @@ static int test(char *filename, int num_progs)
verify_map(map1_fds[i]);
}
cleanup:
for (j--; j >= 0; j--)
bpf_link__destroy(links[j]);
for (i--; i >= 0; i--)
bpf_object__close(objs[i]);
return 0;
}

View File

@@ -10,7 +10,7 @@ int bpf_prog1(struct pt_regs *ctx)
return 0;
}
SEC("kretprobe/blk_account_io_completion")
SEC("kretprobe/blk_account_io_done")
int bpf_prog2(struct pt_regs *ctx)
{
return 0;

View File

@@ -314,7 +314,7 @@ int main(int argc, char **argv)
/* test two functions in the corresponding *_kern.c file */
CHECK_AND_RET(test_debug_fs_kprobe(0, "blk_mq_start_request",
BPF_FD_TYPE_KPROBE));
CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_completion",
CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_done",
BPF_FD_TYPE_KRETPROBE));
/* test nondebug fs kprobe */

View File

@@ -10,23 +10,24 @@
#include <linux/version.h>
#include <bpf/bpf_helpers.h>
#include <uapi/linux/utsname.h>
#include "trace_common.h"
struct bpf_map_def SEC("maps") cgroup_map = {
.type = BPF_MAP_TYPE_CGROUP_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u32),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
__uint(key_size, sizeof(u32));
__uint(value_size, sizeof(u32));
__uint(max_entries, 1);
} cgroup_map SEC(".maps");
struct bpf_map_def SEC("maps") perf_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, 1);
} perf_map SEC(".maps");
/* Writes the last PID that called sync to a map at index 0 */
SEC("kprobe/sys_sync")
SEC("kprobe/" SYSCALL(sys_sync))
int bpf_prog1(struct pt_regs *ctx)
{
u64 pid = bpf_get_current_pid_tgid();

View File

@@ -4,10 +4,9 @@
#define _GNU_SOURCE
#include <stdio.h>
#include <linux/bpf.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
#include <bpf/libbpf.h>
#include "cgroup_helpers.h"
#define CGROUP_PATH "/my-cgroup"
@@ -15,13 +14,44 @@
int main(int argc, char **argv)
{
pid_t remote_pid, local_pid = getpid();
int cg2, idx = 0, rc = 0;
struct bpf_link *link = NULL;
struct bpf_program *prog;
int cg2, idx = 0, rc = 1;
struct bpf_object *obj;
char filename[256];
int map_fd[2];
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (!prog) {
printf("finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd[0] = bpf_object__find_map_fd_by_name(obj, "cgroup_map");
map_fd[1] = bpf_object__find_map_fd_by_name(obj, "perf_map");
if (map_fd[0] < 0 || map_fd[1] < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
if (setup_cgroup_environment())
@@ -70,12 +100,14 @@ int main(int argc, char **argv)
goto err;
}
goto out;
err:
rc = 1;
rc = 0;
out:
err:
close(cg2);
cleanup_cgroup_environment();
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return rc;
}

View File

@@ -103,10 +103,9 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
return result ? *result : -ENOENT;
}
SEC("kprobe/" SYSCALL(sys_connect))
SEC("kprobe/__sys_connect")
int trace_sys_connect(struct pt_regs *ctx)
{
struct pt_regs *real_regs = (struct pt_regs *)PT_REGS_PARM1_CORE(ctx);
struct sockaddr_in6 *in6;
u16 test_case, port, dst6[8];
int addrlen, ret, inline_ret, ret_key = 0;
@@ -114,8 +113,8 @@ int trace_sys_connect(struct pt_regs *ctx)
void *outer_map, *inner_map;
bool inline_hash = false;
in6 = (struct sockaddr_in6 *)PT_REGS_PARM2_CORE(real_regs);
addrlen = (int)PT_REGS_PARM3_CORE(real_regs);
in6 = (struct sockaddr_in6 *)PT_REGS_PARM2_CORE(ctx);
addrlen = (int)PT_REGS_PARM3_CORE(ctx);
if (addrlen != sizeof(*in6))
return 0;

View File

@@ -13,12 +13,12 @@
#include <bpf/bpf_core_read.h>
#include "trace_common.h"
struct bpf_map_def SEC("maps") dnat_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct sockaddr_in),
.value_size = sizeof(struct sockaddr_in),
.max_entries = 256,
};
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, struct sockaddr_in);
__type(value, struct sockaddr_in);
__uint(max_entries, 256);
} dnat_map SEC(".maps");
/* kprobe is NOT a stable ABI
* kernel functions can be removed, renamed or completely change semantics.

View File

@@ -1,21 +1,22 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <assert.h>
#include <linux/bpf.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
#include <bpf/libbpf.h>
#include <sys/socket.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(int ac, char **argv)
{
int serverfd, serverconnfd, clientfd;
socklen_t sockaddr_len;
struct sockaddr serv_addr, mapped_addr, tmp_addr;
struct sockaddr_in *serv_addr_in, *mapped_addr_in, *tmp_addr_in;
struct sockaddr serv_addr, mapped_addr, tmp_addr;
int serverfd, serverconnfd, clientfd, map_fd;
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_object *obj;
socklen_t sockaddr_len;
char filename[256];
char *ip;
@@ -24,10 +25,35 @@ int main(int ac, char **argv)
tmp_addr_in = (struct sockaddr_in *)&tmp_addr;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (libbpf_get_error(prog)) {
fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd = bpf_object__find_map_fd_by_name(obj, "dnat_map");
if (map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
assert((serverfd = socket(AF_INET, SOCK_STREAM, 0)) > 0);
@@ -51,7 +77,7 @@ int main(int ac, char **argv)
mapped_addr_in->sin_port = htons(5555);
mapped_addr_in->sin_addr.s_addr = inet_addr("255.255.255.255");
assert(!bpf_map_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
assert(!bpf_map_update_elem(map_fd, &mapped_addr, &serv_addr, BPF_ANY));
assert(listen(serverfd, 5) == 0);
@@ -75,5 +101,8 @@ int main(int ac, char **argv)
/* Is the server's getsockname = the socket getpeername */
assert(memcmp(&serv_addr, &tmp_addr, sizeof(struct sockaddr_in)) == 0);
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return 0;
}

View File

@@ -2,15 +2,16 @@
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "trace_common.h"
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = 2,
};
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(u32));
__uint(max_entries, 2);
} my_map SEC(".maps");
SEC("kprobe/sys_write")
SEC("kprobe/" SYSCALL(sys_write))
int bpf_prog1(struct pt_regs *ctx)
{
struct S {

View File

@@ -1,23 +1,10 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <fcntl.h>
#include <poll.h>
#include <linux/perf_event.h>
#include <linux/bpf.h>
#include <errno.h>
#include <assert.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>
#include <signal.h>
#include <bpf/libbpf.h>
#include "bpf_load.h"
#include "perf-sys.h"
static __u64 time_get_ns(void)
{
@@ -57,20 +44,48 @@ static void print_bpf_output(void *ctx, int cpu, void *data, __u32 size)
int main(int argc, char **argv)
{
struct perf_buffer_opts pb_opts = {};
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct perf_buffer *pb;
struct bpf_object *obj;
int map_fd, ret = 0;
char filename[256];
FILE *f;
int ret;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
if (load_bpf_file(filename)) {
printf("%s", bpf_log_buf);
return 1;
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
if (map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (libbpf_get_error(prog)) {
fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
pb_opts.sample_cb = print_bpf_output;
pb = perf_buffer__new(map_fd[0], 8, &pb_opts);
pb = perf_buffer__new(map_fd, 8, &pb_opts);
ret = libbpf_get_error(pb);
if (ret) {
printf("failed to setup perf_buffer: %d\n", ret);
@@ -84,5 +99,9 @@ int main(int argc, char **argv)
while ((ret = perf_buffer__poll(pb, 1000)) >= 0 && cnt < MAX_CNT) {
}
kill(0, SIGINT);
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return ret;
}

View File

@@ -49,7 +49,7 @@ struct {
__uint(max_entries, SLOTS);
} lat_map SEC(".maps");
SEC("kprobe/blk_account_io_completion")
SEC("kprobe/blk_account_io_done")
int bpf_prog2(struct pt_regs *ctx)
{
long rq = PT_REGS_PARM1(ctx);

View File

@@ -39,8 +39,8 @@ int main(int ac, char **argv)
struct bpf_program *prog;
struct bpf_object *obj;
int key, fd, progs_fd;
const char *section;
char filename[256];
const char *title;
FILE *f;
setrlimit(RLIMIT_MEMLOCK, &r);
@@ -78,9 +78,9 @@ int main(int ac, char **argv)
}
bpf_object__for_each_program(prog, obj) {
title = bpf_program__title(prog, false);
section = bpf_program__section_name(prog);
/* register only syscalls to PROG_ARRAY */
if (sscanf(title, "kprobe/%d", &key) != 1)
if (sscanf(section, "kprobe/%d", &key) != 1)
continue;
fd = bpf_program__fd(prog);

View File

@@ -6,21 +6,21 @@
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct bpf_map_def SEC("maps") redirect_err_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = 2,
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, 2);
/* TODO: have entries for all possible errno's */
};
} redirect_err_cnt SEC(".maps");
#define XDP_UNKNOWN XDP_REDIRECT + 1
struct bpf_map_def SEC("maps") exception_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(u64),
.max_entries = XDP_UNKNOWN + 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, XDP_UNKNOWN + 1);
} exception_cnt SEC(".maps");
/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
* Code in: kernel/include/trace/events/xdp.h
@@ -129,19 +129,19 @@ struct datarec {
};
#define MAX_CPUS 64
struct bpf_map_def SEC("maps") cpumap_enqueue_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = MAX_CPUS,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, MAX_CPUS);
} cpumap_enqueue_cnt SEC(".maps");
struct bpf_map_def SEC("maps") cpumap_kthread_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} cpumap_kthread_cnt SEC(".maps");
/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_enqueue/format
* Code in: kernel/include/trace/events/xdp.h
@@ -210,12 +210,12 @@ int trace_xdp_cpumap_kthread(struct cpumap_kthread_ctx *ctx)
return 0;
}
struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct datarec),
.max_entries = 1,
};
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, u32);
__type(value, struct datarec);
__uint(max_entries, 1);
} devmap_xmit_cnt SEC(".maps");
/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_devmap_xmit/format
* Code in: kernel/include/trace/events/xdp.h

View File

@@ -26,12 +26,37 @@ static const char *__doc_err_only__=
#include <net/if.h>
#include <time.h>
#include <signal.h>
#include <bpf/bpf.h>
#include "bpf_load.h"
#include <bpf/libbpf.h>
#include "bpf_util.h"
enum map_type {
REDIRECT_ERR_CNT,
EXCEPTION_CNT,
CPUMAP_ENQUEUE_CNT,
CPUMAP_KTHREAD_CNT,
DEVMAP_XMIT_CNT,
};
static const char *const map_type_strings[] = {
[REDIRECT_ERR_CNT] = "redirect_err_cnt",
[EXCEPTION_CNT] = "exception_cnt",
[CPUMAP_ENQUEUE_CNT] = "cpumap_enqueue_cnt",
[CPUMAP_KTHREAD_CNT] = "cpumap_kthread_cnt",
[DEVMAP_XMIT_CNT] = "devmap_xmit_cnt",
};
#define NUM_MAP 5
#define NUM_TP 8
static int tp_cnt;
static int map_cnt;
static int verbose = 1;
static bool debug = false;
struct bpf_map *map_data[NUM_MAP] = {};
struct bpf_link *tp_links[NUM_TP] = {};
struct bpf_object *obj;
static const struct option long_options[] = {
{"help", no_argument, NULL, 'h' },
@@ -41,6 +66,16 @@ static const struct option long_options[] = {
{0, 0, NULL, 0 }
};
static void int_exit(int sig)
{
/* Detach tracepoints */
while (tp_cnt)
bpf_link__destroy(tp_links[--tp_cnt]);
bpf_object__close(obj);
exit(0);
}
/* C standard specifies two constants, EXIT_SUCCESS(0) and EXIT_FAILURE(1) */
#define EXIT_FAIL_MEM 5
@@ -483,23 +518,23 @@ static bool stats_collect(struct stats_record *rec)
* this can happen by someone running perf-record -e
*/
fd = map_data[0].fd; /* map0: redirect_err_cnt */
fd = bpf_map__fd(map_data[REDIRECT_ERR_CNT]);
for (i = 0; i < REDIR_RES_MAX; i++)
map_collect_record_u64(fd, i, &rec->xdp_redirect[i]);
fd = map_data[1].fd; /* map1: exception_cnt */
fd = bpf_map__fd(map_data[EXCEPTION_CNT]);
for (i = 0; i < XDP_ACTION_MAX; i++) {
map_collect_record_u64(fd, i, &rec->xdp_exception[i]);
}
fd = map_data[2].fd; /* map2: cpumap_enqueue_cnt */
fd = bpf_map__fd(map_data[CPUMAP_ENQUEUE_CNT]);
for (i = 0; i < MAX_CPUS; i++)
map_collect_record(fd, i, &rec->xdp_cpumap_enqueue[i]);
fd = map_data[3].fd; /* map3: cpumap_kthread_cnt */
fd = bpf_map__fd(map_data[CPUMAP_KTHREAD_CNT]);
map_collect_record(fd, 0, &rec->xdp_cpumap_kthread);
fd = map_data[4].fd; /* map4: devmap_xmit_cnt */
fd = bpf_map__fd(map_data[DEVMAP_XMIT_CNT]);
map_collect_record(fd, 0, &rec->xdp_devmap_xmit);
return true;
@@ -598,8 +633,8 @@ static void stats_poll(int interval, bool err_only)
/* TODO Need more advanced stats on error types */
if (verbose) {
printf(" - Stats map0: %s\n", map_data[0].name);
printf(" - Stats map1: %s\n", map_data[1].name);
printf(" - Stats map0: %s\n", bpf_map__name(map_data[0]));
printf(" - Stats map1: %s\n", bpf_map__name(map_data[1]));
printf("\n");
}
fflush(stdout);
@@ -618,44 +653,51 @@ static void stats_poll(int interval, bool err_only)
static void print_bpf_prog_info(void)
{
int i;
struct bpf_program *prog;
struct bpf_map *map;
int i = 0;
/* Prog info */
printf("Loaded BPF prog have %d bpf program(s)\n", prog_cnt);
for (i = 0; i < prog_cnt; i++) {
printf(" - prog_fd[%d] = fd(%d)\n", i, prog_fd[i]);
printf("Loaded BPF prog have %d bpf program(s)\n", tp_cnt);
bpf_object__for_each_program(prog, obj) {
printf(" - prog_fd[%d] = fd(%d)\n", i, bpf_program__fd(prog));
i++;
}
i = 0;
/* Maps info */
printf("Loaded BPF prog have %d map(s)\n", map_data_count);
for (i = 0; i < map_data_count; i++) {
char *name = map_data[i].name;
int fd = map_data[i].fd;
printf("Loaded BPF prog have %d map(s)\n", map_cnt);
bpf_object__for_each_map(map, obj) {
const char *name = bpf_map__name(map);
int fd = bpf_map__fd(map);
printf(" - map_data[%d] = fd(%d) name:%s\n", i, fd, name);
i++;
}
/* Event info */
printf("Searching for (max:%d) event file descriptor(s)\n", prog_cnt);
for (i = 0; i < prog_cnt; i++) {
if (event_fd[i] != -1)
printf(" - event_fd[%d] = fd(%d)\n", i, event_fd[i]);
printf("Searching for (max:%d) event file descriptor(s)\n", tp_cnt);
for (i = 0; i < tp_cnt; i++) {
int fd = bpf_link__fd(tp_links[i]);
if (fd != -1)
printf(" - event_fd[%d] = fd(%d)\n", i, fd);
}
}
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
struct bpf_program *prog;
int longindex = 0, opt;
int ret = EXIT_SUCCESS;
char bpf_obj_file[256];
int ret = EXIT_FAILURE;
enum map_type type;
char filename[256];
/* Default settings: */
bool errors_only = true;
int interval = 2;
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
/* Parse commands line args */
while ((opt = getopt_long(argc, argv, "hDSs:",
long_options, &longindex)) != -1) {
@@ -672,40 +714,79 @@ int main(int argc, char **argv)
case 'h':
default:
usage(argv);
return EXIT_FAILURE;
return ret;
}
}
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
perror("setrlimit(RLIMIT_MEMLOCK)");
return EXIT_FAILURE;
return ret;
}
if (load_bpf_file(bpf_obj_file)) {
printf("ERROR - bpf_log_buf: %s", bpf_log_buf);
return EXIT_FAILURE;
/* Remove tracepoint program when program is interrupted or killed */
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
printf("ERROR: opening BPF object file failed\n");
obj = NULL;
goto cleanup;
}
if (!prog_fd[0]) {
printf("ERROR - load_bpf_file: %s\n", strerror(errno));
return EXIT_FAILURE;
/* load BPF program */
if (bpf_object__load(obj)) {
printf("ERROR: loading BPF object file failed\n");
goto cleanup;
}
for (type = 0; type < NUM_MAP; type++) {
map_data[type] =
bpf_object__find_map_by_name(obj, map_type_strings[type]);
if (libbpf_get_error(map_data[type])) {
printf("ERROR: finding a map in obj file failed\n");
goto cleanup;
}
map_cnt++;
}
bpf_object__for_each_program(prog, obj) {
tp_links[tp_cnt] = bpf_program__attach(prog);
if (libbpf_get_error(tp_links[tp_cnt])) {
printf("ERROR: bpf_program__attach failed\n");
tp_links[tp_cnt] = NULL;
goto cleanup;
}
tp_cnt++;
}
if (debug) {
print_bpf_prog_info();
}
/* Unload/stop tracepoint event by closing fd's */
/* Unload/stop tracepoint event by closing bpf_link's */
if (errors_only) {
/* The prog_fd[i] and event_fd[i] depend on the
* order the functions was defined in _kern.c
/* The bpf_link[i] depend on the order of
* the functions was defined in _kern.c
*/
close(event_fd[2]); /* tracepoint/xdp/xdp_redirect */
close(prog_fd[2]); /* func: trace_xdp_redirect */
close(event_fd[3]); /* tracepoint/xdp/xdp_redirect_map */
close(prog_fd[3]); /* func: trace_xdp_redirect_map */
bpf_link__destroy(tp_links[2]); /* tracepoint/xdp/xdp_redirect */
tp_links[2] = NULL;
bpf_link__destroy(tp_links[3]); /* tracepoint/xdp/xdp_redirect_map */
tp_links[3] = NULL;
}
stats_poll(interval, errors_only);
ret = EXIT_SUCCESS;
cleanup:
/* Detach tracepoints */
while (tp_cnt)
bpf_link__destroy(tp_links[--tp_cnt]);
bpf_object__close(obj);
return ret;
}

View File

@@ -37,18 +37,35 @@ static __u32 prog_id;
static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
static int n_cpus;
static int cpu_map_fd;
static int rx_cnt_map_fd;
static int redirect_err_cnt_map_fd;
static int cpumap_enqueue_cnt_map_fd;
static int cpumap_kthread_cnt_map_fd;
static int cpus_available_map_fd;
static int cpus_count_map_fd;
static int cpus_iterator_map_fd;
static int exception_cnt_map_fd;
enum map_type {
CPU_MAP,
RX_CNT,
REDIRECT_ERR_CNT,
CPUMAP_ENQUEUE_CNT,
CPUMAP_KTHREAD_CNT,
CPUS_AVAILABLE,
CPUS_COUNT,
CPUS_ITERATOR,
EXCEPTION_CNT,
};
static const char *const map_type_strings[] = {
[CPU_MAP] = "cpu_map",
[RX_CNT] = "rx_cnt",
[REDIRECT_ERR_CNT] = "redirect_err_cnt",
[CPUMAP_ENQUEUE_CNT] = "cpumap_enqueue_cnt",
[CPUMAP_KTHREAD_CNT] = "cpumap_kthread_cnt",
[CPUS_AVAILABLE] = "cpus_available",
[CPUS_COUNT] = "cpus_count",
[CPUS_ITERATOR] = "cpus_iterator",
[EXCEPTION_CNT] = "exception_cnt",
};
#define NUM_TP 5
struct bpf_link *tp_links[NUM_TP] = { 0 };
#define NUM_MAP 9
struct bpf_link *tp_links[NUM_TP] = {};
static int map_fds[NUM_MAP];
static int tp_cnt = 0;
/* Exit return codes */
@@ -111,7 +128,7 @@ static void print_avail_progs(struct bpf_object *obj)
bpf_object__for_each_program(pos, obj) {
if (bpf_program__is_xdp(pos))
printf(" %s\n", bpf_program__title(pos, false));
printf(" %s\n", bpf_program__section_name(pos));
}
}
@@ -527,20 +544,20 @@ static void stats_collect(struct stats_record *rec)
{
int fd, i;
fd = rx_cnt_map_fd;
fd = map_fds[RX_CNT];
map_collect_percpu(fd, 0, &rec->rx_cnt);
fd = redirect_err_cnt_map_fd;
fd = map_fds[REDIRECT_ERR_CNT];
map_collect_percpu(fd, 1, &rec->redir_err);
fd = cpumap_enqueue_cnt_map_fd;
fd = map_fds[CPUMAP_ENQUEUE_CNT];
for (i = 0; i < n_cpus; i++)
map_collect_percpu(fd, i, &rec->enq[i]);
fd = cpumap_kthread_cnt_map_fd;
fd = map_fds[CPUMAP_KTHREAD_CNT];
map_collect_percpu(fd, 0, &rec->kthread);
fd = exception_cnt_map_fd;
fd = map_fds[EXCEPTION_CNT];
map_collect_percpu(fd, 0, &rec->exception);
}
@@ -565,7 +582,7 @@ static int create_cpu_entry(__u32 cpu, struct bpf_cpumap_val *value,
/* Add a CPU entry to cpumap, as this allocate a cpu entry in
* the kernel for the cpu.
*/
ret = bpf_map_update_elem(cpu_map_fd, &cpu, value, 0);
ret = bpf_map_update_elem(map_fds[CPU_MAP], &cpu, value, 0);
if (ret) {
fprintf(stderr, "Create CPU entry failed (err:%d)\n", ret);
exit(EXIT_FAIL_BPF);
@@ -574,21 +591,21 @@ static int create_cpu_entry(__u32 cpu, struct bpf_cpumap_val *value,
/* Inform bpf_prog's that a new CPU is available to select
* from via some control maps.
*/
ret = bpf_map_update_elem(cpus_available_map_fd, &avail_idx, &cpu, 0);
ret = bpf_map_update_elem(map_fds[CPUS_AVAILABLE], &avail_idx, &cpu, 0);
if (ret) {
fprintf(stderr, "Add to avail CPUs failed\n");
exit(EXIT_FAIL_BPF);
}
/* When not replacing/updating existing entry, bump the count */
ret = bpf_map_lookup_elem(cpus_count_map_fd, &key, &curr_cpus_count);
ret = bpf_map_lookup_elem(map_fds[CPUS_COUNT], &key, &curr_cpus_count);
if (ret) {
fprintf(stderr, "Failed reading curr cpus_count\n");
exit(EXIT_FAIL_BPF);
}
if (new) {
curr_cpus_count++;
ret = bpf_map_update_elem(cpus_count_map_fd, &key,
ret = bpf_map_update_elem(map_fds[CPUS_COUNT], &key,
&curr_cpus_count, 0);
if (ret) {
fprintf(stderr, "Failed write curr cpus_count\n");
@@ -612,7 +629,7 @@ static void mark_cpus_unavailable(void)
int ret, i;
for (i = 0; i < n_cpus; i++) {
ret = bpf_map_update_elem(cpus_available_map_fd, &i,
ret = bpf_map_update_elem(map_fds[CPUS_AVAILABLE], &i,
&invalid_cpu, 0);
if (ret) {
fprintf(stderr, "Failed marking CPU unavailable\n");
@@ -665,68 +682,37 @@ static void stats_poll(int interval, bool use_separators, char *prog_name,
free_stats_record(prev);
}
static struct bpf_link * attach_tp(struct bpf_object *obj,
const char *tp_category,
const char* tp_name)
static int init_tracepoints(struct bpf_object *obj)
{
struct bpf_program *prog;
struct bpf_link *link;
char sec_name[PATH_MAX];
int len;
len = snprintf(sec_name, PATH_MAX, "tracepoint/%s/%s",
tp_category, tp_name);
if (len < 0)
exit(EXIT_FAIL);
bpf_object__for_each_program(prog, obj) {
if (bpf_program__is_tracepoint(prog) != true)
continue;
prog = bpf_object__find_program_by_title(obj, sec_name);
if (!prog) {
fprintf(stderr, "ERR: finding progsec: %s\n", sec_name);
exit(EXIT_FAIL_BPF);
tp_links[tp_cnt] = bpf_program__attach(prog);
if (libbpf_get_error(tp_links[tp_cnt])) {
tp_links[tp_cnt] = NULL;
return -EINVAL;
}
tp_cnt++;
}
link = bpf_program__attach_tracepoint(prog, tp_category, tp_name);
if (libbpf_get_error(link))
exit(EXIT_FAIL_BPF);
return link;
}
static void init_tracepoints(struct bpf_object *obj) {
tp_links[tp_cnt++] = attach_tp(obj, "xdp", "xdp_redirect_err");
tp_links[tp_cnt++] = attach_tp(obj, "xdp", "xdp_redirect_map_err");
tp_links[tp_cnt++] = attach_tp(obj, "xdp", "xdp_exception");
tp_links[tp_cnt++] = attach_tp(obj, "xdp", "xdp_cpumap_enqueue");
tp_links[tp_cnt++] = attach_tp(obj, "xdp", "xdp_cpumap_kthread");
return 0;
}
static int init_map_fds(struct bpf_object *obj)
{
/* Maps updated by tracepoints */
redirect_err_cnt_map_fd =
bpf_object__find_map_fd_by_name(obj, "redirect_err_cnt");
exception_cnt_map_fd =
bpf_object__find_map_fd_by_name(obj, "exception_cnt");
cpumap_enqueue_cnt_map_fd =
bpf_object__find_map_fd_by_name(obj, "cpumap_enqueue_cnt");
cpumap_kthread_cnt_map_fd =
bpf_object__find_map_fd_by_name(obj, "cpumap_kthread_cnt");
enum map_type type;
/* Maps used by XDP */
rx_cnt_map_fd = bpf_object__find_map_fd_by_name(obj, "rx_cnt");
cpu_map_fd = bpf_object__find_map_fd_by_name(obj, "cpu_map");
cpus_available_map_fd =
bpf_object__find_map_fd_by_name(obj, "cpus_available");
cpus_count_map_fd = bpf_object__find_map_fd_by_name(obj, "cpus_count");
cpus_iterator_map_fd =
bpf_object__find_map_fd_by_name(obj, "cpus_iterator");
for (type = 0; type < NUM_MAP; type++) {
map_fds[type] =
bpf_object__find_map_fd_by_name(obj,
map_type_strings[type]);
if (cpu_map_fd < 0 || rx_cnt_map_fd < 0 ||
redirect_err_cnt_map_fd < 0 || cpumap_enqueue_cnt_map_fd < 0 ||
cpumap_kthread_cnt_map_fd < 0 || cpus_available_map_fd < 0 ||
cpus_count_map_fd < 0 || cpus_iterator_map_fd < 0 ||
exception_cnt_map_fd < 0)
return -ENOENT;
if (map_fds[type] < 0)
return -ENOENT;
}
return 0;
}
@@ -795,13 +781,13 @@ int main(int argc, char **argv)
bool stress_mode = false;
struct bpf_program *prog;
struct bpf_object *obj;
int err = EXIT_FAIL;
char filename[256];
int added_cpus = 0;
int longindex = 0;
int interval = 2;
int add_cpu = -1;
int opt, err;
int prog_fd;
int opt, prog_fd;
int *cpu, i;
__u32 qsize;
@@ -824,24 +810,29 @@ int main(int argc, char **argv)
}
if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
return EXIT_FAIL;
return err;
if (prog_fd < 0) {
fprintf(stderr, "ERR: bpf_prog_load_xattr: %s\n",
strerror(errno));
return EXIT_FAIL;
return err;
}
init_tracepoints(obj);
if (init_tracepoints(obj) < 0) {
fprintf(stderr, "ERR: bpf_program__attach failed\n");
return err;
}
if (init_map_fds(obj) < 0) {
fprintf(stderr, "bpf_object__find_map_fd_by_name failed\n");
return EXIT_FAIL;
return err;
}
mark_cpus_unavailable();
cpu = malloc(n_cpus * sizeof(int));
if (!cpu) {
fprintf(stderr, "failed to allocate cpu array\n");
return EXIT_FAIL;
return err;
}
memset(cpu, 0, n_cpus * sizeof(int));
@@ -960,14 +951,12 @@ int main(int argc, char **argv)
prog = bpf_object__find_program_by_title(obj, prog_name);
if (!prog) {
fprintf(stderr, "bpf_object__find_program_by_title failed\n");
err = EXIT_FAIL;
goto out;
}
prog_fd = bpf_program__fd(prog);
if (prog_fd < 0) {
fprintf(stderr, "bpf_program__fd failed\n");
err = EXIT_FAIL;
goto out;
}
@@ -986,6 +975,8 @@ int main(int argc, char **argv)
stats_poll(interval, use_separators, prog_name, mprog_name,
&value, stress_mode);
err = EXIT_OK;
out:
free(cpu);
return err;

View File

@@ -5,14 +5,12 @@
#include <bpf/bpf_helpers.h>
#define SAMPLE_SIZE 64ul
#define MAX_CPUS 128
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(u32));
} my_map SEC(".maps");
SEC("xdp_sample")
int xdp_sample_prog(struct xdp_md *ctx)

View File

@@ -18,7 +18,6 @@
#include "perf-sys.h"
#define MAX_CPUS 128
static int if_idx;
static char *if_name;
static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;

View File

@@ -11,6 +11,7 @@
#include <linux/if_xdp.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/limits.h>
#include <linux/udp.h>
#include <arpa/inet.h>
#include <locale.h>
@@ -78,6 +79,11 @@ static int opt_pkt_count;
static u16 opt_pkt_size = MIN_PKT_SIZE;
static u32 opt_pkt_fill_pattern = 0x12345678;
static bool opt_extra_stats;
static bool opt_quiet;
static bool opt_app_stats;
static const char *opt_irq_str = "";
static u32 irq_no;
static int irqs_at_init = -1;
static int opt_poll;
static int opt_interval = 1;
static u32 opt_xdp_bind_flags = XDP_USE_NEED_WAKEUP;
@@ -90,18 +96,7 @@ static bool opt_need_wakeup = true;
static u32 opt_num_xsks = 1;
static u32 prog_id;
struct xsk_umem_info {
struct xsk_ring_prod fq;
struct xsk_ring_cons cq;
struct xsk_umem *umem;
void *buffer;
};
struct xsk_socket_info {
struct xsk_ring_cons rx;
struct xsk_ring_prod tx;
struct xsk_umem_info *umem;
struct xsk_socket *xsk;
struct xsk_ring_stats {
unsigned long rx_npkts;
unsigned long tx_npkts;
unsigned long rx_dropped_npkts;
@@ -118,6 +113,41 @@ struct xsk_socket_info {
unsigned long prev_rx_full_npkts;
unsigned long prev_rx_fill_empty_npkts;
unsigned long prev_tx_empty_npkts;
};
struct xsk_driver_stats {
unsigned long intrs;
unsigned long prev_intrs;
};
struct xsk_app_stats {
unsigned long rx_empty_polls;
unsigned long fill_fail_polls;
unsigned long copy_tx_sendtos;
unsigned long tx_wakeup_sendtos;
unsigned long opt_polls;
unsigned long prev_rx_empty_polls;
unsigned long prev_fill_fail_polls;
unsigned long prev_copy_tx_sendtos;
unsigned long prev_tx_wakeup_sendtos;
unsigned long prev_opt_polls;
};
struct xsk_umem_info {
struct xsk_ring_prod fq;
struct xsk_ring_cons cq;
struct xsk_umem *umem;
void *buffer;
};
struct xsk_socket_info {
struct xsk_ring_cons rx;
struct xsk_ring_prod tx;
struct xsk_umem_info *umem;
struct xsk_socket *xsk;
struct xsk_ring_stats ring_stats;
struct xsk_app_stats app_stats;
struct xsk_driver_stats drv_stats;
u32 outstanding_tx;
};
@@ -172,18 +202,151 @@ static int xsk_get_xdp_stats(int fd, struct xsk_socket_info *xsk)
return err;
if (optlen == sizeof(struct xdp_statistics)) {
xsk->rx_dropped_npkts = stats.rx_dropped;
xsk->rx_invalid_npkts = stats.rx_invalid_descs;
xsk->tx_invalid_npkts = stats.tx_invalid_descs;
xsk->rx_full_npkts = stats.rx_ring_full;
xsk->rx_fill_empty_npkts = stats.rx_fill_ring_empty_descs;
xsk->tx_empty_npkts = stats.tx_ring_empty_descs;
xsk->ring_stats.rx_dropped_npkts = stats.rx_dropped;
xsk->ring_stats.rx_invalid_npkts = stats.rx_invalid_descs;
xsk->ring_stats.tx_invalid_npkts = stats.tx_invalid_descs;
xsk->ring_stats.rx_full_npkts = stats.rx_ring_full;
xsk->ring_stats.rx_fill_empty_npkts = stats.rx_fill_ring_empty_descs;
xsk->ring_stats.tx_empty_npkts = stats.tx_ring_empty_descs;
return 0;
}
return -EINVAL;
}
static void dump_app_stats(long dt)
{
int i;
for (i = 0; i < num_socks && xsks[i]; i++) {
char *fmt = "%-18s %'-14.0f %'-14lu\n";
double rx_empty_polls_ps, fill_fail_polls_ps, copy_tx_sendtos_ps,
tx_wakeup_sendtos_ps, opt_polls_ps;
rx_empty_polls_ps = (xsks[i]->app_stats.rx_empty_polls -
xsks[i]->app_stats.prev_rx_empty_polls) * 1000000000. / dt;
fill_fail_polls_ps = (xsks[i]->app_stats.fill_fail_polls -
xsks[i]->app_stats.prev_fill_fail_polls) * 1000000000. / dt;
copy_tx_sendtos_ps = (xsks[i]->app_stats.copy_tx_sendtos -
xsks[i]->app_stats.prev_copy_tx_sendtos) * 1000000000. / dt;
tx_wakeup_sendtos_ps = (xsks[i]->app_stats.tx_wakeup_sendtos -
xsks[i]->app_stats.prev_tx_wakeup_sendtos)
* 1000000000. / dt;
opt_polls_ps = (xsks[i]->app_stats.opt_polls -
xsks[i]->app_stats.prev_opt_polls) * 1000000000. / dt;
printf("\n%-18s %-14s %-14s\n", "", "calls/s", "count");
printf(fmt, "rx empty polls", rx_empty_polls_ps, xsks[i]->app_stats.rx_empty_polls);
printf(fmt, "fill fail polls", fill_fail_polls_ps,
xsks[i]->app_stats.fill_fail_polls);
printf(fmt, "copy tx sendtos", copy_tx_sendtos_ps,
xsks[i]->app_stats.copy_tx_sendtos);
printf(fmt, "tx wakeup sendtos", tx_wakeup_sendtos_ps,
xsks[i]->app_stats.tx_wakeup_sendtos);
printf(fmt, "opt polls", opt_polls_ps, xsks[i]->app_stats.opt_polls);
xsks[i]->app_stats.prev_rx_empty_polls = xsks[i]->app_stats.rx_empty_polls;
xsks[i]->app_stats.prev_fill_fail_polls = xsks[i]->app_stats.fill_fail_polls;
xsks[i]->app_stats.prev_copy_tx_sendtos = xsks[i]->app_stats.copy_tx_sendtos;
xsks[i]->app_stats.prev_tx_wakeup_sendtos = xsks[i]->app_stats.tx_wakeup_sendtos;
xsks[i]->app_stats.prev_opt_polls = xsks[i]->app_stats.opt_polls;
}
}
static bool get_interrupt_number(void)
{
FILE *f_int_proc;
char line[4096];
bool found = false;
f_int_proc = fopen("/proc/interrupts", "r");
if (f_int_proc == NULL) {
printf("Failed to open /proc/interrupts.\n");
return found;
}
while (!feof(f_int_proc) && !found) {
/* Make sure to read a full line at a time */
if (fgets(line, sizeof(line), f_int_proc) == NULL ||
line[strlen(line) - 1] != '\n') {
printf("Error reading from interrupts file\n");
break;
}
/* Extract interrupt number from line */
if (strstr(line, opt_irq_str) != NULL) {
irq_no = atoi(line);
found = true;
break;
}
}
fclose(f_int_proc);
return found;
}
static int get_irqs(void)
{
char count_path[PATH_MAX];
int total_intrs = -1;
FILE *f_count_proc;
char line[4096];
snprintf(count_path, sizeof(count_path),
"/sys/kernel/irq/%i/per_cpu_count", irq_no);
f_count_proc = fopen(count_path, "r");
if (f_count_proc == NULL) {
printf("Failed to open %s\n", count_path);
return total_intrs;
}
if (fgets(line, sizeof(line), f_count_proc) == NULL ||
line[strlen(line) - 1] != '\n') {
printf("Error reading from %s\n", count_path);
} else {
static const char com[2] = ",";
char *token;
total_intrs = 0;
token = strtok(line, com);
while (token != NULL) {
/* sum up interrupts across all cores */
total_intrs += atoi(token);
token = strtok(NULL, com);
}
}
fclose(f_count_proc);
return total_intrs;
}
static void dump_driver_stats(long dt)
{
int i;
for (i = 0; i < num_socks && xsks[i]; i++) {
char *fmt = "%-18s %'-14.0f %'-14lu\n";
double intrs_ps;
int n_ints = get_irqs();
if (n_ints < 0) {
printf("error getting intr info for intr %i\n", irq_no);
return;
}
xsks[i]->drv_stats.intrs = n_ints - irqs_at_init;
intrs_ps = (xsks[i]->drv_stats.intrs - xsks[i]->drv_stats.prev_intrs) *
1000000000. / dt;
printf("\n%-18s %-14s %-14s\n", "", "intrs/s", "count");
printf(fmt, "irqs", intrs_ps, xsks[i]->drv_stats.intrs);
xsks[i]->drv_stats.prev_intrs = xsks[i]->drv_stats.intrs;
}
}
static void dump_stats(void)
{
unsigned long now = get_nsecs();
@@ -193,67 +356,83 @@ static void dump_stats(void)
prev_time = now;
for (i = 0; i < num_socks && xsks[i]; i++) {
char *fmt = "%-15s %'-11.0f %'-11lu\n";
char *fmt = "%-18s %'-14.0f %'-14lu\n";
double rx_pps, tx_pps, dropped_pps, rx_invalid_pps, full_pps, fill_empty_pps,
tx_invalid_pps, tx_empty_pps;
rx_pps = (xsks[i]->rx_npkts - xsks[i]->prev_rx_npkts) *
rx_pps = (xsks[i]->ring_stats.rx_npkts - xsks[i]->ring_stats.prev_rx_npkts) *
1000000000. / dt;
tx_pps = (xsks[i]->tx_npkts - xsks[i]->prev_tx_npkts) *
tx_pps = (xsks[i]->ring_stats.tx_npkts - xsks[i]->ring_stats.prev_tx_npkts) *
1000000000. / dt;
printf("\n sock%d@", i);
print_benchmark(false);
printf("\n");
printf("%-15s %-11s %-11s %-11.2f\n", "", "pps", "pkts",
printf("%-18s %-14s %-14s %-14.2f\n", "", "pps", "pkts",
dt / 1000000000.);
printf(fmt, "rx", rx_pps, xsks[i]->rx_npkts);
printf(fmt, "tx", tx_pps, xsks[i]->tx_npkts);
printf(fmt, "rx", rx_pps, xsks[i]->ring_stats.rx_npkts);
printf(fmt, "tx", tx_pps, xsks[i]->ring_stats.tx_npkts);
xsks[i]->prev_rx_npkts = xsks[i]->rx_npkts;
xsks[i]->prev_tx_npkts = xsks[i]->tx_npkts;
xsks[i]->ring_stats.prev_rx_npkts = xsks[i]->ring_stats.rx_npkts;
xsks[i]->ring_stats.prev_tx_npkts = xsks[i]->ring_stats.tx_npkts;
if (opt_extra_stats) {
if (!xsk_get_xdp_stats(xsk_socket__fd(xsks[i]->xsk), xsks[i])) {
dropped_pps = (xsks[i]->rx_dropped_npkts -
xsks[i]->prev_rx_dropped_npkts) * 1000000000. / dt;
rx_invalid_pps = (xsks[i]->rx_invalid_npkts -
xsks[i]->prev_rx_invalid_npkts) * 1000000000. / dt;
tx_invalid_pps = (xsks[i]->tx_invalid_npkts -
xsks[i]->prev_tx_invalid_npkts) * 1000000000. / dt;
full_pps = (xsks[i]->rx_full_npkts -
xsks[i]->prev_rx_full_npkts) * 1000000000. / dt;
fill_empty_pps = (xsks[i]->rx_fill_empty_npkts -
xsks[i]->prev_rx_fill_empty_npkts)
* 1000000000. / dt;
tx_empty_pps = (xsks[i]->tx_empty_npkts -
xsks[i]->prev_tx_empty_npkts) * 1000000000. / dt;
dropped_pps = (xsks[i]->ring_stats.rx_dropped_npkts -
xsks[i]->ring_stats.prev_rx_dropped_npkts) *
1000000000. / dt;
rx_invalid_pps = (xsks[i]->ring_stats.rx_invalid_npkts -
xsks[i]->ring_stats.prev_rx_invalid_npkts) *
1000000000. / dt;
tx_invalid_pps = (xsks[i]->ring_stats.tx_invalid_npkts -
xsks[i]->ring_stats.prev_tx_invalid_npkts) *
1000000000. / dt;
full_pps = (xsks[i]->ring_stats.rx_full_npkts -
xsks[i]->ring_stats.prev_rx_full_npkts) *
1000000000. / dt;
fill_empty_pps = (xsks[i]->ring_stats.rx_fill_empty_npkts -
xsks[i]->ring_stats.prev_rx_fill_empty_npkts) *
1000000000. / dt;
tx_empty_pps = (xsks[i]->ring_stats.tx_empty_npkts -
xsks[i]->ring_stats.prev_tx_empty_npkts) *
1000000000. / dt;
printf(fmt, "rx dropped", dropped_pps,
xsks[i]->rx_dropped_npkts);
xsks[i]->ring_stats.rx_dropped_npkts);
printf(fmt, "rx invalid", rx_invalid_pps,
xsks[i]->rx_invalid_npkts);
xsks[i]->ring_stats.rx_invalid_npkts);
printf(fmt, "tx invalid", tx_invalid_pps,
xsks[i]->tx_invalid_npkts);
xsks[i]->ring_stats.tx_invalid_npkts);
printf(fmt, "rx queue full", full_pps,
xsks[i]->rx_full_npkts);
xsks[i]->ring_stats.rx_full_npkts);
printf(fmt, "fill ring empty", fill_empty_pps,
xsks[i]->rx_fill_empty_npkts);
xsks[i]->ring_stats.rx_fill_empty_npkts);
printf(fmt, "tx ring empty", tx_empty_pps,
xsks[i]->tx_empty_npkts);
xsks[i]->ring_stats.tx_empty_npkts);
xsks[i]->prev_rx_dropped_npkts = xsks[i]->rx_dropped_npkts;
xsks[i]->prev_rx_invalid_npkts = xsks[i]->rx_invalid_npkts;
xsks[i]->prev_tx_invalid_npkts = xsks[i]->tx_invalid_npkts;
xsks[i]->prev_rx_full_npkts = xsks[i]->rx_full_npkts;
xsks[i]->prev_rx_fill_empty_npkts = xsks[i]->rx_fill_empty_npkts;
xsks[i]->prev_tx_empty_npkts = xsks[i]->tx_empty_npkts;
xsks[i]->ring_stats.prev_rx_dropped_npkts =
xsks[i]->ring_stats.rx_dropped_npkts;
xsks[i]->ring_stats.prev_rx_invalid_npkts =
xsks[i]->ring_stats.rx_invalid_npkts;
xsks[i]->ring_stats.prev_tx_invalid_npkts =
xsks[i]->ring_stats.tx_invalid_npkts;
xsks[i]->ring_stats.prev_rx_full_npkts =
xsks[i]->ring_stats.rx_full_npkts;
xsks[i]->ring_stats.prev_rx_fill_empty_npkts =
xsks[i]->ring_stats.rx_fill_empty_npkts;
xsks[i]->ring_stats.prev_tx_empty_npkts =
xsks[i]->ring_stats.tx_empty_npkts;
} else {
printf("%-15s\n", "Error retrieving extra stats");
}
}
}
if (opt_app_stats)
dump_app_stats(dt);
if (irq_no)
dump_driver_stats(dt);
}
static bool is_benchmark_done(void)
@@ -613,7 +792,16 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
{
struct xsk_umem_info *umem;
struct xsk_umem_config cfg = {
.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
/* We recommend that you set the fill ring size >= HW RX ring size +
* AF_XDP RX ring size. Make sure you fill up the fill ring
* with buffers at regular intervals, and you will with this setting
* avoid allocation failures in the driver. These are usually quite
* expensive since drivers have not been written to assume that
* allocation failures are common. For regular sockets, kernel
* allocated memory is used that only runs out in OOM situations
* that should be rare.
*/
.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS * 2,
.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
.frame_size = opt_xsk_frame_size,
.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
@@ -640,13 +828,13 @@ static void xsk_populate_fill_ring(struct xsk_umem_info *umem)
u32 idx;
ret = xsk_ring_prod__reserve(&umem->fq,
XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx);
if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
XSK_RING_PROD__DEFAULT_NUM_DESCS * 2, &idx);
if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS * 2)
exit_with_error(-ret);
for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS * 2; i++)
*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
i * opt_xsk_frame_size;
xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);
xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS * 2);
}
static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem,
@@ -683,6 +871,17 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem,
if (ret)
exit_with_error(-ret);
xsk->app_stats.rx_empty_polls = 0;
xsk->app_stats.fill_fail_polls = 0;
xsk->app_stats.copy_tx_sendtos = 0;
xsk->app_stats.tx_wakeup_sendtos = 0;
xsk->app_stats.opt_polls = 0;
xsk->app_stats.prev_rx_empty_polls = 0;
xsk->app_stats.prev_fill_fail_polls = 0;
xsk->app_stats.prev_copy_tx_sendtos = 0;
xsk->app_stats.prev_tx_wakeup_sendtos = 0;
xsk->app_stats.prev_opt_polls = 0;
return xsk;
}
@@ -709,6 +908,9 @@ static struct option long_options[] = {
{"tx-pkt-size", required_argument, 0, 's'},
{"tx-pkt-pattern", required_argument, 0, 'P'},
{"extra-stats", no_argument, 0, 'x'},
{"quiet", no_argument, 0, 'Q'},
{"app-stats", no_argument, 0, 'a'},
{"irq-string", no_argument, 0, 'I'},
{0, 0, 0, 0}
};
@@ -744,6 +946,9 @@ static void usage(const char *prog)
" Min size: %d, Max size %d.\n"
" -P, --tx-pkt-pattern=nPacket fill pattern. Default: 0x%x\n"
" -x, --extra-stats Display extra statistics.\n"
" -Q, --quiet Do not display any stats.\n"
" -a, --app-stats Display application (syscall) statistics.\n"
" -I, --irq-string Display driver interrupt statistics for interface associated with irq-string.\n"
"\n";
fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE,
opt_batch_size, MIN_PKT_SIZE, MIN_PKT_SIZE,
@@ -759,7 +964,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:x",
c = getopt_long(argc, argv, "Frtli:q:pSNn:czf:muMd:b:C:s:P:xQaI:",
long_options, &option_index);
if (c == -1)
break;
@@ -842,6 +1047,22 @@ static void parse_command_line(int argc, char **argv)
break;
case 'x':
opt_extra_stats = 1;
break;
case 'Q':
opt_quiet = 1;
break;
case 'a':
opt_app_stats = 1;
break;
case 'I':
opt_irq_str = optarg;
if (get_interrupt_number())
irqs_at_init = get_irqs();
if (irqs_at_init < 0) {
fprintf(stderr, "ERROR: Failed to get irqs for %s\n", opt_irq_str);
usage(basename(argv[0]));
}
break;
default:
usage(basename(argv[0]));
@@ -888,8 +1109,15 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
if (!xsk->outstanding_tx)
return;
if (!opt_need_wakeup || xsk_ring_prod__needs_wakeup(&xsk->tx))
/* In copy mode, Tx is driven by a syscall so we need to use e.g. sendto() to
* really send the packets. In zero-copy mode we do not have to do this, since Tx
* is driven by the NAPI loop. So as an optimization, we do not have to call
* sendto() all the time in zero-copy mode for l2fwd.
*/
if (opt_xdp_bind_flags & XDP_COPY) {
xsk->app_stats.copy_tx_sendtos++;
kick_tx(xsk);
}
ndescs = (xsk->outstanding_tx > opt_batch_size) ? opt_batch_size :
xsk->outstanding_tx;
@@ -904,8 +1132,10 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
if (xsk_ring_prod__needs_wakeup(&umem->fq))
if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
xsk->app_stats.fill_fail_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
}
@@ -916,7 +1146,7 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk,
xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
xsk_ring_cons__release(&xsk->umem->cq, rcvd);
xsk->outstanding_tx -= rcvd;
xsk->tx_npkts += rcvd;
xsk->ring_stats.tx_npkts += rcvd;
}
}
@@ -929,14 +1159,16 @@ static inline void complete_tx_only(struct xsk_socket_info *xsk,
if (!xsk->outstanding_tx)
return;
if (!opt_need_wakeup || xsk_ring_prod__needs_wakeup(&xsk->tx))
if (!opt_need_wakeup || xsk_ring_prod__needs_wakeup(&xsk->tx)) {
xsk->app_stats.tx_wakeup_sendtos++;
kick_tx(xsk);
}
rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
if (rcvd > 0) {
xsk_ring_cons__release(&xsk->umem->cq, rcvd);
xsk->outstanding_tx -= rcvd;
xsk->tx_npkts += rcvd;
xsk->ring_stats.tx_npkts += rcvd;
}
}
@@ -948,8 +1180,10 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
return;
}
@@ -957,8 +1191,10 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.fill_fail_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
}
@@ -976,7 +1212,7 @@ static void rx_drop(struct xsk_socket_info *xsk, struct pollfd *fds)
xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
xsk_ring_cons__release(&xsk->rx, rcvd);
xsk->rx_npkts += rcvd;
xsk->ring_stats.rx_npkts += rcvd;
}
static void rx_drop_all(void)
@@ -991,6 +1227,8 @@ static void rx_drop_all(void)
for (;;) {
if (opt_poll) {
for (i = 0; i < num_socks; i++)
xsks[i]->app_stats.opt_polls++;
ret = poll(fds, num_socks, opt_timeout);
if (ret <= 0)
continue;
@@ -1004,7 +1242,7 @@ static void rx_drop_all(void)
}
}
static void tx_only(struct xsk_socket_info *xsk, u32 frame_nb, int batch_size)
static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size)
{
u32 idx;
unsigned int i;
@@ -1017,14 +1255,14 @@ static void tx_only(struct xsk_socket_info *xsk, u32 frame_nb, int batch_size)
for (i = 0; i < batch_size; i++) {
struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx,
idx + i);
tx_desc->addr = (frame_nb + i) << XSK_UMEM__DEFAULT_FRAME_SHIFT;
tx_desc->addr = (*frame_nb + i) << XSK_UMEM__DEFAULT_FRAME_SHIFT;
tx_desc->len = PKT_SIZE;
}
xsk_ring_prod__submit(&xsk->tx, batch_size);
xsk->outstanding_tx += batch_size;
frame_nb += batch_size;
frame_nb %= NUM_FRAMES;
*frame_nb += batch_size;
*frame_nb %= NUM_FRAMES;
complete_tx_only(xsk, batch_size);
}
@@ -1071,6 +1309,8 @@ static void tx_only_all(void)
int batch_size = get_batch_size(pkt_cnt);
if (opt_poll) {
for (i = 0; i < num_socks; i++)
xsks[i]->app_stats.opt_polls++;
ret = poll(fds, num_socks, opt_timeout);
if (ret <= 0)
continue;
@@ -1080,7 +1320,7 @@ static void tx_only_all(void)
}
for (i = 0; i < num_socks; i++)
tx_only(xsks[i], frame_nb[i], batch_size);
tx_only(xsks[i], &frame_nb[i], batch_size);
pkt_cnt += batch_size;
@@ -1102,8 +1342,10 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
if (!rcvd) {
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq)) {
xsk->app_stats.rx_empty_polls++;
ret = poll(fds, num_socks, opt_timeout);
}
return;
}
@@ -1111,8 +1353,11 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
if (xsk_ring_prod__needs_wakeup(&xsk->tx))
complete_tx_l2fwd(xsk, fds);
if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
xsk->app_stats.tx_wakeup_sendtos++;
kick_tx(xsk);
}
ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
}
@@ -1134,7 +1379,7 @@ static void l2fwd(struct xsk_socket_info *xsk, struct pollfd *fds)
xsk_ring_prod__submit(&xsk->tx, rcvd);
xsk_ring_cons__release(&xsk->rx, rcvd);
xsk->rx_npkts += rcvd;
xsk->ring_stats.rx_npkts += rcvd;
xsk->outstanding_tx += rcvd;
}
@@ -1150,6 +1395,8 @@ static void l2fwd_all(void)
for (;;) {
if (opt_poll) {
for (i = 0; i < num_socks; i++)
xsks[i]->app_stats.opt_polls++;
ret = poll(fds, num_socks, opt_timeout);
if (ret <= 0)
continue;
@@ -1271,9 +1518,11 @@ int main(int argc, char **argv)
setlocale(LC_ALL, "");
ret = pthread_create(&pt, NULL, poller, NULL);
if (ret)
exit_with_error(ret);
if (!opt_quiet) {
ret = pthread_create(&pt, NULL, poller, NULL);
if (ret)
exit_with_error(ret);
}
prev_time = get_nsecs();
start_time = prev_time;
@@ -1287,7 +1536,8 @@ int main(int argc, char **argv)
benchmark_done = true;
pthread_join(pt, NULL);
if (!opt_quiet)
pthread_join(pt, NULL);
xdpsock_cleanup();

1085
samples/bpf/xsk_fwd.c Normal file

File diff suppressed because it is too large Load Diff