Commit Graph

14029 Commits

Author SHA1 Message Date
Oliver Upton
123f42f0ad Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next
* kvm-arm64/pmu_pmcr_n:
  : User-defined PMC limit, courtesy Raghavendra Rao Ananta
  :
  : Certain VMMs may want to reserve some PMCs for host use while running a
  : KVM guest. This was a bit difficult before, as KVM advertised all
  : supported counters to the guest. Userspace can now limit the number of
  : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
  : emulation enforce the specified limit for handling guest accesses.
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
  KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
  KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
  KVM: arm64: PMU: Introduce helpers to set the guest's PMU

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:24:19 +00:00
Oliver Upton
a87a36436c Merge branch kvm-arm64/writable-id-regs into kvmarm/next
* kvm-arm64/writable-id-regs:
  : Writable ID registers, courtesy of Jing Zhang
  :
  : This series significantly expands the architectural feature set that
  : userspace can manipulate via the ID registers. A new ioctl is defined
  : that makes the mutable fields in the ID registers discoverable to
  : userspace.
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: selftests: Test for setting ID register from usersapce
  tools headers arm64: Update sysreg.h with kernel sources
  KVM: selftests: Generate sysreg-defs.h and add to include path
  perf build: Generate arm64's sysreg-defs.h and add to include path
  tools: arm64: Add a Makefile for generating sysreg-defs.h
  KVM: arm64: Document vCPU feature selection UAPIs
  KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
  KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
  KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
  KVM: arm64: Reject attempts to set invalid debug arch version
  KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
  KVM: arm64: Use guest ID register values for the sake of emulation
  KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
  KVM: arm64: Allow userspace to get the writable masks for feature ID registers

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:21:09 +00:00
Oliver Upton
70c7b704ca KVM: selftests: Avoid using forced target for generating arm64 headers
The 'prepare' target that generates the arm64 sysreg headers had no
prerequisites, so it wound up forcing a rebuild of all KVM selftests
each invocation. Add a rule for the generated headers and just have
dependents use that for a prerequisite.

Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Fixes: 9697d84cc3 ("KVM: selftests: Generate sysreg-defs.h and add to include path")
Tested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Link: https://lore.kernel.org/r/20231027005439.3142015-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:20:39 +00:00
Oliver Upton
054056bf98 Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Put an upper bound on the number of I-cache invalidations by
  :    cacheline to avoid soft lockups
  :
  :  - Get rid of bogus refererence count transfer for THP mappings
  :
  :  - Do a local TLB invalidation on permission fault race
  :
  :  - Fixes for page_fault_test KVM selftest
  :
  :  - Add a tracepoint for detecting MMIO instructions unsupported by KVM
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: arm64: Do not transfer page refcount for THP adjustment
  KVM: arm64: Avoid soft lockups due to I-cache maintenance
  arm64: tlbflush: Rename MAX_TLBI_OPS
  KVM: arm64: Don't use kerneldoc comment for arm64_check_features()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:00 +00:00
Zenghui Yu
06899aa5dd KVM: arm64: selftest: Perform ISB before reading PAR_EL1
It looks like a mistake to issue ISB *after* reading PAR_EL1, we should
instead perform it between the AT instruction and the reads of PAR_EL1.

As according to DDI0487J.a IJTYVP,

"When an address translation instruction is executed, explicit
 synchronization is required to guarantee the result is visible to
 subsequent direct reads of PAR_EL1."

Otherwise all guest_at testcases fail on my box with

==== Test Assertion Failure ====
  aarch64/page_fault_test.c:142: par & 1 == 0
  pid=1355864 tid=1355864 errno=4 - Interrupted system call
     1	0x0000000000402853: vcpu_run_loop at page_fault_test.c:681
     2	0x0000000000402cdb: run_test at page_fault_test.c:730
     3	0x0000000000403897: for_each_guest_mode at guest_modes.c:100
     4	0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1105
     5	 (inlined by) main at page_fault_test.c:1131
     6	0x0000ffffb153c03b: ?? ??:0
     7	0x0000ffffb153c113: ?? ??:0
     8	0x0000000000401aaf: _start at ??:?
  0x1 != 0x0 (par & 1 != 0)

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-2-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:12:46 +00:00
Zenghui Yu
beaf35b480 KVM: arm64: selftest: Add the missing .guest_prepare()
Running page_fault_test on a Cortex A72 fails with

Test: ro_memslot_no_syndrome_guest_cas
Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
Testing memory backing src type: anonymous
==== Test Assertion Failure ====
  aarch64/page_fault_test.c:117: guest_check_lse()
  pid=1944087 tid=1944087 errno=4 - Interrupted system call
     1	0x00000000004028b3: vcpu_run_loop at page_fault_test.c:682
     2	0x0000000000402d93: run_test at page_fault_test.c:731
     3	0x0000000000403957: for_each_guest_mode at guest_modes.c:100
     4	0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1108
     5	 (inlined by) main at page_fault_test.c:1134
     6	0x0000ffff868e503b: ?? ??:0
     7	0x0000ffff868e5113: ?? ??:0
     8	0x0000000000401aaf: _start at ??:?
  guest_check_lse()

because we don't have a guest_prepare stage to check the presence of
FEAT_LSE and skip the related guest_cas testing, and we end-up failing in
GUEST_ASSERT(guest_check_lse()).

Add the missing .guest_prepare() where it's indeed required.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-1-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:12:46 +00:00
Robert Richter
f611d98a00 cxl/pci: Remove Component Register base address from struct cxl_dev_state
The Component Register base address @component_reg_phys is no longer
used after the rework of the Component Register setup which now uses
struct member @reg_map instead. Remove the base address.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-9-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 20:13:37 -07:00
Vishal Verma
8f61d48c83 tools/testing/cxl: Slow down the mock firmware transfer
The cxl-cli unit test for firmware update does operations like starting
an asynchronous firmware update, making sure it is in progress, and
attempting to cancel it. In some cases, such as with no or minimal
dynamic debugging turned on, the firmware update completes too quickly,
not allowing the test to have a chance to verify it was in progress.
This caused a failure of the signature:

  expected fw_update_in_progress:true
  test/cxl-update-firmware.sh: failed at line 88

Fix this by adding a delay (~1.5 - 2 ms) to each firmware transfer
request handled by the mocked interface.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20231026-vv-fw_upd_test_fix-v2-1-5282fd193883@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:52 -07:00
Jim Harris
98a04c7ace cxl/region: Fix x1 root-decoder granularity calculations
Root decoder granularity must match value from CFWMS, which may not
be the region's granularity for non-interleaved root decoders.

So when calculating granularities for host bridge decoders, use the
region's granularity instead of the root decoder's granularity to ensure
the correct granularities are set for the host bridge decoders and any
downstream switch decoders.

Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch.

Region created with 2048 granularity using following command line:

cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \
		  -g 2048 -s 2048M

Use "cxl list -PDE | grep granularity" to get a view of the granularity
set at each level of the topology.

Before this patch:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":512,
"interleave_granularity":256,

After:
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
        "interleave_granularity":2048,
        "interleave_granularity":2048,
    "interleave_granularity":4096,
"interleave_granularity":2048,

Fixes: 27b3f8d138 ("cxl/region: Program target lists")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in
[djbw: fixup the prebuilt cxl_test region]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 13:04:52 -07:00
Mickaël Salaün
f12f8f8450 selftests/landlock: Add tests for FS topology changes with network rules
Add 2 tests to the layout1 fixture:
* topology_changes_with_net_only: Checks that FS topology
  changes are not denied by network-only restrictions.
* topology_changes_with_net_and_fs: Make sure that FS topology
  changes are still denied with FS and network restrictions.

This specifically test commit d722036403 ("landlock: Allow FS topology
changes for domains without such rule type").

Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231027154615.815134-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-27 17:53:31 +02:00
Geliang Tang
629b35a225 selftests: mptcp: display simult in extra_msg
Just like displaying "invert" after "Info: ", "simult" should be
displayed too when rm_subflow_nr doesn't match the expect value in
chk_rm_nr():

      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ] 3 in [2:4]
      Info: invert simult

      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ]
      Info: invert

Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-10-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:30 -07:00
Geliang Tang
e71aab6777 selftests: mptcp: sockopt: drop mptcp_connect var
Global var mptcp_connect defined at the front of mptcp_sockopt.sh is
duplicate with local var mptcp_connect defined in do_transfer(), drop
this useless global one.

Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-9-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:30 -07:00
Geliang Tang
9168ea02b8 selftests: mptcp: fix wait_rm_addr/sf parameters
The second input parameter of 'wait_rm_addr/sf $1 1' is misused. If it's
1, wait_rm_addr/sf will never break, and will loop ten times, then
'wait_rm_addr/sf' equals to 'sleep 1'. This delay time is too long,
which can sometimes make the tests fail.

A better way to use wait_rm_addr/sf is to use rm_addr/sf_count to obtain
the current value, and then pass into wait_rm_addr/sf.

Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-2-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:07 -07:00
Geliang Tang
f4a75e9d11 selftests: mptcp: run userspace pm tests slower
Some userspace pm tests failed are reported by CI:

112 userspace pm add & remove address
      syn                                 [ ok ]
      synack                              [ ok ]
      ack                                 [ ok ]
      add                                 [ ok ]
      echo                                [ ok ]
      mptcp_info subflows=1:1             [ ok ]
      subflows_total 2:2                  [ ok ]
      mptcp_info add_addr_signal=1:1      [ ok ]
      rm                                  [ ok ]
      rmsf                                [ ok ]
      Info: invert
      mptcp_info subflows=0:0             [ ok ]
      subflows_total 1:1                  [fail]
                         got subflows 0:0 expected 1:1
Server ns stats
TcpPassiveOpens                 2                  0.0
TcpInSegs                       118                0.0

This patch fixes them by changing 'speed' to 5 to run the tests much more
slowly.

Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-1-db8f25f798eb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27 08:47:07 -07:00
Ido Schimmel
0514dd0593 selftests: vxlan_mdb: Use MDB get instead of dump
Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:51:42 +01:00
Ido Schimmel
e8bba9e83c selftests: bridge_mdb: Use MDB get instead of dump
Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27 10:51:42 +01:00
Jakub Kicinski
c6f9b7138b Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-26

We've added 51 non-merge commits during the last 10 day(s) which contain
a total of 75 files changed, 5037 insertions(+), 200 deletions(-).

The main changes are:

1) Add open-coded task, css_task and css iterator support.
   One of the use cases is customizable OOM victim selection via BPF,
   from Chuyi Zhou.

2) Fix BPF verifier's iterator convergence logic to use exact states
   comparison for convergence checks, from Eduard Zingerman,
   Andrii Nakryiko and Alexei Starovoitov.

3) Add BPF programmable net device where bpf_mprog defines the logic
   of its xmit routine. It can operate in L3 and L2 mode,
   from Daniel Borkmann and Nikolay Aleksandrov.

4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking
   for global per-CPU allocator, from Hou Tao.

5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section
   was going to be present whenever a binary has SHT_GNU_versym section,
   from Andrii Nakryiko.

6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into
   atomic_set_release(), from Paul E. McKenney.

7) Add a warning if NAPI callback missed xdp_do_flush() under
   CONFIG_DEBUG_NET which helps checking if drivers were missing
   the former, from Sebastian Andrzej Siewior.

8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing
   a warning under sleepable programs, from Yafang Shao.

9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before
   checking map_locked, from Song Liu.

10) Make BPF CI linked_list failure test more robust,
    from Kumar Kartikeya Dwivedi.

11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik.

12) Fix xsk starving when multiple xsk sockets were associated with
    a single xsk_buff_pool, from Albert Huang.

13) Clarify the signed modulo implementation for the BPF ISA standardization
    document that it uses truncated division, from Dave Thaler.

14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider
    signed bounds knowledge, from Andrii Nakryiko.

15) Add an option to XDP selftests to use multi-buffer AF_XDP
    xdp_hw_metadata and mark used XDP programs as capable to use frags,
    from Larysa Zaremba.

16) Fix bpftool's BTF dumper wrt printing a pointer value and another
    one to fix struct_ops dump in an array, from Manu Bretelle.

* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits)
  netkit: Remove explicit active/peer ptr initialization
  selftests/bpf: Fix selftests broken by mitigations=off
  samples/bpf: Allow building with custom bpftool
  samples/bpf: Fix passing LDFLAGS to libbpf
  samples/bpf: Allow building with custom CFLAGS/LDFLAGS
  bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free
  selftests/bpf: Add selftests for netkit
  selftests/bpf: Add netlink helper library
  bpftool: Extend net dump with netkit progs
  bpftool: Implement link show support for netkit
  libbpf: Add link-based API for netkit
  tools: Sync if_link uapi header
  netkit, bpf: Add bpf programmable net device
  bpf: Improve JEQ/JNE branch taken logic
  bpf: Fold smp_mb__before_atomic() into atomic_set_release()
  bpf: Fix unnecessary -EBUSY from htab_lock_bucket
  xsk: Avoid starving the xsk further down the list
  bpf: print full verifier states on infinite loop detection
  selftests/bpf: test if state loops are detected in a tricky case
  bpf: correct loop detection for iterators convergence
  ...
====================

Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 20:02:41 -07:00
Jakub Kicinski
ec4c20ca09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mac80211/rx.c
  91535613b6 ("wifi: mac80211: don't drop all unprotected public action frames")
  6c02fab724 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value")

Adjacent changes:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c
  61471264c0 ("net: ethernet: apm: Convert to platform remove callback returning void")
  d2ca43f306 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF")

net/vmw_vsock/virtio_transport.c
  64c99d2d6a ("vsock/virtio: support to send non-linear skb")
  53b08c4985 ("vsock/virtio: initialize the_virtio_vsock before using VQs")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26 13:46:28 -07:00
Konstantin Meskhidze
a549d055a2 selftests/landlock: Add network tests
Add 82 test suites to check edge cases related to bind() and connect()
actions. They are defined with 6 fixtures and their variants:

The "protocol" fixture is extended with 12 variants defined as a matrix
of: sandboxed/not-sandboxed, IPv4/IPv6/unix network domain, and
stream/datagram socket. 4 related tests suites are defined:
* bind: Tests bind action.
* connect: Tests connect action.
* bind_unspec: Tests bind action with the AF_UNSPEC socket family.
* connect_unspec: Tests connect action with the AF_UNSPEC socket family.

The "ipv4" fixture is extended with 4 variants defined as a matrix
of: sandboxed/not-sandboxed, and stream/datagram socket. 1 related test
suite is defined:
* from_unix_to_inet: Tests to make sure unix sockets' actions are not
  restricted by Landlock rules applied to TCP ones.

The "tcp_layers" fixture is extended with 8 variants defined as a matrix
of: IPv4/IPv6 network domain, and different number of landlock rule
layers. 2 related tests suites are defined:
* ruleset_overlap: Tests nested layers with less constraints.
* ruleset_expand: Tests nested layers with more constraints.

In the "mini" fixture 4 tests suites are defined:
* network_access_rights: Tests handling of known access rights.
* unknown_access_rights: Tests handling of unknown access rights.
* inval: Tests unhandled allowed access and zero access value.
* tcp_port_overflow: Tests with port values greater than 65535.

The "ipv4_tcp" fixture supports IPv4 network domain with stream socket.
2 tests suites are defined:
* port_endianness: Tests with big/little endian port formats.
* with_fs: Tests a ruleset with both filesystem and network
  restrictions.

The "port_specific" fixture is extended with 4 variants defined
as a matrix of: sandboxed/not-sandboxed, IPv4/IPv6 network domain,
and stream socket. 2 related tests suites are defined:
* bind_connect_zero: Tests with port 0.
* bind_connect_1023: Tests with port 1023.

Test coverage for security/landlock is 92.4% of 710 lines according to
gcc/gcov-13.

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-11-konstantin.meskhidze@huawei.com
[mic: Extend commit message, update test coverage, clean up capability
use, fix useless TEST_F_FORK, and improve ipv4_tcp.with_fs]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:16 +02:00
Konstantin Meskhidze
1fa335209f selftests/landlock: Share enforce_ruleset() helper
Move enforce_ruleset() helper function to common.h so that it can be
used both by filesystem tests and network ones.

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-10-konstantin.meskhidze@huawei.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:15 +02:00
Konstantin Meskhidze
fff69fb03d landlock: Support network rules with TCP bind and connect
Add network rules support in the ruleset management helpers and the
landlock_create_ruleset() syscall. Extend user space API to support
network actions:
* Add new network access rights: LANDLOCK_ACCESS_NET_BIND_TCP and
  LANDLOCK_ACCESS_NET_CONNECT_TCP.
* Add a new network rule type: LANDLOCK_RULE_NET_PORT tied to struct
  landlock_net_port_attr. The allowed_access field contains the network
  access rights, and the port field contains the port value according to
  the controlled protocol. This field can take up to a 64-bit value
  but the maximum value depends on the related protocol (e.g. 16-bit
  value for TCP). Network port is in host endianness [1].
* Add a new handled_access_net field to struct landlock_ruleset_attr
  that contains network access rights.
* Increment the Landlock ABI version to 4.

Implement socket_bind() and socket_connect() LSM hooks, which enable
to control TCP socket binding and connection to specific ports.

Expand access_masks_t from u16 to u32 to be able to store network access
rights alongside filesystem access rights for rulesets' handled access
rights.

Access rights are not tied to socket file descriptors but checked at
bind() or connect() call time against the caller's Landlock domain. For
the filesystem, a file descriptor is a direct access to a file/data.
However, for network sockets, we cannot identify for which data or peer
a newly created socket will give access to. Indeed, we need to wait for
a connect or bind request to identify the use case for this socket.
Likewise a directory file descriptor may enable to open another file
(i.e. a new data item), but this opening is also restricted by the
caller's domain, not the file descriptor's access rights [2].

[1] https://lore.kernel.org/r/278ab07f-7583-a4e0-3d37-1bacd091531d@digikod.net
[2] https://lore.kernel.org/r/263c1eb3-602f-57fe-8450-3f138581bee7@digikod.net

Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20231026014751.414649-9-konstantin.meskhidze@huawei.com
[mic: Extend commit message, fix typo in comments, and specify
endianness in the documentation]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2023-10-26 21:07:15 +02:00
Catalin Marinas
2baca17e6a Merge branch 'for-next/feat_lse128' into for-next/core
* for-next/feat_lse128:
  : HWCAP for FEAT_LSE128
  kselftest/arm64: add FEAT_LSE128 to hwcap test
  arm64: add FEAT_LSE128 HWCAP
2023-10-26 17:10:07 +01:00
Catalin Marinas
023113fe66 Merge branch 'for-next/feat_lrcpc3' into for-next/core
* for-next/feat_lrcpc3:
  : HWCAP for FEAT_LRCPC3
  selftests/arm64: add HWCAP2_LRCPC3 test
  arm64: add FEAT_LRCPC3 HWCAP
2023-10-26 17:10:05 +01:00
Catalin Marinas
2a3f8ce3bb Merge branch 'for-next/feat_sve_b16b16' into for-next/core
* for-next/feat_sve_b16b16:
  : Add support for FEAT_SVE_B16B16 (BFloat16)
  kselftest/arm64: Verify HWCAP2_SVE_B16B16
  arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26 17:10:01 +01:00
Nicolin Chen
55a01657cb iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a
user-managed domain for nested HWPTs. Add its coverage for that. Also,
update _test_cmd_hwpt_alloc() and add test_cmd/err_hwpt_alloc_nested().

Link: https://lore.kernel.org/r/20231026043938.63898-11-yi.l.liu@intel.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26 11:15:57 -03:00
Yafang Shao
399f6185a1 selftests/bpf: Fix selftests broken by mitigations=off
When we configure the kernel command line with 'mitigations=off' and set
the sysctl knob 'kernel.unprivileged_bpf_disabled' to 0, the commit
bc5bc309db ("bpf: Inherit system settings for CPU security mitigations")
causes issues in the execution of `test_progs -t verifier`. This is
because 'mitigations=off' bypasses Spectre v1 and Spectre v4 protections.

Currently, when a program requests to run in unprivileged mode
(kernel.unprivileged_bpf_disabled = 0), the BPF verifier may prevent
it from running due to the following conditions not being enabled:

  - bypass_spec_v1
  - bypass_spec_v4
  - allow_ptr_leaks
  - allow_uninit_stack

While 'mitigations=off' enables the first two conditions, it does not
enable the latter two. As a result, some test cases in
'test_progs -t verifier' that were expected to fail to run may run
successfully, while others still fail but with different error messages.
This makes it challenging to address them comprehensively.

Moreover, in the future, we may introduce more fine-grained control over
CPU mitigations, such as enabling only bypass_spec_v1 or bypass_spec_v4.

Given the complexity of the situation, rather than fixing each broken test
case individually, it's preferable to skip them when 'mitigations=off' is
in effect and introduce specific test cases for the new 'mitigations=off'
scenario. For instance, we can introduce new BTF declaration tags like
'__failure__nospec', '__failure_nospecv1' and '__failure_nospecv4'.

In this patch, the approach is to simply skip the broken test cases when
'mitigations=off' is enabled. The result of `test_progs -t verifier` as
follows after this commit,

Before this commit
==================

- without 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/1336 PASSED, 0 SKIPPED, 0 FAILED    <<<<
- with 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 63/1276 PASSED, 0 SKIPPED, 11 FAILED   <<<< 11 FAILED

After this commit
=================

- without 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/1336 PASSED, 0 SKIPPED, 0 FAILED    <<<<
- with this patch, with 'mitigations=off'
  - kernel.unprivileged_bpf_disabled = 2
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED
  - kernel.unprivileged_bpf_disabled = 0
    Summary: 74/948 PASSED, 388 SKIPPED, 0 FAILED   <<<< SKIPPED

Fixes: bc5bc309db ("bpf: Inherit system settings for CPU security mitigations")
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Closes: https://lore.kernel.org/bpf/CAADnVQKUBJqg+hHtbLeeC2jhoJAWqnmRAzXW3hmUCNSV9kx4sQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com
2023-10-26 15:42:03 +02:00
Daniel Borkmann
ace15f91e5 selftests/bpf: Add selftests for netkit
Add a bigger batch of test coverage to assert correct operation of
netkit devices and their BPF program management:

  # ./test_progs -t tc_netkit
  [...]
  [    1.166267] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.166831] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.270957] tsc: Refined TSC clocksource calibration: 3407.988 MHz
  [    1.272579] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns
  [    1.275336] clocksource: Switched to clocksource tsc
  #257     tc_netkit_basic:OK
  #258     tc_netkit_device:OK
  #259     tc_netkit_multi_links:OK
  #260     tc_netkit_multi_opts:OK
  #261     tc_netkit_neigh_links:OK
  Summary: 5/0 PASSED, 0 SKIPPED, 0 FAILED
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-8-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:43 -07:00
Daniel Borkmann
51f1892b52 selftests/bpf: Add netlink helper library
Add a minimal netlink helper library for the BPF selftests. This has been
taken and cut down and cleaned up from iproute2. This covers basics such
as netdevice creation which we need for BPF selftests / BPF CI given
iproute2 package cannot cover it yet.

Stanislav Fomichev suggested that this could be replaced in future by ynl
tool generated C code once it has RTNL support to create devices. Once we
get to this point the BPF CI would also need to add libmnl. If no further
extensions are needed, a second option could be that we remove this code
again once iproute2 package has support.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20231024214904.29825-7-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24 16:07:39 -07:00
Raghavendra Rao Ananta
62708be351 KVM: selftests: aarch64: vPMU test for validating user accesses
Add a vPMU test scenario to validate the userspace accesses for
the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} to ensure
that KVM honors the architectural definitions of these registers
for a given PMCR.N.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-13-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
e1cc872063 KVM: selftests: aarch64: vPMU register test for unimplemented counters
Add a new test case to the vpmu_counter_access test to check
if PMU registers or their bits for unimplemented counters are not
accessible or are RAZ, as expected.

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-12-rananta@google.com
[Oliver: fix issues relating to exception return address]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
ada1ae6826 KVM: selftests: aarch64: vPMU register test for implemented counters
Add a new test case to the vpmu_counter_access test to check if PMU
registers or their bits for implemented counters on the vCPU are
readable/writable as expected, and can be programmed to count events.

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-11-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Reiji Watanabe
8d0aebe1ca KVM: selftests: aarch64: Introduce vpmu_counter_access test
Introduce vpmu_counter_access test for arm64 platforms.
The test configures PMUv3 for a vCPU, sets PMCR_EL0.N for the vCPU,
and check if the guest can consistently see the same number of the
PMU event counters (PMCR_EL0.N) that userspace sets.
This test case is done with each of the PMCR_EL0.N values from
0 to 31 (With the PMCR_EL0.N values greater than the host value,
the test expects KVM_SET_ONE_REG for the PMCR_EL0 to fail).

Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231020214053.2144305-10-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:31 +00:00
Swarup Laxman Kotiaklapudi
37a38e439d selftests: net: change ifconfig with ip command
Change ifconfig with ip command, on a system where ifconfig is
not used this script will not work correcly.

Test result with this patchset:

sudo make TARGETS="net" kselftest
....
TAP version 13
1..1
 timeout set to 1500
 selftests: net: route_localnet.sh
 run arp_announce test
 net.ipv4.conf.veth0.route_localnet = 1
 net.ipv4.conf.veth1.route_localnet = 1
 net.ipv4.conf.veth0.arp_announce = 2
 net.ipv4.conf.veth1.arp_announce = 2
 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84)
  bytes of data.
 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.038 ms
 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.068 ms
 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.068 ms

 --- 127.25.3.14 ping statistics ---
 5 packets transmitted, 5 received, 0% packet loss, time 4073ms
 rtt min/avg/max/mdev = 0.038/0.062/0.068/0.012 ms
 ok
 run arp_ignore test
 net.ipv4.conf.veth0.route_localnet = 1
 net.ipv4.conf.veth1.route_localnet = 1
 net.ipv4.conf.veth0.arp_ignore = 3
 net.ipv4.conf.veth1.arp_ignore = 3
 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84)
  bytes of data.
 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.032 ms
 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.065 ms
 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.066 ms
 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.065 ms
 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.065 ms

 --- 127.25.3.14 ping statistics ---
 5 packets transmitted, 5 received, 0% packet loss, time 4092ms
 rtt min/avg/max/mdev = 0.032/0.058/0.066/0.013 ms
 ok
ok 1 selftests: net: route_localnet.sh
...

Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231023123422.2895-1-swarupkotikalapudi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24 13:53:39 -07:00
Linus Torvalds
4f82870119 Merge tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "20 hotfixes. 12 are cc:stable and the remainder address post-6.5
  issues or aren't considered necessary for earlier kernel versions"

* tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
  selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier
  mailmap: correct email aliasing for Oleksij Rempel
  mailmap: map Bartosz's old address to the current one
  mm/damon/sysfs: check DAMOS regions update progress from before_terminate()
  MAINTAINERS: Ondrej has moved
  kasan: disable kasan_non_canonical_hook() for HW tags
  kasan: print the original fault addr when access invalid shadow
  hugetlbfs: close race between MADV_DONTNEED and page fault
  hugetlbfs: extend hugetlb_vma_lock to private VMAs
  hugetlbfs: clear resv_map pointer if mmap fails
  mm: zswap: fix pool refcount bug around shrink_worker()
  mm/migrate: fix do_pages_move for compat pointers
  riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set
  riscv: handle VM_FAULT_[HWPOISON|HWPOISON_LARGE] faults instead of panicking
  mmap: fix error paths with dup_anon_vma()
  mmap: fix vma_iterator in error path of vma_merge()
  mm: fix vm_brk_flags() to not bail out while holding lock
  mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer
  mm/page_alloc: correct start page when guard page debug is enabled
2023-10-24 09:52:16 -10:00
Joao Martins
0795b305da iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag
Change test_mock_dirty_bitmaps() to pass a flag where it specifies the flag
under test. The test does the same thing as the GET_DIRTY_BITMAP regular
test. Except that it tests whether the dirtied bits are fetched all the
same a second time, as opposed to observing them cleared.

Link: https://lore.kernel.org/r/20231024135109.73787-19-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
ae36fe70ce iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
Enumerate the capabilities from the mock device and test whether it
advertises as expected. Include it as part of the iommufd_dirty_tracking
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-18-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
a9af47e382 iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP
Add a new test ioctl for simulating the dirty IOVAs in the mock domain, and
implement the mock iommu domain ops that get the dirty tracking supported.

The selftest exercises the usual main workflow of:

1) Setting dirty tracking from the iommu domain
2) Read and clear dirty IOPTEs

Different fixtures will test different IOVA range sizes, that exercise
corner cases of the bitmaps.

Link: https://lore.kernel.org/r/20231024135109.73787-17-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
7adf267d66 iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY_TRACKING
Change mock_domain to supporting dirty tracking and add tests to exercise
the new SET_DIRTY_TRACKING API in the iommufd_dirty_tracking selftest
fixture.

Link: https://lore.kernel.org/r/20231024135109.73787-16-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
266ce58989 iommufd/selftest: Test IOMMU_HWPT_ALLOC_DIRTY_TRACKING
In order to selftest the iommu domain dirty enforcing implement the
mock_domain necessary support and add a new dev_flags to test that the
hwpt_alloc/attach_device fails as expected.

Expand the existing mock_domain fixture with a enforce_dirty test that
exercises the hwpt_alloc and device attachment.

Link: https://lore.kernel.org/r/20231024135109.73787-15-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Joao Martins
e04b23c8d4 iommufd/selftest: Expand mock_domain with dev_flags
Expand mock_domain test to be able to manipulate the device capabilities.
This allows testing with mockdev without dirty tracking support advertised
and thus make sure enforce_dirty test does the expected.

To avoid breaking IOMMUFD_TEST UABI replicate the mock_domain struct and
thus add an input dev_flags at the end.

Link: https://lore.kernel.org/r/20231024135109.73787-14-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24 11:58:44 -03:00
Eduard Zingerman
64870feebe selftests/bpf: test if state loops are detected in a tricky case
A convoluted test case for iterators convergence logic that
demonstrates that states with branch count equal to 0 might still be
a part of not completely explored loop.

E.g. consider the following state diagram:

               initial     Here state 'succ' was processed first,
                 |         it was eventually tracked to produce a
                 V         state identical to 'hdr'.
    .---------> hdr        All branches from 'succ' had been explored
    |            |         and thus 'succ' has its .branches == 0.
    |            V
    |    .------...        Suppose states 'cur' and 'succ' correspond
    |    |       |         to the same instruction + callsites.
    |    V       V         In such case it is necessary to check
    |   ...     ...        whether 'succ' and 'cur' are identical.
    |    |       |         If 'succ' and 'cur' are a part of the same loop
    |    V       V         they have to be compared exactly.
    |   succ <- cur
    |    |
    |    V
    |   ...
    |    |
    '----'

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:32 -07:00
Eduard Zingerman
389ede06c2 selftests/bpf: tests with delayed read/precision makrs in loop body
These test cases try to hide read and precision marks from loop
convergence logic: marks would only be assigned on subsequent loop
iterations or after exploring states pushed to env->head stack first.
Without verifier fix to use exact states comparison logic for
iterators convergence these tests (except 'triple_continue') would be
errorneously marked as safe.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:31 -07:00
Eduard Zingerman
2793a8b015 bpf: exact states comparison for iterator convergence checks
Convergence for open coded iterators is computed in is_state_visited()
by examining states with branches count > 1 and using states_equal().
states_equal() computes sub-state relation using read and precision marks.
Read and precision marks are propagated from children states,
thus are not guaranteed to be complete inside a loop when branches
count > 1. This could be demonstrated using the following unsafe program:

     1. r7 = -16
     2. r6 = bpf_get_prandom_u32()
     3. while (bpf_iter_num_next(&fp[-8])) {
     4.   if (r6 != 42) {
     5.     r7 = -32
     6.     r6 = bpf_get_prandom_u32()
     7.     continue
     8.   }
     9.   r0 = r10
    10.   r0 += r7
    11.   r8 = *(u64 *)(r0 + 0)
    12.   r6 = bpf_get_prandom_u32()
    13. }

Here verifier would first visit path 1-3, create a checkpoint at 3
with r7=-16, continue to 4-7,3 with r7=-32.

Because instructions at 9-12 had not been visitied yet existing
checkpoint at 3 does not have read or precision mark for r7.
Thus states_equal() would return true and verifier would discard
current state, thus unsafe memory access at 11 would not be caught.

This commit fixes this loophole by introducing exact state comparisons
for iterator convergence logic:
- registers are compared using regs_exact() regardless of read or
  precision marks;
- stack slots have to have identical type.

Unfortunately, this is too strict even for simple programs like below:

    i = 0;
    while(iter_next(&it))
      i++;

At each iteration step i++ would produce a new distinct state and
eventually instruction processing limit would be reached.

To avoid such behavior speculatively forget (widen) range for
imprecise scalar registers, if those registers were not precise at the
end of the previous iteration and do not match exactly.

This a conservative heuristic that allows to verify wide range of
programs, however it precludes verification of programs that conjure
an imprecise value on the first loop iteration and use it as precise
on the second.

Test case iter_task_vma_for_each() presents one of such cases:

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                ...
                seen++;
        }

Here clang generates the following code:

<LBB0_4>:
      24:       r8 = r6                          ; stash current value of
                ... body ...                       'seen'
      29:       r1 = r10
      30:       r1 += -0x8
      31:       call bpf_iter_task_vma_next
      32:       r6 += 0x1                        ; seen++;
      33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
      34:       r7 += 0x10
      35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000

<LBB0_6>:
      ... exit ...

Note that counter in r6 is copied to r8 and then incremented,
conditional jump is done using r8. Because of this precision mark for
r6 lags one state behind of precision mark on r8 and widening logic
kicks in.

Adding barrier_var(seen) after conditional is sufficient to force
clang use the same register for both counting and conditional jump.

This issue was discussed in the thread [1] which was started by
Andrew Werner <awerner32@gmail.com> demonstrating a similar bug
in callback functions handling. The callbacks would be addressed
in a followup patch.

[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/

Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Co-developed-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:31 -07:00
Eric Dumazet
2a7c8d291f tcp: introduce tcp_clock_ms()
It delivers current TCP time stamp in ms unit, and is used
in place of confusing tcp_time_stamp_raw()

It is the same family than tcp_clock_ns() and tcp_clock_ms().

tcp_time_stamp_raw() will be replaced later for TSval
contexts with a more descriptive name.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-23 09:35:01 +01:00
Linus Torvalds
023cc83605 Merge tag 'probes-fixes-v6.6-rc6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - kprobe-events: Fix kprobe events to reject if the attached symbol is
   not unique name because it may not the function which the user want
   to attach to. (User can attach a probe to such symbol using the
   nearest unique symbol + offset.)

 - selftest: Add a testcase to ensure the kprobe event rejects non
   unique symbol correctly.

* tag 'probes-fixes-v6.6-rc6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  selftests/ftrace: Add new test case which checks non unique symbol
  tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols
2023-10-21 11:00:36 -07:00
Pedro Tammela
ee3d122854 selftests: tc-testing: add test for 'rt' upgrade on hfsc
Add a test to check if inner rt curves are upgraded to sc curves.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21 11:46:41 +01:00
Linus Torvalds
444ccf1b11 Merge tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
 "One single fix to assert check in user_events abi_test to properly
  check bit value on Big Endian architectures. The code treated the bit
  values as Little Endian and the check failed on Big Endian"

* tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/user_events: Fix abi_test for BE archs
2023-10-20 14:45:41 -07:00
Hou Tao
d440ba91ca selftests/bpf: Add more test cases for bpf memory allocator
Add the following 3 test cases for bpf memory allocator:
1) Do allocation in bpf program and free through map free
2) Do batch per-cpu allocation and per-cpu free in bpf program
3) Do per-cpu allocation in bpf program and free through map free

For per-cpu allocation, because per-cpu allocation can not refill timely
sometimes, so test 2) and test 3) consider it is OK for
bpf_percpu_obj_new_impl() to return NULL.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-20 14:15:13 -07:00
Kumar Kartikeya Dwivedi
da1055b673 selftests/bpf: Make linked_list failure test more robust
The linked list failure test 'pop_front_off' and 'pop_back_off'
currently rely on matching exact instruction and register values.  The
purpose of the test is to ensure the offset is correctly incremented for
the returned pointers from list pop helpers, which can then be used with
container_of to obtain the real object. Hence, somehow obtaining the
information that the offset is 48 will work for us. Make the test more
robust by relying on verifier error string of bpf_spin_lock and remove
dependence on fragile instruction index or register number, which can be
affected by different clang versions used to build the selftests.

Fixes: 300f19dcdb ("selftests/bpf: Add BPF linked list API tests")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231020144839.2734006-1-memxor@gmail.com
2023-10-20 09:29:39 -07:00
Francis Laniel
03b80ff802 selftests/ftrace: Add new test case which checks non unique symbol
If name_show() is non unique, this test will try to install a kprobe on this
function which should fail returning EADDRNOTAVAIL.
On kernel where name_show() is not unique, this test is skipped.

Link: https://lore.kernel.org/all/20231020104250.9537-3-flaniel@linux.microsoft.com/

Cc: stable@vger.kernel.org
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-10-20 22:11:49 +09:00