Commit Graph

18353 Commits

Author SHA1 Message Date
Linus Torvalds
1a9239bb42 Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Continue Netlink conversions to per-namespace RTNL lock
     (IPv4 routing, routing rules, routing next hops, ARP ioctls)

   - Continue extending the use of netdev instance locks. As a driver
     opt-in protect queue operations and (in due course) ethtool
     operations with the instance lock and not RTNL lock.

   - Support collecting TCP timestamps (data submitted, sent, acked) in
     BPF, allowing for transparent (to the application) and lower
     overhead tracking of TCP RPC performance.

   - Tweak existing networking Rx zero-copy infra to support zero-copy
     Rx via io_uring.

   - Optimize MPTCP performance in single subflow mode by 29%.

   - Enable GRO on packets which went thru XDP CPU redirect (were queued
     for processing on a different CPU). Improving TCP stream
     performance up to 2x.

   - Improve performance of contended connect() by 200% by searching for
     an available 4-tuple under RCU rather than a spin lock. Bring an
     additional 229% improvement by tweaking hash distribution.

   - Avoid unconditionally touching sk_tsflags on RX, improving
     performance under UDP flood by as much as 10%.

   - Avoid skb_clone() dance in ping_rcv() to improve performance under
     ping flood.

   - Avoid FIB lookup in netfilter if socket is available, 20% perf win.

   - Rework network device creation (in-kernel) API to more clearly
     identify network namespaces and their roles. There are up to 4
     namespace roles but we used to have just 2 netns pointer arguments,
     interpreted differently based on context.

   - Use sysfs_break_active_protection() instead of trylock to avoid
     deadlocks between unregistering objects and sysfs access.

   - Add a new sysctl and sockopt for capping max retransmit timeout in
     TCP.

   - Support masking port and DSCP in routing rule matches.

   - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.

   - Support specifying at what time packet should be sent on AF_XDP
     sockets.

   - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
     users.

   - Add Netlink YAML spec for WiFi (nl80211) and conntrack.

   - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
     which only need to be exported when IPv6 support is built as a
     module.

   - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
     normal bridging.

   - Allow users to specify source port range for GENEVE tunnels.

   - netconsole: allow attaching kernel release, CPU ID and task name to
     messages as metadata

  Driver API:

   - Continue rework / fixing of Energy Efficient Ethernet (EEE) across
     the SW layers. Delegate the responsibilities to phylink where
     possible. Improve its handling in phylib.

   - Support symmetric OR-XOR RSS hashing algorithm.

   - Support tracking and preserving IRQ affinity by NAPI itself.

   - Support loopback mode speed selection for interface selftests.

  Device drivers:

   - Remove the IBM LCS driver for s390

   - Remove the sb1000 cable modem driver

   - Add support for SFP module access over SMBus

   - Add MCTP transport driver for MCTP-over-USB

   - Enable XDP metadata support in multiple drivers

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - add PCIe TLP Processing Hints (TPH) support for new AMD
           platforms
         - support dumping RoCE queue state for debug
         - opt into instance locking
      - Intel (100G, ice, idpf):
         - ice: rework MSI-X IRQ management and distribution
         - ice: support for E830 devices
         - iavf: add support for Rx timestamping
         - iavf: opt into instance locking
      - nVidia/Mellanox:
         - mlx4: use page pool memory allocator for Rx
         - mlx5: support for one PTP device per hardware clock
         - mlx5: support for 200Gbps per-lane link modes
         - mlx5: move IPSec policy check after decryption
      - AMD/Solarflare:
         - support FW flashing via devlink
      - Cisco (enic):
         - use page pool memory allocator for Rx
         - enable 32, 64 byte CQEs
         - get max rx/tx ring size from the device
      - Meta (fbnic):
         - support flow steering and RSS configuration
         - report queue stats
         - support TCP segmentation
         - support IRQ coalescing
         - support ring size configuration
      - Marvell/Cavium:
         - support AF_XDP
      - Wangxun:
         - support for PTP clock and timestamping
      - Huawei (hibmcge):
         - checksum offload
         - add more statistics

   - Ethernet virtual:
      - VirtIO net:
         - aggressively suppress Tx completions, improve perf by 96%
           with 1 CPU and 55% with 2 CPUs
         - expose NAPI to IRQ mapping and persist NAPI settings
      - Google (gve):
         - support XDP in DQO RDA Queue Format
         - opt into instance locking
      - Microsoft vNIC:
         - support BIG TCP

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - cleanup Tx and Tx clock setting and other link-focused
           cleanups
         - enable SGMII and 2500BASEX mode switching for Intel platforms
         - support Sophgo SG2044
      - Broadcom switches (b53):
         - support for BCM53101
      - TI:
         - iep: add perout configuration support
         - icssg: support XDP
      - Cadence (macb):
         - implement BQL
      - Xilinx (axinet):
         - support dynamic IRQ moderation and changing coalescing at
           runtime
         - implement BQL
         - report standard stats
      - MediaTek:
         - support phylink managed EEE
      - Intel:
         - igc: don't restart the interface on every XDP program change
      - RealTek (r8169):
         - support reading registers of internal PHYs directly
         - increase max jumbo packet size on RTL8125/RTL8126
      - Airoha:
         - support for RISC-V NPU packet processing unit
         - enable scatter-gather and support MTU up to 9kB
      - Tehuti (tn40xx):
         - support cards with TN4010 MAC and an Aquantia AQR105 PHY

   - Ethernet PHYs:
      - support for TJA1102S, TJA1121
      - dp83tg720: add randomized polling intervals for link detection
      - dp83822: support changing the transmit amplitude voltage
      - support for LEDs on 88q2xxx

   - CAN:
      - canxl: support Remote Request Substitution bit access
      - flexcan: add S32G2/S32G3 SoC

   - WiFi:
      - remove cooked monitor support
      - strict mode for better AP testing
      - basic EPCS support
      - OMI RX bandwidth reduction support
      - batman-adv: add support for jumbo frames

   - WiFi drivers:
      - RealTek (rtw88):
         - support RTL8814AE and RTL8814AU
      - RealTek (rtw89):
         - switch using wiphy_lock and wiphy_work
         - add BB context to manipulate two PHY as preparation of MLO
         - improve BT-coexistence mechanism to play A2DP smoothly
      - Intel (iwlwifi):
         - add new iwlmld sub-driver for latest HW/FW combinations
      - MediaTek (mt76):
         - preparation for mt7996 Multi-Link Operation (MLO) support
      - Qualcomm/Atheros (ath12k):
         - continued work on MLO
      - Silabs (wfx):
         - Wake-on-WLAN support

   - Bluetooth:
      - add support for skb TX SND/COMPLETION timestamping
      - hci_core: enable buffer flow control for SCO/eSCO
      - coredump: log devcd dumps into the monitor

   - Bluetooth drivers:
      - intel: add support to configure TX power
      - nxp: handle bootloader error during cmd5 and cmd7"

* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
  unix: fix up for "apparmor: add fine grained af_unix mediation"
  mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
  net: usb: asix: ax88772: Increase phy_name size
  net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
  net: libwx: fix Tx L4 checksum
  net: libwx: fix Tx descriptor content for some tunnel packets
  atm: Fix NULL pointer dereference
  net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
  net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
  net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
  net: phy: aquantia: add essential functions to aqr105 driver
  net: phy: aquantia: search for firmware-name in fwnode
  net: phy: aquantia: add probe function to aqr105 for firmware loading
  net: phy: Add swnode support to mdiobus_scan
  gve: add XDP DROP and PASS support for DQ
  gve: update XDP allocation path support RX buffer posting
  gve: merge packet buffer size fields
  gve: update GQ RX to use buf_size
  gve: introduce config-based allocation for XDP
  gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
  ...
2025-03-26 21:48:21 -07:00
Linus Torvalds
592329e5e9 Merge tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:

 - Move vm_table members out of kernel/sysctl.c

   All vm_table array members have moved to their respective subsystems
   leading to the removal of vm_table from kernel/sysctl.c. This
   increases modularity by placing the ctl_tables closer to where they
   are actually used and at the same time reducing the chances of merge
   conflicts in kernel/sysctl.c.

 - ctl_table range fixes

   Replace the proc_handler function that checks variable ranges in
   coredump_sysctls and vdso_table with the one that actually uses the
   extra{1,2} pointers as min/max values. This tightens the range of the
   values that users can pass into the kernel effectively preventing
   {under,over}flows.

 - Misc fixes

   Correct grammar errors and typos in test messages. Update sysctl
   files in MAINTAINERS. Constified and removed array size in
   declaration for alignment_tbl

* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
  selftests/sysctl: fix wording of help messages
  selftests: fix spelling/grammar errors in sysctl/sysctl.sh
  MAINTAINERS: Update sysctl file list in MAINTAINERS
  sysctl: Fix underflow value setting risk in vm_table
  coredump: Fixes core_pipe_limit sysctl proc_handler
  sysctl: remove unneeded include
  sysctl: remove the vm_table
  sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
  x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
  fs: dcache: move the sysctl to fs/dcache.c
  sunrpc: simplify rpcauth_cache_shrink_count()
  fs: drop_caches: move sysctl to fs/drop_caches.c
  fs: fs-writeback: move sysctl to fs/fs-writeback.c
  mm: nommu: move sysctl to mm/nommu.c
  security: min_addr: move sysctl to security/min_addr.c
  mm: mmap: move sysctl to mm/mmap.c
  mm: util: move sysctls to mm/util.c
  mm: vmscan: move vmscan sysctls to mm/vmscan.c
  mm: swap: move sysctl to mm/swap.c
  mm: filemap: move sysctl to mm/filemap.c
  ...
2025-03-26 21:02:05 -07:00
Linus Torvalds
2e3fcbcc3b Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (scsi_debug, ufs, lpfc, st, fnic, mpi3mr,
  mpt3sas) and the removal of cxlflash.

  The only non-trivial core change is an addition to unit attention
  handling to recognize UAs for power on/reset and new media so the tape
  driver can use it"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (107 commits)
  scsi: st: Tighten the page format heuristics with MODE SELECT
  scsi: st: ERASE does not change tape location
  scsi: st: Fix array overflow in st_setup()
  scsi: target: tcm_loop: Fix wrong abort tag
  scsi: lpfc: Restore clearing of NLP_UNREG_INP in ndlp->nlp_flag
  scsi: hisi_sas: Fixed failure to issue vendor specific commands
  scsi: fnic: Remove unnecessary NUL-terminations
  scsi: fnic: Remove redundant flush_workqueue() calls
  scsi: core: Use a switch statement when attaching VPD pages
  scsi: ufs: renesas: Add initialization code for R-Car S4-8 ES1.2
  scsi: ufs: renesas: Add reusable functions
  scsi: ufs: renesas: Refactor 0x10ad/0x10af PHY settings
  scsi: ufs: renesas: Remove register control helper function
  scsi: ufs: renesas: Add register read to remove save/set/restore
  scsi: ufs: renesas: Replace init data by init code
  scsi: ufs: dt-bindings: renesas,ufs: Add calibration data
  scsi: mpi3mr: Task Abort EH Support
  scsi: storvsc: Don't report the host packet status as the hv status
  scsi: isci: Make most module parameters static
  scsi: megaraid_sas: Make most module parameters static
  ...
2025-03-26 19:57:34 -07:00
Linus Torvalds
91928e0d3c Merge tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:
 "This is the first of the io_uring pull requests for the 6.15 merge
  window, there will be others once the net tree has gone in. This
  contains:

   - Cleanup and unification of cancelation handling across various
     request types.

   - Improvement for bundles, supporting them both for incrementally
     consumed buffers, and for non-multishot requests.

   - Enable toggling of using iowait while waiting on io_uring events or
     not. Unfortunately this is still tied with CPU frequency boosting
     on short waits, as the scheduler side has not been very receptive
     to splitting the (useless) iowait stat from the cpufreq implied
     boost.

   - Add support for kbuf nodes, enabling zero-copy support for the ublk
     block driver.

   - Various cleanups for resource node handling.

   - Series greatly cleaning up the legacy provided (non-ring based)
     buffers. For years, we've been pushing the ring provided buffers as
     the way to go, and that is what people have been using. Reduce the
     complexity and code associated with legacy provided buffers.

   - Series cleaning up the compat handling.

   - Series improving and cleaning up the recvmsg/sendmsg iovec and msg
     handling.

   - Series of cleanups for io-wq.

   - Start adding a bunch of selftests. The liburing repository
     generally carries feature and regression tests for everything, but
     at least for ublk initially, we'll try and go the route of having
     it in selftests as well. We'll see how this goes, might decide to
     migrate more tests this way in the future.

   - Various little cleanups and fixes"

* tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux: (108 commits)
  selftests: ublk: add stripe target
  selftests: ublk: simplify loop io completion
  selftests: ublk: enable zero copy for null target
  selftests: ublk: prepare for supporting stripe target
  selftests: ublk: move common code into common.c
  selftests: ublk: increase max buffer size to 1MB
  selftests: ublk: add single sqe allocator helper
  selftests: ublk: add generic_01 for verifying sequential IO order
  selftests: ublk: fix starting ublk device
  io_uring: enable toggle of iowait usage when waiting on CQEs
  selftests: ublk: fix write cache implementation
  selftests: ublk: add variable for user to not show test result
  selftests: ublk: don't show `modprobe` failure
  selftests: ublk: add one dependency header
  io_uring/kbuf: enable bundles for incrementally consumed buffers
  Revert "io_uring/rsrc: simplify the bvec iter count calculation"
  selftests: ublk: improve test usability
  selftests: ublk: add stress test for covering IO vs. killing ublk server
  selftests: ublk: add one stress test for covering IO vs. removing device
  selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
  ...
2025-03-26 17:56:00 -07:00
Linus Torvalds
ee6740fd34 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
 "Another set of improvements to the kernel's CRC (cyclic redundancy
  check) code:

   - Rework the CRC64 library functions to be directly optimized, like
     what I did last cycle for the CRC32 and CRC-T10DIF library
     functions

   - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
     support and acceleration for crc64_be and crc64_nvme

   - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
     crc_t10dif, crc64_be, and crc64_nvme

   - Remove crc_t10dif and crc64_rocksoft from the crypto API, since
     they are no longer needed there

   - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect

   - Add kunit test cases for crc64_nvme and crc7

   - Eliminate redundant functions for calculating the Castagnoli CRC32,
     settling on just crc32c()

   - Remove unnecessary prompts from some of the CRC kconfig options

   - Further optimize the x86 crc32c code"

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits)
  x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512
  lib/crc: remove unnecessary prompt for CONFIG_CRC64
  lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C
  lib/crc: remove unnecessary prompt for CONFIG_CRC8
  lib/crc: remove unnecessary prompt for CONFIG_CRC7
  lib/crc: remove unnecessary prompt for CONFIG_CRC4
  lib/crc7: unexport crc7_be_syndrome_table
  lib/crc_kunit.c: update comment in crc_benchmark()
  lib/crc_kunit.c: add test and benchmark for crc7_be()
  x86/crc32: optimize tail handling for crc32c short inputs
  riscv/crc64: add Zbc optimized CRC64 functions
  riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
  riscv/crc32: reimplement the CRC32 functions using new template
  riscv/crc: add "template" for Zbc optimized CRC functions
  x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings
  x86/crc32: improve crc32c_arch() code generation with clang
  x86/crc64: implement crc64_be and crc64_nvme using new template
  x86/crc-t10dif: implement crc_t10dif using new template
  x86/crc32: implement crc32_le using new template
  x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
  ...
2025-03-25 18:33:04 -07:00
Linus Torvalds
edb0e8f6e2 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Nested virtualization support for VGICv3, giving the nested
     hypervisor control of the VGIC hardware when running an L2 VM

   - Removal of 'late' nested virtualization feature register masking,
     making the supported feature set directly visible to userspace

   - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
     of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

   - Paravirtual interface for discovering the set of CPU
     implementations where a VM may run, addressing a longstanding issue
     of guest CPU errata awareness in big-little systems and
     cross-implementation VM migration

   - Userspace control of the registers responsible for identifying a
     particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
     allowing VMs to be migrated cross-implementation

   - pKVM updates, including support for tracking stage-2 page table
     allocations in the protected hypervisor in the 'SecPageTable' stat

   - Fixes to vPMU, ensuring that userspace updates to the vPMU after
     KVM_RUN are reflected into the backing perf events

  LoongArch:

   - Remove unnecessary header include path

   - Assume constant PGD during VM context switch

   - Add perf events support for guest VM

  RISC-V:

   - Disable the kernel perf counter during configure

   - KVM selftests improvements for PMU

   - Fix warning at the time of KVM module removal

  x86:

   - Add support for aging of SPTEs without holding mmu_lock.

     Not taking mmu_lock allows multiple aging actions to run in
     parallel, and more importantly avoids stalling vCPUs. This includes
     an implementation of per-rmap-entry locking; aging the gfn is done
     with only a per-rmap single-bin spinlock taken, whereas locking an
     rmap for write requires taking both the per-rmap spinlock and the
     mmu_lock.

     Note that this decreases slightly the accuracy of accessed-page
     information, because changes to the SPTE outside aging might not
     use atomic operations even if they could race against a clear of
     the Accessed bit.

     This is deliberate because KVM and mm/ tolerate false
     positives/negatives for accessed information, and testing has shown
     that reducing the latency of aging is far more beneficial to
     overall system performance than providing "perfect" young/old
     information.

   - Defer runtime CPUID updates until KVM emulates a CPUID instruction,
     to coalesce updates when multiple pieces of vCPU state are
     changing, e.g. as part of a nested transition

   - Fix a variety of nested emulation bugs, and add VMX support for
     synthesizing nested VM-Exit on interception (instead of injecting
     #UD into L2)

   - Drop "support" for async page faults for protected guests that do
     not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

   - Bring a bit of sanity to x86's VM teardown code, which has
     accumulated a lot of cruft over the years. Particularly, destroy
     vCPUs before the MMU, despite the latter being a VM-wide operation

   - Add common secure TSC infrastructure for use within SNP and in the
     future TDX

   - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
     make sense to use the capability if the relevant registers are not
     available for reading or writing

   - Don't take kvm->lock when iterating over vCPUs in the suspend
     notifier to fix a largely theoretical deadlock

   - Use the vCPU's actual Xen PV clock information when starting the
     Xen timer, as the cached state in arch.hv_clock can be stale/bogus

   - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
     different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
     KVM's suspend notifier only accounts for kvmclock, and there's no
     evidence that the flag is actually supported by Xen guests

   - Clean up the per-vCPU "cache" of its reference pvclock, and instead
     only track the vCPU's TSC scaling (multipler+shift) metadata (which
     is moderately expensive to compute, and rarely changes for modern
     setups)

   - Don't write to the Xen hypercall page on MSR writes that are
     initiated by the host (userspace or KVM) to fix a class of bugs
     where KVM can write to guest memory at unexpected times, e.g.
     during vCPU creation if userspace has set the Xen hypercall MSR
     index to collide with an MSR that KVM emulates

   - Restrict the Xen hypercall MSR index to the unofficial synthetic
     range to reduce the set of possible collisions with MSRs that are
     emulated by KVM (collisions can still happen as KVM emulates
     Hyper-V MSRs, which also reside in the synthetic range)

   - Clean up and optimize KVM's handling of Xen MSR writes and
     xen_hvm_config

   - Update Xen TSC leaves during CPUID emulation instead of modifying
     the CPUID entries when updating PV clocks; there is no guarantee PV
     clocks will be updated between TSC frequency changes and CPUID
     emulation, and guest reads of the TSC leaves should be rare, i.e.
     are not a hot path

  x86 (Intel):

   - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
     thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

   - Pass XFD_ERR as the payload when injecting #NM, as a preparatory
     step for upcoming FRED virtualization support

   - Decouple the EPT entry RWX protection bit macros from the EPT
     Violation bits, both as a general cleanup and in anticipation of
     adding support for emulating Mode-Based Execution Control (MBEC)

   - Reject KVM_RUN if userspace manages to gain control and stuff
     invalid guest state while KVM is in the middle of emulating nested
     VM-Enter

   - Add a macro to handle KVM's sanity checks on entry/exit VMCS
     control pairs in anticipation of adding sanity checks for secondary
     exit controls (the primary field is out of bits)

  x86 (AMD):

   - Ensure the PSP driver is initialized when both the PSP and KVM
     modules are built-in (the initcall framework doesn't handle
     dependencies)

   - Use long-term pins when registering encrypted memory regions, so
     that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
     don't lead to excessive fragmentation

   - Add macros and helpers for setting GHCB return/error codes

   - Add support for Idle HLT interception, which elides interception if
     the vCPU has a pending, unmasked virtual IRQ when HLT is executed

   - Fix a bug in INVPCID emulation where KVM fails to check for a
     non-canonical address

   - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
     invalid, e.g. because the vCPU was "destroyed" via SNP's AP
     Creation hypercall

   - Reject SNP AP Creation if the requested SEV features for the vCPU
     don't match the VM's configured set of features

  Selftests:

   - Fix again the Intel PMU counters test; add a data load and do
     CLFLUSH{OPT} on the data instead of executing code. The theory is
     that modern Intel CPUs have learned new code prefetching tricks
     that bypass the PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that an
     event is counting correctly without actually knowing what the event
     counts on the underlying hardware

   - Fix a variety of flaws, bugs, and false failures/passes
     dirty_log_test, and improve its coverage by collecting all dirty
     entries on each iteration

   - Fix a few minor bugs related to handling of stats FDs

   - Add infrastructure to make vCPU and VM stats FDs available to tests
     by default (open the FDs during VM/vCPU creation)

   - Relax an assertion on the number of HLT exits in the xAPIC IPI test
     when running on a CPU that supports AMD's Idle HLT (which elides
     interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
  RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
  RISC-V: KVM: Teardown riscv specific bits after kvm_exit
  LoongArch: KVM: Register perf callbacks for guest
  LoongArch: KVM: Implement arch-specific functions for guest perf
  LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
  LoongArch: KVM: Remove PGD saving during VM context switch
  LoongArch: KVM: Remove unnecessary header include path
  KVM: arm64: Tear down vGIC on failed vCPU creation
  KVM: arm64: PMU: Reload when resetting
  KVM: arm64: PMU: Reload when user modifies registers
  KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
  KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
  KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
  KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
  KVM: arm64: Initialize HCRX_EL2 traps in pKVM
  KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
  KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
  KVM: x86: Add infrastructure for secure TSC
  KVM: x86: Push down setting vcpu.arch.user_set_tsc
  ...
2025-03-25 14:22:07 -07:00
Linus Torvalds
2d09a9449e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "Nothing major this time around.

  Apart from the usual perf/PMU updates, some page table cleanups, the
  notable features are average CPU frequency based on the AMUv1
  counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
  the uaccess routines.

  Perf and PMUs:

   - Support for the 'Rainier' CPU PMU from Arm

   - Preparatory driver changes and cleanups that pave the way for BRBE
     support

   - Support for partial virtualisation of the Apple-M1 PMU

   - Support for the second event filter in Arm CSPMU designs

   - Minor fixes and cleanups (CMN and DWC PMUs)

   - Enable EL2 requirements for FEAT_PMUv3p9

  Power, CPU topology:

   - Support for AMUv1-based average CPU frequency

   - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
     adds a generic topology_is_primary_thread() function overridden by
     x86 and powerpc

  New(ish) features:

   - MOPS (memcpy/memset) support for the uaccess routines

  Security/confidential compute:

   - Fix the DMA address for devices used in Realms with Arm CCA. The
     CCA architecture uses the address bit to differentiate between
     shared and private addresses

   - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
     default

  Memory management clean-ups:

   - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs

   - Some minor page table accessor clean-ups

   - PIE/POE (permission indirection/overlay) helpers clean-up

  Kselftests:

   - MTE: skip hugetlb tests if MTE is not supported on such mappings
     and user correct naming for sync/async tag checking modes

  Miscellaneous:

   - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
     request)

   - Sysreg updates for new register fields

   - CPU type info for some Qualcomm Kryo cores"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  arm64: mm: Don't use %pK through printk
  perf/arm_cspmu: Fix missing io.h include
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64: Kconfig: Enable HOTPLUG_SMT
  arm64: topology: Support SMT control on ACPI based system
  arch_topology: Support SMT control for OF based system
  cpu/SMT: Provide a default topology_is_primary_thread()
  arm64/mm: Define PTDESC_ORDER
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  ...
2025-03-25 13:16:16 -07:00
Catalin Marinas
8cc14fdcc1 Merge branches 'for-next/amuv1-avg-freq', 'for-next/pkey_unrestricted', 'for-next/sysreg', 'for-next/misc', 'for-next/pgtable-cleanups', 'for-next/kselftest', 'for-next/uaccess-mops', 'for-next/pie-poe-cleanup', 'for-next/cputype-kryo', 'for-next/cca-dma-address', 'for-next/drop-pxd_table_bit' and 'for-next/spectre-bhb-assume-vulnerable', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf/arm_cspmu: Fix missing io.h include
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  drivers/perf: apple_m1: Support host/guest event filtering
  drivers/perf: apple_m1: Refactor event select/filter configuration
  perf/dwc_pcie: fix duplicate pci_dev devices
  perf/dwc_pcie: fix some unreleased resources
  perf/arm-cmn: Minor event type housekeeping
  perf: arm_pmu: Move PMUv3-specific data
  perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
  perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
  perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
  perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
  perf: arm_pmu: Don't disable counter in armpmu_add()
  perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
  perf: arm_pmuv3: Add support for ARM Rainier PMU

* for-next/amuv1-avg-freq:
  : Add support for AArch64 AMUv1-based average freq
  arm64: Utilize for_each_cpu_wrap for reference lookup
  arm64: Update AMU-based freq scale factor on entering idle
  arm64: Provide an AMU-based version of arch_freq_get_on_cpu
  cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
  cpufreq: Allow arch_freq_get_on_cpu to return an error
  arch_topology: init capacity_freq_ref to 0

* for-next/pkey_unrestricted:
  : mm/pkey: Add PKEY_UNRESTRICTED macro
  selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc9
  selftests/powerpc: Use PKEY_UNRESTRICTED macro
  selftests/mm: Use PKEY_UNRESTRICTED macro
  mm/pkey: Add PKEY_UNRESTRICTED macro

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64/sysreg: Add register fields for HFGWTR2_EL2
  arm64/sysreg: Add register fields for HFGRTR2_EL2
  arm64/sysreg: Add register fields for HFGITR2_EL2
  arm64/sysreg: Add register fields for HDFGWTR2_EL2
  arm64/sysreg: Add register fields for HDFGRTR2_EL2
  arm64/sysreg: Update register fields for ID_AA64MMFR0_EL1

* for-next/misc:
  : Miscellaneous arm64 patches
  arm64: mm: Don't use %pK through printk
  arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare()

* for-next/pgtable-cleanups:
  : arm64 pgtable accessors cleanup
  arm64/mm: Define PTDESC_ORDER
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/hugetlb: Consistently use pud_sect_supported()
  arm64/mm: Convert __pte_to_phys() and __phys_to_pte_val() as functions

* for-next/kselftest:
  : arm64 kselftest updates
  kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings
  kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c

* for-next/uaccess-mops:
  : Implement the uaccess memory copy/set using MOPS instructions
  arm64: lib: Use MOPS for usercopy routines
  arm64: mm: Handle PAN faults on uaccess CPY* instructions
  arm64: extable: Add fixup handling for uaccess CPY* instructions

* for-next/pie-poe-cleanup:
  : PIE/POE helpers cleanup
  arm64/sysreg: Move POR_EL0_INIT to asm/por.h
  arm64/sysreg: Rename POE_RXW to POE_RWX
  arm64/sysreg: Improve PIR/POR helpers

* for-next/cputype-kryo:
  : Add cputype info for some Qualcomm Kryo cores
  arm64: cputype: Add comments about Qualcomm Kryo 5XX and 6XX cores
  arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD

* for-next/cca-dma-address:
  : Fix DMA address for devices used in realms with Arm CCA
  arm64: realm: Use aliased addresses for device DMA to shared buffers
  dma: Introduce generic dma_addr_*crypted helpers
  dma: Fix encryption bit clearing for dma_to_phys

* for-next/drop-pxd_table_bit:
  : Drop the arm64 PXD_TABLE_BIT (clean-up in preparation for 128-bit PTEs)
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  arm64/mm: Check PUD_TYPE_TABLE in pud_bad()
  arm64/mm: Check PXD_TYPE_TABLE in [p4d|pgd]_bad()
  arm64/mm: Clear PXX_TYPE_MASK and set PXD_TYPE_SECT in [pmd|pud]_mkhuge()
  arm64/mm: Clear PXX_TYPE_MASK in mk_[pmd|pud]_sect_prot()
  arm64/ptdump: Test PMD_TYPE_MASK for block mapping
  KVM: arm64: ptdump: Test PMD_TYPE_MASK for block mapping

* for-next/spectre-bhb-assume-vulnerable:
  : Rework Spectre BHB mitigations to not assume "safe"
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
2025-03-25 19:32:03 +00:00
Linus Torvalds
317a76a996 Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO infrastructure updates from Thomas Gleixner:

 - Consolidate the VDSO storage

   The VDSO data storage and data layout has been largely architecture
   specific for historical reasons. That increases the maintenance
   effort and causes inconsistencies over and over.

   There is no real technical reason for architecture specific layouts
   and implementations. The architecture specific details can easily be
   integrated into a generic layout, which also reduces the amount of
   duplicated code for managing the mappings.

   Convert all architectures over to a unified layout and common mapping
   infrastructure. This splits the VDSO data layout into subsystem
   specific blocks, timekeeping, random and architecture parts, which
   provides a better structure and allows to improve and update the
   functionalities without conflict and interaction.

 - Rework the timekeeping data storage

   The current implementation is designed for exposing system
   timekeeping accessors, which was good enough at the time when it was
   designed.

   PTP and Time Sensitive Networking (TSN) change that as there are
   requirements to expose independent PTP clocks, which are not related
   to system timekeeping.

   Replace the monolithic data storage by a structured layout, which
   allows to add support for independent PTP clocks on top while reusing
   both the data structures and the time accessor implementations.

* tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  sparc/vdso: Always reject undefined references during linking
  x86/vdso: Always reject undefined references during linking
  vdso: Rework struct vdso_time_data and introduce struct vdso_clock
  vdso: Move architecture related data before basetime data
  powerpc/vdso: Prepare introduction of struct vdso_clock
  arm64/vdso: Prepare introduction of struct vdso_clock
  x86/vdso: Prepare introduction of struct vdso_clock
  time/namespace: Prepare introduction of struct vdso_clock
  vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct
  vdso/vsyscall: Prepare introduction of struct vdso_clock
  vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock
  vdso/gettimeofday: Prepare introduction of struct vdso_clock
  vdso/helpers: Prepare introduction of struct vdso_clock
  vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks
  vdso: Make vdso_time_data cacheline aligned
  arm64: Make asm/cache.h compatible with vDSO
  ...
2025-03-25 11:30:42 -07:00
Eric Dumazet
a7c428ee8f tcp/dccp: remove icsk->icsk_timeout
icsk->icsk_timeout can be replaced by icsk->icsk_retransmit_timer.expires

This saves 8 bytes in TCP/DCCP sockets and helps for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250324203607.703850-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 10:34:33 -07:00
Linus Torvalds
d5048d1176 Merge tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:

 - Fix a memory ordering issue in posix-timers

   Posix-timer lookup is lockless and reevaluates the timer validity
   under the timer lock, but the update which validates the timer is not
   protected by the timer lock. That allows the store to be reordered
   against the initialization stores, so that the lookup side can
   observe a partially initialized timer. That's mostly a theoretical
   problem, but incorrect nevertheless.

 - Fix a long standing inconsistency of the coarse time getters

   The coarse time getters read the base time of the current update
   cycle without reading the actual hardware clock. NTP frequency
   adjustment can set the base time backwards. The fine grained
   interfaces compensate this by reading the clock and applying the new
   conversion factor, but the coarse grained time getters use the base
   time directly. That allows the user to observe time going backwards.

   Cure it by always forwarding base time, when NTP changes the
   frequency with an immediate step.

 - Rework of posix-timer hashing

   The posix-timer hash is not scalable and due to the CRIU timer
   restore mechanism prone to massive contention on the global hash
   bucket lock.

   Replace the global hash lock with a fine grained per bucket locking
   scheme to address that.

 - Rework the proc/$PID/timers interface.

   /proc/$PID/timers is provided for CRIU to be able to restore a timer.
   The printout happens with sighand lock held and interrupts disabled.
   That's not required as this can be done with RCU protection as well.

 - Provide a sane mechanism for CRIU to restore a timer ID

   CRIU restores timers by creating and deleting them until the kernel
   internal per process ID counter reached the requested ID. That's
   horribly slow for sparse timer IDs.

   Provide a prctl() which allows CRIU to restore a timer with a given
   ID. When enabled the ID pointer is used as input pointer to read the
   requested ID from user space. When disabled, the normal allocation
   scheme (next ID) is active as before. This is backwards compatible
   for both kernel and user space.

 - Make hrtimer_update_function() less expensive.

   The sanity checks are valuable, but expensive for high frequency
   usage in io/uring. Make the debug checks conditional and enable them
   only when lockdep is enabled.

 - Small updates, cleanups and improvements

* tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  selftests/timers: Improve skew_consistency by testing with other clockids
  timekeeping: Fix possible inconsistencies in _COARSE clockids
  posix-timers: Drop redundant memset() invocation
  selftests/timers/posix-timers: Add a test for exact allocation mode
  posix-timers: Provide a mechanism to allocate a given timer ID
  posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held
  posix-timers: Make per process list RCU safe
  posix-timers: Avoid false cacheline sharing
  posix-timers: Switch to jhash32()
  posix-timers: Improve hash table performance
  posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t
  posix-timers: Make lock_timer() use guard()
  posix-timers: Rework timer removal
  posix-timers: Simplify lock/unlock_timer()
  posix-timers: Use guards in a few places
  posix-timers: Remove SLAB_PANIC from kmem cache
  posix-timers: Remove a few paranoid warnings
  posix-timers: Cleanup includes
  posix-timers: Add cond_resched() to posix_timer_add() search loop
  posix-timers: Initialise timer before adding it to the hash table
  ...
2025-03-25 10:33:23 -07:00
Dmitry Safonov
edbac739e4 selftests/net: Drop timeout argument from test_client_verify()
It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test,
that is more paranoia-long timeout rather than based on requirements.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov
1e1738faa2 selftests/net: Delete timeout from test_connect_socket()
Unused: it's always either the default timeout or asynchronous
connect().

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov
266ed1ace8 selftests/net: Print the testing side in unsigned-md5
As both client and server print the same test name on failure or pass,
add "[server]" so that it's more obvious from a log which side printed
"ok" or "not ok".

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov
3f36781e57 selftests/net: Add mixed select()+polling mode to TCP-AO tests
Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one,
TEST_RETRANSMIT_SEC is used for operations that are expected to succeed
in order for a test to pass. It is usually not consumed and exists only
to avoid indefinite test run if the operation didn't complete.
The second one, TEST_RETRANSMIT_SEC exists for the tests that checking
operations, that are expected to fail/timeout. It is shorter as it is
fully consumed, with an expectation that if operation didn't succeed
during that period, it will timeout. And the related test that expects
the timeout is passing. The actual operation failure is then
cross-verified by other means like counters checks.

The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
initial TCP timeout. So, in case the initial segment gets lost (quite
unlikely on local veth interface between two net namespaces, yet happens
in slow VMs), the retransmission never happens and as a result, the test
is not actually testing the functionality. Which in the end fails
counters checks.

As I want tcp_ao selftests to be fast and finishing in a reasonable
amount of time on manual run, I didn't consider increasing
TEST_RETRANSMIT_SEC.

Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
to make the initial TCP timeout shorter. But as it's not a socket bpf
attached thing, but sock_ops (attaches to cgroups), the selftests would
have to use libbpf, which I wanted to avoid if not absolutely required.

Instead, use a mixed select() and counters polling mode with the longer
TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
actually not only allows losing segments and succeeding after
the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
the tests expecting timeout/failure pass faster.

The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
"wrong snd id", which checks for no key on SYN-ACK for which there is no
counter in the kernel (see tcp_make_synack()). Yet it can be speed up
by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).

Fixes: ed9d09b309 ("selftests/net: Add a test for TCP-AO keys matching")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov
5a0a3193f6 selftests/net: Fetch and check TCP-MD5 counters
There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests
that can benefit from checking the related counters, not only from
validating operations timeouts.

It also prepares the code for introduction of mixed select()+poll mode,
see the follow-up patches.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:30 -07:00
Dmitry Safonov
1fe4221093 selftests/net: Provide tcp-ao counters comparison helper
Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and
test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they
are asserts, rather than just compare functions.

Provide test_cmp_counters() helper, that's going to be used to compare
ao_info and netns counters as a stop condition for polling the sockets.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:29 -07:00
Dmitry Safonov
65ffdf31be selftests/net: Print TCP flags in more common format
Before:
># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

After:
># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1

For the history, I think the initial format was to emphasize the absence
of flags as well as their presence (!R meant no RST flag). But looking
again, it's just unreadable and hard to understand.
Make it the standard/expected one.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 06:10:29 -07:00
Linus Torvalds
a49a879f0a Merge tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad
  Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo"

* tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/coco: Replace 'static const cc_mask' with the newly introduced cc_get_mask() function
  x86/delay: Fix inconsistent whitespace
  selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE()
  x86/platform: Fix missing declaration of 'x86_apple_machine'
  x86/irq: Fix missing declaration of 'io_apic_irqs'
  x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description
  x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()
2025-03-24 22:39:53 -07:00
Linus Torvalds
71b639af06 Merge tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/fpu updates from Ingo Molnar:

 - Improve crypto performance by making kernel-mode FPU reliably usable
   in softirqs ((Eric Biggers)

 - Fully optimize out WARN_ON_FPU() (Eric Biggers)

 - Initial steps to support Support Intel APX (Advanced Performance
   Extensions) (Chang S. Bae)

 - Fix KASAN for arch_dup_task_struct() (Benjamin Berg)

 - Refine and simplify the FPU magic number check during signal return
   (Chang S. Bae)

 - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav
   Spassov)

 - selftests/x86/xstate: Introduce common code for testing extended
   states (Chang S. Bae)

 - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros
   Bizjak)

* tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Fix inconsistencies in guest FPU xfeatures
  x86/fpu: Clarify the "xa" symbolic name used in the XSTATE* macros
  x86/fpu: Use XSAVE{,OPT,C,S} and XRSTOR{,S} mnemonics in xstate.h
  x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs
  x86/fpu/xstate: Simplify print_xstate_features()
  x86/fpu: Refine and simplify the magic number check during signal return
  selftests/x86/xstate: Fix spelling mistake "hader" -> "header"
  x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct()
  vmlinux.lds.h: Remove entry to place init_task onto init_stack
  selftests/x86/avx: Add AVX tests
  selftests/x86/xstate: Clarify supported xstates
  selftests/x86/xstate: Consolidate test invocations into a single entry
  selftests/x86/xstate: Introduce signal ABI test
  selftests/x86/xstate: Refactor ptrace ABI test
  selftests/x86/xstate: Refactor context switching test
  selftests/x86/xstate: Enumerate and name xstate components
  selftests/x86/xstate: Refactor XSAVE helpers for general use
  selftests/x86: Consolidate redundant signal helper functions
  x86/fpu: Fix guest FPU state buffer allocation size
  x86/fpu: Fully optimize out WARN_ON_FPU()
2025-03-24 22:27:18 -07:00
Linus Torvalds
e34c38057a Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds
32b22538be Merge tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Cancel the slice protection of the idle entity (Zihan Zhou)
   - Reduce the default slice to avoid tasks getting an extra tick
     (Zihan Zhou)
   - Force propagating min_slice of cfs_rq when {en,de}queue tasks
     (Tianchen Ding)
   - Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
   - Add unlikey branch hints to several system calls (Colin Ian King)
   - Optimize current_clr_polling() on certain architectures (Yujun
     Dong)

  Deadline scheduler: (Juri Lelli)
   - Remove redundant dl_clear_root_domain call
   - Move dl_rebuild_rd_accounting to cpuset.h

  Uclamp:
   - Use the uclamp_is_used() helper instead of open-coding it (Xuewen
     Yan)
   - Optimize sched_uclamp_used static key enabling (Xuewen Yan)

  Scheduler topology support: (Juri Lelli)
   - Ignore special tasks when rebuilding domains
   - Add wrappers for sched_domains_mutex
   - Generalize unique visiting of root domains
   - Rebuild root domain accounting after every update
   - Remove partition_and_rebuild_sched_domains
   - Stop exposing partition_sched_domains_locked

  RSEQ: (Michael Jeanson)
   - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
   - Fix segfault on registration when rseq_cs is non-zero
   - selftests: Add rseq syscall errors test
   - selftests: Ensure the rseq ABI TLS is actually 1024 bytes

  Membarriers:
   - Fix redundant load of membarrier_state (Nysal Jan K.A.)

  Scheduler debugging:
   - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
   - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)

  Fixes and cleanups:
   - Always save/restore x86 TSC sched_clock() on suspend/resume
     (Guilherme G. Piccoli)
   - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian
     Andrzej Siewior)"

* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
  sched/debug: Remove CONFIG_SCHED_DEBUG
  sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  rseq/selftests: Fix namespace collision with rseq UAPI header
  include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
  sched/topology: Stop exposing partition_sched_domains_locked
  cgroup/cpuset: Remove partition_and_rebuild_sched_domains
  sched/topology: Remove redundant dl_clear_root_domain call
  sched/deadline: Rebuild root domain accounting after every update
  sched/deadline: Generalize unique visiting of root domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Ignore special tasks when rebuilding domains
  tracing: Use preempt_model_str()
  xtensa: Rely on generic printing of preemption model
  x86: Rely on generic printing of preemption model
  s390: Rely on generic printing of preemption model
  ...
2025-03-24 21:28:12 -07:00
Linus Torvalds
3ba7dfb8da Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Boqun Feng:
 "Documentation:
   - Add broken-timing possibility to stallwarn.rst
   - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
   - Document self-propagating callbacks
   - Point call_srcu() to call_rcu() for detailed memory ordering
   - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
   - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
   - Remove references to old grace-period-wait primitives

  srcu:
   - Introduce srcu_read_{un,}lock_fast(), which is similar to
     srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
     at the cost of calling synchronize_rcu() in synchronize_srcu()

     Moreover, by returning the percpu offset of the counter at
     srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
     extra pointer dereferencing, which makes it faster than
     srcu_read_{un,}lock_lite()

     srcu_read_{un,}lock_fast() are intended to replace
     rcu_read_{un,}lock_trace() if possible

  RCU torture:
   - Add get_torture_init_jiffies() to return the start time of the test
   - Add a test_boost_holdoff module parameter to allow delaying
     boosting tests when building rcutorture as built-in
   - Add grace period sequence number logging at the beginning and end
     of failure/close-call results
   - Switch to hexadecimal for the expedited grace period sequence
     number in the rcu_exp_grace_period trace point
   - Make cur_ops->format_gp_seqs take buffer length
   - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
   - Complain when invalid SRCU reader_flavor is specified
   - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
     uses atomics even when percpu ops are NMI safe, and use the Kconfig
     for SRCU lockdep testing

  Misc:
   - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
   - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
   - Fix get_state_synchronize_rcu_full() GP-start detection
   - Move RCU Tasks self-tests to core_initcall()
   - Print segment lengths in show_rcu_nocb_gp_state()
   - Make RCU watch ct_kernel_exit_state() warning
   - Flush console log from kernel_power_off()
   - rcutorture: Allow a negative value for nfakewriters
   - rcu: Update TREE05.boot to test normal synchronize_rcu()
   - rcu: Use _full() API to debug synchronize_rcu()

  Make RCU handle PREEMPT_LAZY better:
   - Fix header guard for rcu_all_qs()
   - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
   - Update __cond_resched comment about RCU quiescent states
   - Handle unstable rdp in rcu_read_unlock_strict()
   - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
   - osnoise: Provide quiescent states
   - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
     combination
   - Limit PREEMPT_RCU configurations
   - Make rcutorture senario TREE07 and senario TREE10 use
     PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
  rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
  rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
  rcu: limit PREEMPT_RCU configurations
  rcutorture: Update ->extendables check for lazy preemption
  rcutorture: Update rcutorture_one_extend_check() for lazy preemption
  osnoise: provide quiescent states
  rcu: Use _full() API to debug synchronize_rcu()
  rcu: Update TREE05.boot to test normal synchronize_rcu()
  rcutorture: Allow a negative value for nfakewriters
  Flush console log from kernel_power_off()
  context_tracking: Make RCU watch ct_kernel_exit_state() warning
  rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
  rcu-tasks: Move RCU Tasks self-tests to core_initcall()
  rcu: Fix get_state_synchronize_rcu_full() GP-start detection
  torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
  srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
  rcutorture: Complain when invalid SRCU reader_flavor is specified
  rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
  rcutorture: Make cur_ops->format_gp_seqs take buffer length
  rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
  ...
2025-03-24 19:41:37 -07:00
Linus Torvalds
418becac37 Merge tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
 - 32bit s390 support
 - opendir() and friends
 - openat() support
 - sscanf() support
 - various cleanups

[ Paul has just forwarded the pull request from Thomas Weißschuh, so
  the tag signature is from Thomas, not Paul   - Linus ]

* tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits)
  tools/nolibc: don't use asm/ UAPI headers
  selftests/nolibc: stop testing constructor order
  selftests/nolibc: use O_RDONLY flag instead of 0
  tools/nolibc: drop outdated example from overview comment
  tools/nolibc: process open() vararg as mode_t
  tools/nolibc: always use openat(2) instead of open(2)
  tools/nolibc: add support for openat(2)
  selftests/nolibc: add armthumb configuration
  selftests/nolibc: explicitly enable ARM mode
  Revert "selftests: kselftest: Fix build failure with NOLIBC"
  tools/nolibc: add support for [v]sscanf()
  tools/nolibc: add support for 32-bit s390
  selftests/nolibc: rename s390 to s390x
  selftests/nolibc: only run constructor tests on nolibc
  selftests/nolibc: split up architecture list in run-tests.sh
  tools/nolibc: add support for directory access
  tools/nolibc: add support for sys_llseek()
  selftests/nolibc: always keep test kernel configuration up to date
  selftests/nolibc: execute defconfig before other targets
  selftests/nolibc: drop call to mrproper target
  ...
2025-03-24 17:59:29 -07:00
Linus Torvalds
bcb044256d Merge tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.

 - The default idle CPU selection logic is revamped and improved in
   multiple ways including being made topology aware.

 - sched_ext was disabling ttwu_queue for simplicity, which can be
   costly when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.

 - tools/sched_ext updates to improve compatibility among others.

 - Other misc updates and fixes.

 - sched_ext/for-6.14-fixes were pulled a few times to receive
   prerequisite fixes and resolve conflicts.

* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
  sched_ext: idle: Refactor scx_select_cpu_dfl()
  sched_ext: idle: Honor idle flags in the built-in idle selection policy
  sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
  sched_ext: Add trace point to track sched_ext core events
  sched_ext: Change the event type from u64 to s64
  sched_ext: Documentation: add task lifecycle summary
  tools/sched_ext: Provide a compatible helper for scx_bpf_events()
  selftests/sched_ext: Add NUMA-aware scheduler test
  tools/sched_ext: Provide consistent access to scx flags
  sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
  sched_ext: idle: Introduce scx_bpf_nr_node_ids()
  sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
  sched_ext: idle: Per-node idle cpumasks
  sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
  sched_ext: idle: Make idle static keys private
  sched/topology: Introduce for_each_node_numadist() iterator
  mm/numa: Introduce nearest_node_nodemask()
  nodemask: numa: reorganize inclusion path
  nodemask: add nodes_copy()
  tools/sched_ext: Sync with scx repo
  ...
2025-03-24 17:23:48 -07:00
Linus Torvalds
11c2b2e332 Merge tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:

 - avoid the lock trip seccomp_filter_release in common case (Mateusz
   Guzik)

 - remove unused 'sd' argument through-out (Oleg Nesterov)

 - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64

* tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: avoid the lock trip seccomp_filter_release in common case
  seccomp: remove the 'sd' argument from __seccomp_filter()
  seccomp: remove the 'sd' argument from __secure_computing()
  seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
  seccomp/mips: change syscall_trace_enter() to use secure_computing()
  selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
2025-03-24 15:34:38 -07:00
Linus Torvalds
06961fbbbd Merge tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull lib kunit selftest move from Kees Cook:
 "This is a one-off tree to coordinate the move of selftests out of lib/
  and into lib/tests/. A separate tree was used for this to keep the
  paths sane with all the work in the same place.

   - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela
     Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir
     Duberstein)

   - lib/math: Add int_log test suite (Bruno Sobreira França)

   - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin)

   - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego
     Vieira)

   - unicode: refactor selftests into KUnit (Gabriela Bittencourt)

   - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein)

   - printf: convert self-test to KUnit (Tamir Duberstein)

   - scanf: convert self-test to KUnit (Tamir Duberstein)"

* tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits)
  scanf: break kunit into test cases
  scanf: convert self-test to KUnit
  scanf: remove redundant debug logs
  scanf: implicate test line in failure messages
  printf: implicate test line in failure messages
  printf: break kunit into test cases
  printf: convert self-test to KUnit
  kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR()
  kunit/fortify: Expand testing of __compiletime_strlen()
  kunit/stackinit: Use fill byte different from Clang i386 pattern
  kunit/overflow: Fix DEFINE_FLEX tests for counted_by
  selftests: remove reference to prime_numbers.sh
  MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING
  lib/prime_numbers: convert self-test to KUnit
  lib/math: Add Kunit test suite for gcd()
  unicode: kunit: change tests filename and path
  unicode: kunit: refactor selftest to kunit tests
  lib/tests/kfifo_kunit.c: add tests for the kfifo structure
  lib: Move KUnit tests into tests/ subdirectory
  lib/math: Add int_log test suite
  ...
2025-03-24 15:15:11 -07:00
Amit Cohen
36ed81bcad selftests: vxlan_bridge: Test flood with unresolved FDB entry
Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 15:09:31 -07:00
Gal Pressman
c61209eeb0 selftests: drv-net: rss_ctx: Don't assume indirection table is present
The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:22:37 -07:00
Peter Seiderer
3099f9e156 selftest: net: update proc_net_pktgen (add more imix_weights test cases)
Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:01:38 -07:00
Linus Torvalds
130e696aa6 Merge tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount namespace updates from Christian Brauner:
 "This expands the ability of anonymous mount namespaces:

   - Creating detached mounts from detached mounts

     Currently, detached mounts can only be created from attached
     mounts. This limitaton prevents various use-cases. For example, the
     ability to mount a subdirectory without ever having to make the
     whole filesystem visible first.

     The current permission modelis:

      (1) Check that the caller is privileged over the owning user
          namespace of it's current mount namespace.

      (2) Check that the caller is located in the mount namespace of the
          mount it wants to create a detached copy of.

     While it is not strictly necessary to do it this way it is
     consistently applied in the new mount api. This model will also be
     used when allowing the creation of detached mount from another
     detached mount.

     The (1) requirement can simply be met by performing the same check
     as for the non-detached case, i.e., verify that the caller is
     privileged over its current mount namespace.

     To meet the (2) requirement it must be possible to infer the origin
     mount namespace that the anonymous mount namespace of the detached
     mount was created from.

     The origin mount namespace of an anonymous mount is the mount
     namespace that the mounts that were copied into the anonymous mount
     namespace originate from.

     In order to check the origin mount namespace of an anonymous mount
     namespace the sequence number of the original mount namespace is
     recorded in the anonymous mount namespace.

     With this in place it is possible to perform an equivalent check
     (2') to (2). The origin mount namespace of the anonymous mount
     namespace must be the same as the caller's mount namespace. To
     establish this the sequence number of the caller's mount namespace
     and the origin sequence number of the anonymous mount namespace are
     compared.

     The caller is always located in a non-anonymous mount namespace
     since anonymous mount namespaces cannot be setns()ed into. The
     caller's mount namespace will thus always have a valid sequence
     number.

     The owning namespace of any mount namespace, anonymous or
     non-anonymous, can never change. A mount attached to a
     non-anonymous mount namespace can never change mount namespace.

     If the sequence number of the non-anonymous mount namespace and the
     origin sequence number of the anonymous mount namespace match, the
     owning namespaces must match as well.

     Hence, the capability check on the owning namespace of the caller's
     mount namespace ensures that the caller has the ability to copy the
     mount tree.

   - Allow mount detached mounts on detached mounts

     Currently, detached mounts can only be mounted onto attached
     mounts. This limitation makes it impossible to assemble a new
     private rootfs and move it into place. Instead, a detached tree
     must be created, attached, then mounted open and then either moved
     or detached again. Lift this restriction.

     In order to allow mounting detached mounts onto other detached
     mounts the same permission model used for creating detached mounts
     from detached mounts can be used (cf. above).

     Allowing to mount detached mounts onto detached mounts leaves three
     cases to consider:

      (1) The source mount is an attached mount and the target mount is
          a detached mount. This would be equivalent to moving a mount
          between different mount namespaces. A caller could move an
          attached mount to a detached mount. The detached mount can now
          be freely attached to any mount namespace. This changes the
          current delegatioh model significantly for no good reason. So
          this will fail.

      (2) Anonymous mount namespaces are always attached fully, i.e., it
          is not possible to only attach a subtree of an anoymous mount
          namespace. This simplifies the implementation and reasoning.

          Consequently, if the anonymous mount namespace of the source
          detached mount and the target detached mount are the identical
          the mount request will fail.

      (3) The source mount's anonymous mount namespace is different from
          the target mount's anonymous mount namespace.

          In this case the source anonymous mount namespace of the
          source mount tree must be freed after its mounts have been
          moved to the target anonymous mount namespace. The source
          anonymous mount namespace must be empty afterwards.

     By allowing to mount detached mounts onto detached mounts a caller
     may do the following:

       fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
       fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)

     fd_tree1 and fd_tree2 refer to two different detached mount trees
     that belong to two different anonymous mount namespace.

     It is important to note that fd_tree1 and fd_tree2 both refer to
     the root of their respective anonymous mount namespaces.

     By allowing to mount detached mounts onto detached mounts the
     caller may now do:

         move_mount(fd_tree1, "", fd_tree2, "",
                    MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)

     This will cause the detached mount referred to by fd_tree1 to be
     mounted on top of the detached mount referred to by fd_tree2.

     Thus, the detached mount fd_tree1 is moved from its separate
     anonymous mount namespace into fd_tree2's anonymous mount
     namespace.

     It also means that while fd_tree2 continues to refer to the root of
     its respective anonymous mount namespace fd_tree1 doesn't anymore.

     This has the consequence that only fd_tree2 can be moved to another
     anonymous or non-anonymous mount namespace. Moving fd_tree1 will
     now fail as fd_tree1 doesn't refer to the root of an anoymous mount
     namespace anymore.

     Now fd_tree1 and fd_tree2 refer to separate detached mount trees
     referring to the same anonymous mount namespace.

     This is conceptually fine. The new mount api does allow for this to
     happen already via:

       mount -t tmpfs tmpfs /mnt
       mkdir -p /mnt/A
       mount -t tmpfs tmpfs /mnt/A

       fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
       fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)

     Both fd_tree3 and fd_tree4 refer to two different detached mount
     trees but both detached mount trees refer to the same anonymous
     mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3
     may be moved another mount namespace as fd_tree3 refers to the root
     of the anonymous mount namespace just while fd_tree4 doesn't.

     However, there's an important difference between the
     fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example.

     Closing fd_tree4 and releasing the respective struct file will have
     no further effect on fd_tree3's detached mount tree.

     However, closing fd_tree3 will cause the mount tree and the
     respective anonymous mount namespace to be destroyed causing the
     detached mount tree of fd_tree4 to be invalid for further mounting.

     By allowing to mount detached mounts on detached mounts as in the
     fd_tree1/fd_tree2 example both struct files will affect each other.

     Both fd_tree1 and fd_tree2 refer to struct files that have
     FMODE_NEED_UNMOUNT set.

     To handle this we use the fact that @fd_tree1 will have a parent
     mount once it has been attached to @fd_tree2.

     When dissolve_on_fput() is called the mount that has been passed in
     will refer to the root of the anonymous mount namespace. If it
     doesn't it would mean that mounts are leaked. So before allowing to
     mount detached mounts onto detached mounts this would be a bug.

     Now that detached mounts can be mounted onto detached mounts it
     just means that the mount has been attached to another anonymous
     mount namespace and thus dissolve_on_fput() must not unmount the
     mount tree or free the anonymous mount namespace as the file
     referring to the root of the namespace hasn't been closed yet.

     If it had been closed yet it would be obvious because the mount
     namespace would be NULL, i.e., the @fd_tree1 would have already
     been unmounted. If @fd_tree1 hasn't been unmounted yet and has a
     parent mount it is safe to skip any cleanup as closing @fd_tree2
     will take care of all cleanup operations.

   - Allow mount propagation for detached mount trees

     In commit ee2e3f5062 ("mount: fix mounting of detached mounts
     onto targets that reside on shared mounts") I fixed a bug where
     propagating the source mount tree of an anonymous mount namespace
     into a target mount tree of a non-anonymous mount namespace could
     be used to trigger an integer overflow in the non-anonymous mount
     namespace causing any new mounts to fail.

     The cause of this was that the propagation algorithm was unable to
     recognize mounts from the source mount tree that were already
     propagated into the target mount tree and then reappeared as
     propagation targets when walking the destination propagation mount
     tree.

     When fixing this I disabled mount propagation into anonymous mount
     namespaces. Make it possible for anonymous mount namespace to
     receive mount propagation events correctly. This is now also a
     correctness issue now that we allow mounting detached mount trees
     onto detached mount trees.

     Mark the source anonymous mount namespace with MNTNS_PROPAGATING
     indicating that all mounts belonging to this mount namespace are
     currently in the process of being propagated and make the
     propagation algorithm discard those if they appear as propagation
     targets"

* tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  selftests: test subdirectory mounting
  selftests: add test for detached mount tree propagation
  fs: namespace: fix uninitialized variable use
  mount: handle mount propagation for detached mount trees
  fs: allow creating detached mounts from fsmount() file descriptors
  selftests: seventh test for mounting detached mounts onto detached mounts
  selftests: sixth test for mounting detached mounts onto detached mounts
  selftests: fifth test for mounting detached mounts onto detached mounts
  selftests: fourth test for mounting detached mounts onto detached mounts
  selftests: third test for mounting detached mounts onto detached mounts
  selftests: second test for mounting detached mounts onto detached mounts
  selftests: first test for mounting detached mounts onto detached mounts
  fs: mount detached mounts onto detached mounts
  fs: support getname_maybe_null() in move_mount()
  selftests: create detached mounts from detached mounts
  fs: create detached mounts from detached mounts
  fs: add may_copy_tree()
  fs: add fastpath for dissolve_on_fput()
  fs: add assert for move_mount()
  fs: add mnt_ns_empty() helper
  ...
2025-03-24 11:41:41 -07:00
Linus Torvalds
74adf9e353 Merge tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nsfs updates from Christian Brauner:
 "This contains non-urgent fixes for nsfs to validate ioctls before
  performing any relevant operations.

  We alredy did this for a few other filesystems last cycle"

* tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/nsfs: add ioctl validation tests
  nsfs: validate ioctls
2025-03-24 11:38:12 -07:00
Linus Torvalds
804382d59b Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs overlayfs updates from Christian Brauner:
 "Currently overlayfs uses the mounter's credentials for its
  override_creds() calls. That provides a consistent permission model.

  This patches allows a caller to instruct overlayfs to use its
  credentials instead. The caller must be located in the same user
  namespace hierarchy as the user namespace the overlayfs instance will
  be mounted in. This provides a consistent and simple security model.

  With this it is possible to e.g., mount an overlayfs instance where
  the mounter must have CAP_SYS_ADMIN but the credentials used for
  override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
  of custom fs{g,u}id different from the callers and other tweaks"

* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/ovl: add third selftest for "override_creds"
  selftests/ovl: add second selftest for "override_creds"
  selftests/filesystems: add utils.{c,h}
  selftests/ovl: add first selftest for "override_creds"
  ovl: allow to specify override credentials
2025-03-24 10:37:40 -07:00
Linus Torvalds
df00ded23a Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Linus Torvalds
fd101da676 Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:

 - Mount notifications

   The day has come where we finally provide a new api to listen for
   mount topology changes outside of /proc/<pid>/mountinfo. A mount
   namespace file descriptor can be supplied and registered with
   fanotify to listen for mount topology changes.

   Currently notifications for mount, umount and moving mounts are
   generated. The generated notification record contains the unique
   mount id of the mount.

   The listmount() and statmount() api can be used to query detailed
   information about the mount using the received unique mount id.

   This allows userspace to figure out exactly how the mount topology
   changed without having to generating diffs of /proc/<pid>/mountinfo
   in userspace.

 - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
   api

 - Support detached mounts in overlayfs

   Since last cycle we support specifying overlayfs layers via file
   descriptors. However, we don't allow detached mounts which means
   userspace cannot user file descriptors received via
   open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
   attach them to a mount namespace via move_mount() first.

   This is cumbersome and means they have to undo mounts via umount().
   Allow them to directly use detached mounts.

 - Allow to retrieve idmappings with statmount

   Currently it isn't possible to figure out what idmapping has been
   attached to an idmapped mount. Add an extension to statmount() which
   allows to read the idmapping from the mount.

 - Allow creating idmapped mounts from mounts that are already idmapped

   So far it isn't possible to allow the creation of idmapped mounts
   from already idmapped mounts as this has significant lifetime
   implications. Make the creation of idmapped mounts atomic by allow to
   pass struct mount_attr together with the open_tree_attr() system call
   allowing to solve these issues without complicating VFS lookup in any
   way.

   The system call has in general the benefit that creating a detached
   mount and applying mount attributes to it becomes an atomic operation
   for userspace.

 - Add a way to query statmount() for supported options

   Allow userspace to query which mount information can be retrieved
   through statmount().

 - Allow superblock owners to force unmount

* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  umount: Allow superblock owners to force umount
  selftests: add tests for mount notification
  selinux: add FILE__WATCH_MOUNTNS
  samples/vfs: fix printf format string for size_t
  fs: allow changing idmappings
  fs: add kflags member to struct mount_kattr
  fs: add open_tree_attr()
  fs: add copy_mount_setattr() helper
  fs: add vfs_open_tree() helper
  statmount: add a new supported_mask field
  samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
  selftests: add tests for using detached mount with overlayfs
  samples/vfs: check whether flag was raised
  statmount: allow to retrieve idmappings
  uidgid: add map_id_range_up()
  fs: allow detached mounts in clone_private_mount()
  selftests/overlayfs: test specifying layers as O_PATH file descriptors
  fs: support O_PATH fds with FSCONFIG_SET_FD
  vfs: add notifications for mount attach and detach
  fanotify: notify on mount attach and detach
  ...
2025-03-24 09:34:10 -07:00
Ming Lei
0f3ebf2d4b selftests: ublk: add stripe target
Add ublk stripe target which can take 1~4 underlying backing files
or block device, with stripe size 4k ~ 512K.

Add two basic tests(write verify & mkfs/mount/umount) over ublk/stripe.

This target is helpful to cover multiple IOs aiming at same
fixed/registered IO kernel buffer.

It is also capable of verifying vectored registered (kernel)buffers
in future for zero copy, so far it isn't supported yet.

Todo: support vectored registered kernel buffer for ublk/zc.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
263846eb43 selftests: ublk: simplify loop io completion
Use the added target io handling helpers for simplifying loop io
completion.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
8cb9b971e2 selftests: ublk: enable zero copy for null target
Enable zero copy for null target so that we can evaluate performance
from zero copy or not.

Also this should be the simplest ublk zero copy implementation, which
can be served as zc example.

Add test for covering 'add -t null -z'.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
8842b72a82 selftests: ublk: prepare for supporting stripe target
- pass 'truct dev_ctx *ctx' to target init function

- add 'private_data' to 'struct ublk_dev' for storing target specific data

- add 'private_data' to 'struct ublk_io' for storing per-IO data

- add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios
for handling the current io command

- add helper ublk_get_io() for supporting stripe target

- add two helpers for simplifying target io handling

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
10d962dae2 selftests: ublk: move common code into common.c
Move two functions for initializing & de-initializing backing file
into common.c.

Also move one common helper into kublk.h.

Prepare for supporting ublk-stripe.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
9413c0ca8e selftests: ublk: increase max buffer size to 1MB
Increase max buffer size to 1MB, and 64KB is too small to evaluate
performance with builtin ublk server implementation.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
f2639ed11e selftests: ublk: add single sqe allocator helper
Unify the sqe allocator helper, and we will use it for supporting
more cases, such as ublk stripe, in which variable sqe allocation
is required.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
723977cab4 selftests: ublk: add generic_01 for verifying sequential IO order
block layer, ublk and io_uring might re-order IO in the past

- plug

- queue ublk io command via task work

Add one test for verifying if sequential WRITE IO is dispatched in order.

- null target is taken, so we can just observe io order from
`tracepoint:block:block_rq_complete` which represents the dispatch order

- WRITE IO is taken because READ may come from system-wide utility

Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
ffde32a49a selftests: ublk: fix starting ublk device
Firstly ublk char device node may not be created by udev yet, so wait
a while until it can be opened or timeout.

Secondly delete created ublk device in case of start failure, otherwise
the device becomes zombie.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321135324.259677-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21 14:09:24 -06:00
John Stultz
e40d3709c0 selftests/timers: Improve skew_consistency by testing with other clockids
Lei Chen reported a bug with CLOCK_MONOTONIC_COARSE having inconsistencies
when NTP is adjusting the clock frequency.

This has gone seemingly undetected for ~15 years, illustrating a clear gap
in our testing.

The skew_consistency test is intended to catch this sort of problem, but
was focused on only evaluating CLOCK_MONOTONIC, and thus missed the problem
on CLOCK_MONOTONIC_COARSE.

So adjust the test to run with all clockids for 60 seconds each instead of
10 minutes with just CLOCK_MONOTONIC.

Reported-by: Lei Chen <lei.chen@smartx.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250320200306.1712599-2-jstultz@google.com
Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-03-21 19:16:18 +01:00
Breno Leitao
4b73dc83ed selftests: netconsole: Add tests for 'release' feature in sysdata
Expands the self-tests to include the 'release' feature in
sysdata.

Verifies that enabling the 'release' feature appends the
correct data and ensures that disabling it functions as expected.

When enabled, the message should have an item similar to in the
userdata: `release=$(uname -r)`

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21 18:59:25 +01:00
Ming Lei
96af5af47b selftests: ublk: fix write cache implementation
For loop target, write cache isn't enabled, and each write isn't be
marked as DSYNC too.

Fix it by enabling write cache, meantime fix FLUSH implementation
by not taking LBA range into account, and there isn't such info
for FLUSH command.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321004758.152572-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 20:01:03 -06:00
Ming Lei
beb31982ad selftests: ublk: add variable for user to not show test result
Some user decides test result by exit code only, and wouldn't like to be
bothered by the test result.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00
Ming Lei
fe2230d921 selftests: ublk: don't show modprobe failure
ublk_drv may be built-in, so don't show modprobe failure, and we
do check `/dev/ublk-control` for skipping test if ublk_drv isn't
enabled.

Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00
Ming Lei
8764c1a72b selftests: ublk: add one dependency header
Add one dependency helper which can include new uapi definition which
isn't synced from kernel.

This way also helps a lot for downstream test deployment.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20 17:18:55 -06:00