Commit Graph

17860 Commits

Author SHA1 Message Date
Linus Torvalds
fd56e5104a Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:

 - Added support for restartable sequences (me)

 - Migration to Generic built-in DTB (Masahiro Yamada)

* tag 'for-linus' of https://github.com/openrisc/linux:
  rseq/selftests: Add support for OpenRISC
  openrisc: Add support for restartable sequences
  openrisc: Add HAVE_REGS_AND_STACK_ACCESS_API support
  openrisc: migrate to the generic rule for built-in DTB
2025-01-25 10:16:56 -08:00
Linus Torvalds
0f8e26b38d Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "Loongarch:

   - Clear LLBCTL if secondary mmu mapping changes

   - Add hypercall service support for usermode VMM

  x86:

   - Add a comment to kvm_mmu_do_page_fault() to explain why KVM
     performs a direct call to kvm_tdp_page_fault() when RETPOLINE is
     enabled

   - Ensure that all SEV code is compiled out when disabled in Kconfig,
     even if building with less brilliant compilers

   - Remove a redundant TLB flush on AMD processors when guest CR4.PGE
     changes

   - Use str_enabled_disabled() to replace open coded strings

   - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's
     APICv cache prior to every VM-Enter

   - Overhaul KVM's CPUID feature infrastructure to track all vCPU
     capabilities instead of just those where KVM needs to manage state
     and/or explicitly enable the feature in hardware. Along the way,
     refactor the code to make it easier to add features, and to make it
     more self-documenting how KVM is handling each feature

   - Rework KVM's handling of VM-Exits during event vectoring; this
     plugs holes where KVM unintentionally puts the vCPU into infinite
     loops in some scenarios (e.g. if emulation is triggered by the
     exit), and brings parity between VMX and SVM

   - Add pending request and interrupt injection information to the
     kvm_exit and kvm_entry tracepoints respectively

   - Fix a relatively benign flaw where KVM would end up redoing RDPKRU
     when loading guest/host PKRU, due to a refactoring of the kernel
     helpers that didn't account for KVM's pre-checking of the need to
     do WRPKRU

   - Make the completion of hypercalls go through the complete_hypercall
     function pointer argument, no matter if the hypercall exits to
     userspace or not.

     Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically
     went to userspace, and all the others did not; the new code need
     not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at
     all whether there was an exit to userspace or not

   - As part of enabling TDX virtual machines, support support
     separation of private/shared EPT into separate roots.

     When TDX will be enabled, operations on private pages will need to
     go through the privileged TDX Module via SEAMCALLs; as a result,
     they are limited and relatively slow compared to reading a PTE.

     The patches included in 6.14 allow KVM to keep a mirror of the
     private EPT in host memory, and define entries in kvm_x86_ops to
     operate on external page tables such as the TDX private EPT

   - The recently introduced conversion of the NX-page reclamation
     kthread to vhost_task moved the task under the main process. The
     task is created as soon as KVM_CREATE_VM was invoked and this, of
     course, broke userspace that didn't expect to see any child task of
     the VM process until it started creating its own userspace threads.

     In particular crosvm refuses to fork() if procfs shows any child
     task, so unbreak it by creating the task lazily. This is arguably a
     userspace bug, as there can be other kinds of legitimate worker
     tasks and they wouldn't impede fork(); but it's not like userspace
     has a way to distinguish kernel worker tasks right now. Should they
     show as "Kthread: 1" in proc/.../status?

  x86 - Intel:

   - Fix a bug where KVM updates hardware's APICv cache of the highest
     ISR bit while L2 is active, while ultimately results in a
     hardware-accelerated L1 EOI effectively being lost

   - Honor event priority when emulating Posted Interrupt delivery
     during nested VM-Enter by queueing KVM_REQ_EVENT instead of
     immediately handling the interrupt

   - Rework KVM's processing of the Page-Modification Logging buffer to
     reap entries in the same order they were created, i.e. to mark gfns
     dirty in the same order that hardware marked the page/PTE dirty

   - Misc cleanups

  Generic:

   - Cleanup and harden kvm_set_memory_region(); add proper lockdep
     assertions when setting memory regions and add a dedicated API for
     setting KVM-internal memory regions. The API can then explicitly
     disallow all flags for KVM-internal memory regions

   - Explicitly verify the target vCPU is online in kvm_get_vcpu() to
     fix a bug where KVM would return a pointer to a vCPU prior to it
     being fully online, and give kvm_for_each_vcpu() similar treatment
     to fix a similar flaw

   - Wait for a vCPU to come online prior to executing a vCPU ioctl, to
     fix a bug where userspace could coerce KVM into handling the ioctl
     on a vCPU that isn't yet onlined

   - Gracefully handle xarray insertion failures; even though such
     failures are impossible in practice after xa_reserve(), reserving
     an entry is always followed by xa_store() which does not know (or
     differentiate) whether there was an xa_reserve() before or not

  RISC-V:

   - Zabha, Svvptc, and Ziccrse extension support for guests. None of
     them require anything in KVM except for detecting them and marking
     them as supported; Zabha adds byte and halfword atomic operations,
     while the others are markers for specific operation of the TLB and
     of LL/SC instructions respectively

   - Virtualize SBI system suspend extension for Guest/VM

   - Support firmware counters which can be used by the guests to
     collect statistics about traps that occur in the host

  Selftests:

   - Rework vcpu_get_reg() to return a value instead of using an
     out-param, and update all affected arch code accordingly

   - Convert the max_guest_memory_test into a more generic
     mmu_stress_test. The basic gist of the "conversion" is to have the
     test do mprotect() on guest memory while vCPUs are accessing said
     memory, e.g. to verify KVM and mmu_notifiers are working as
     intended

   - Play nice with treewrite builds of unsupported architectures, e.g.
     arm (32-bit), as KVM selftests' Makefile doesn't do anything to
     ensure the target architecture is actually one KVM selftests
     supports

   - Use the kernel's $(ARCH) definition instead of the target triple
     for arch specific directories, e.g. arm64 instead of aarch64,
     mainly so as not to be different from the rest of the kernel

   - Ensure that format strings for logging statements are checked by
     the compiler even when the logging statement itself is disabled

   - Attempt to whack the last LLC references/misses mole in the Intel
     PMU counters test by adding a data load and doing CLFLUSH{OPT} on
     the data instead of the code being executed. It seems that modern
     Intel CPUs have learned new code prefetching tricks that bypass the
     PMU counters

   - Fix a flaw in the Intel PMU counters test where it asserts that
     events are counting correctly without actually knowing what the
     events count given the underlying hardware; this can happen if
     Intel reuses a formerly microarchitecture-specific event encoding
     as an architectural event, as was the case for Top-Down Slots"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits)
  kvm: defer huge page recovery vhost task to later
  KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault()
  KVM: Disallow all flags for KVM-internal memslots
  KVM: x86: Drop double-underscores from __kvm_set_memory_region()
  KVM: Add a dedicated API for setting KVM-internal memslots
  KVM: Assert slots_lock is held when setting memory regions
  KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API)
  LoongArch: KVM: Add hypercall service support for usermode VMM
  LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed
  KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup()
  KVM: VMX: read the PML log in the same order as it was written
  KVM: VMX: refactor PML terminology
  KVM: VMX: Fix comment of handle_vmx_instruction()
  KVM: VMX: Reinstate __exit attribute for vmx_exit()
  KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup()
  KVM: x86: Avoid double RDPKRU when loading host/guest PKRU
  KVM: x86: Use LVT_TIMER instead of an open coded literal
  RISC-V: KVM: Add new exit statstics for redirected traps
  RISC-V: KVM: Update firmware counters for various events
  RISC-V: KVM: Redirect instruction access fault trap to guest
  ...
2025-01-25 09:55:09 -08:00
Linus Torvalds
ae8b53aac3 Merge tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:

 - Increase the headroom in the EFI memory map allocation created by the
   EFI stub. This is needed because event callbacks called during
   ExitBootServices() may cause fragmentation, and reallocation is not
   allowed after that.

 - Drop obsolete UGA graphics code and switch to a more ergonomic API to
   traverse handle buffers. Simplify some error paths using a __free()
   helper while at it.

 - Fix some W=1 warnings when CONFIG_EFI=n

 - Rely on the dentry cache to keep track of the contents of the
   efivarfs filesystem, rather than using a separate linked list.

 - Improve and extend efivarfs test cases.

 - Synchronize efivarfs with underlying variable store on resume from
   hibernation - this is needed because the firmware itself or another
   OS running on the same machine may have modified it.

 - Fix x86 EFI stub build with GCC 15.

 - Fix kexec/x86 false positive warning in EFI memory attributes table
   sanity check.

* tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (23 commits)
  x86/efi: skip memattr table on kexec boot
  efivarfs: add variable resync after hibernation
  efivarfs: abstract initial variable creation routine
  efi: libstub: Use '-std=gnu11' to fix build with GCC 15
  selftests/efivarfs: add concurrent update tests
  selftests/efivarfs: fix tests for failed write removal
  efivarfs: fix error on write to new variable leaving remnants
  efivarfs: remove unused efivarfs_list
  efivarfs: move variable lifetime management into the inodes
  selftests/efivarfs: add check for disallowing file truncation
  efivarfs: prevent setting of zero size on the inodes in the cache
  efi: sysfb_efi: fix W=1 warnings when EFI is not set
  efi/libstub: Use __free() helper for pool deallocations
  efi/libstub: Use cleanup helpers for freeing copies of the memory map
  efi/libstub: Simplify PCI I/O handle buffer traversal
  efi/libstub: Refactor and clean up GOP resolution picker code
  efi/libstub: Simplify GOP handling code
  efi/libstub: Use C99-style for loop to traverse handle buffer
  x86/efistub: Drop long obsolete UGA support
  efivarfs: make variable_is_present use dcache lookup
  ...
2025-01-24 15:33:33 -08:00
Linus Torvalds
bc8198dc7e Merge tag 'sched_ext-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - scx_bpf_now() added so that BPF scheduler can access the cached
   timestamp in struct rq to avoid reading TSC multiple times within a
   locked scheduling operation.

 - Minor updates to the built-in idle CPU selection logic.

 - tool/sched_ext updates and other misc changes.

* tag 'sched_ext-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: fix kernel-doc warnings
  sched_ext: Use time helpers in BPF schedulers
  sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now()
  sched_ext: Add time helpers for BPF schedulers
  sched_ext: Add scx_bpf_now() for BPF scheduler
  sched_ext: Implement scx_bpf_now()
  sched_ext: Relocate scx_enabled() related code
  sched_ext: Add option -l in selftest runner to list all available tests
  sched_ext: Include remaining task time slice in error state dump
  sched_ext: update scx_bpf_dsq_insert() doc for SCX_DSQ_LOCAL_ON
  sched_ext: idle: small CPU iteration refactoring
  sched_ext: idle: introduce check_builtin_idle_enabled() helper
  sched_ext: idle: clarify comments
  sched_ext: idle: use assign_cpu() to update the idle cpumask
  sched_ext: Use str_enabled_disabled() helper in update_selcpu_topology()
  sched_ext: Use sizeof_field for key_len in dsq_hash_params
  tools/sched_ext: Receive updates from SCX repo
  sched_ext: Use the NUMA scheduling domain for NUMA optimizations
2025-01-23 18:49:43 -08:00
Linus Torvalds
e8744fbc83 Merge tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Cleanup with guard() and free() helpers

   There were several places in the code that had a lot of "goto out" in
   the error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.

 - Update the Rust tracepoint code to use the C code too

   There was some duplication of the tracepoint code for Rust that did
   the same logic as the C code. Add a helper that makes it possible for
   both algorithms to use the same logic in one place.

 - Add poll to trace event hist files

   It is useful to know when an event is triggered, or even with some
   filtering. Since hist files of events get updated when active and the
   event is triggered, allow applications to poll the hist file and wake
   up when an event is triggered. This will let the application know
   that the event it is waiting for happened.

 - Add :mod: command to enable events for current or future modules

   The function tracer already has a way to enable functions to be
   traced in modules by writing ":mod:<module>" into set_ftrace_filter.
   That will enable either all the functions for the module if it is
   loaded, or if it is not, it will cache that command, and when the
   module is loaded that matches <module>, its functions will be
   enabled. This also allows init functions to be traced. But currently
   events do not have that feature.

   Add the command where if ':mod:<module>' is written into set_event,
   then either all the modules events are enabled if it is loaded, or
   cache it so that the module's events are enabled when it is loaded.
   This also works from the kernel command line, where
   "trace_event=:mod:<module>", when the module is loaded at boot up,
   its events will be enabled then.

* tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  tracing: Fix output of set_event for some cached module events
  tracing: Fix allocation of printing set_event file content
  tracing: Rename update_cache() to update_mod_cache()
  tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES
  selftests/ftrace: Add test that tests event :mod: commands
  tracing: Cache ":mod:" events for modules not loaded yet
  tracing: Add :mod: command to enabled module events
  selftests/tracing: Add hist poll() support test
  tracing/hist: Support POLLPRI event for poll on histogram
  tracing/hist: Add poll(POLLIN) support on hist file
  tracing: Fix using ret variable in tracing_set_tracer()
  tracepoint: Reduce duplication of __DO_TRACE_CALL
  tracing/string: Create and use __free(argv_free) in trace_dynevent.c
  tracing: Switch trace_stat.c code over to use guard()
  tracing: Switch trace_stack.c code over to use guard()
  tracing: Switch trace_osnoise.c code over to use guard() and __free()
  tracing: Switch trace_events_synth.c code over to use guard()
  tracing: Switch trace_events_filter.c code over to use guard()
  tracing: Switch trace_events_trigger.c code over to use guard()
  tracing: Switch trace_events_hist.c code over to use guard()
  ...
2025-01-23 17:51:16 -08:00
Linus Torvalds
7f71554b4e Merge tag 'ktest-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:

 - Fix use of KERNEL_VERSION in newly created output directory

   If a new output directory is created (O=/dir), and one of the options
   uses KERNEL_VERSION which will run a "make kernelversion" in the
   output directory, it will fail because there is no config file yet.

   In this case, have it do a "make allnoconfig" which is the minimal
   needed to run the "make kernelversion".

 - Remove unused variables

 - Fix some typos

* tag 'ktest-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest.pl: Fix typo "accesing"
  ktest.pl: Fix typo in comment
  ktest.pl: Remove unused declarations in run_bisect_test function
  ktest.pl: Check kernelrelease return in get_version
2025-01-23 17:45:24 -08:00
Linus Torvalds
d0d106a2bd Merge tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
 "A smaller than usual release cycle.

  The main changes are:

   - Prepare selftest to run with GCC-BPF backend (Ihor Solodrai)

     In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile
     only mode. Half of the tests are failing, since support for
     btf_decl_tag is still WIP, but this is a great milestone.

   - Convert various samples/bpf to selftests/bpf/test_progs format
     (Alexis Lothoré and Bastien Curutchet)

   - Teach verifier to recognize that array lookup with constant
     in-range index will always succeed (Daniel Xu)

   - Cleanup migrate disable scope in BPF maps (Hou Tao)

   - Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao)

   - Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin
     KaFai Lau)

   - Refactor verifier lock support (Kumar Kartikeya Dwivedi)

     This is a prerequisite for upcoming resilient spin lock.

   - Remove excessive 'may_goto +0' instructions in the verifier that
     LLVM leaves when unrolls the loops (Yonghong Song)

   - Remove unhelpful bpf_probe_write_user() warning message (Marco
     Elver)

   - Add fd_array_cnt attribute for prog_load command (Anton Protopopov)

     This is a prerequisite for upcoming support for static_branch"

* tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits)
  selftests/bpf: Add some tests related to 'may_goto 0' insns
  bpf: Remove 'may_goto 0' instruction in opt_remove_nops()
  bpf: Allow 'may_goto 0' instruction in verifier
  selftests/bpf: Add test case for the freeing of bpf_timer
  bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT
  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
  bpf: Bail out early in __htab_map_lookup_and_delete_elem()
  bpf: Free special fields after unlock in htab_lru_map_delete_node()
  tools: Sync if_xdp.h uapi tooling header
  libbpf: Work around kernel inconsistently stripping '.llvm.' suffix
  bpf: selftests: verifier: Add nullness elision tests
  bpf: verifier: Support eliding map lookup nullness
  bpf: verifier: Refactor helper access type tracking
  bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
  bpf: verifier: Add missing newline on verbose() call
  selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED
  libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED
  libbpf: Fix return zero when elf_begin failed
  selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test
  veristat: Load struct_ops programs only once
  ...
2025-01-23 08:04:07 -08:00
Linus Torvalds
21266b8df5 Merge tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK from Kees Cook:

 - Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)

 - Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
   (Mickaël Salaün)

 - Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)

* tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  ima: instantiate the bprm_creds_for_exec() hook
  samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
  selftests: ktap_helpers: Fix uninitialized variable
  samples/check-exec: Add set-exec
  selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
  selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
  security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
  exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
2025-01-22 20:34:42 -08:00
Linus Torvalds
de5817bbfb Merge tag 'landlock-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
 "This mostly factors out some Landlock code and prepares for upcoming
  audit support.

  Because files with invalid modes might be visible after filesystem
  corruption, Landlock now handles those weird files too.

  A few sample and test issues are also fixed"

* tag 'landlock-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/landlock: Add layout1.umount_sandboxer tests
  selftests/landlock: Add wrappers.h
  selftests/landlock: Fix error message
  landlock: Optimize file path walks and prepare for audit support
  selftests/landlock: Add test to check partial access in a mount tree
  landlock: Align partial refer access checks with final ones
  landlock: Simplify initially denied access rights
  landlock: Move access types
  landlock: Factor out check_access_path()
  selftests/landlock: Fix build with non-default pthread linking
  landlock: Use scoped guards for ruleset in landlock_add_rule()
  landlock: Use scoped guards for ruleset
  landlock: Constify get_mode_access()
  landlock: Handle weird files
  samples/landlock: Fix possible NULL dereference in parse_path()
  selftests/landlock: Remove unused macros in ptrace_test.c
2025-01-22 20:20:55 -08:00
Linus Torvalds
37b33c68b0 Merge tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:

 - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be
   directly accessible via the library API, instead of requiring the
   crypto API. This is much simpler and more efficient.

 - Convert some users such as ext4 to use the CRC32 library API instead
   of the crypto API. More conversions like this will come later.

 - Add a KUnit test that tests and benchmarks multiple CRC variants.
   Remove older, less-comprehensive tests that are made redundant by
   this.

 - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm
   volunteering to maintain it. I have additional cleanups and
   optimizations planned for future cycles.

* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits)
  MAINTAINERS: add entry for CRC library
  powerpc/crc: delete obsolete crc-vpmsum_test.c
  lib/crc32test: delete obsolete crc32test.c
  lib/crc16_kunit: delete obsolete crc16_kunit.c
  lib/crc_kunit.c: add KUnit test suite for CRC library functions
  powerpc/crc-t10dif: expose CRC-T10DIF function through lib
  arm64/crc-t10dif: expose CRC-T10DIF function through lib
  arm/crc-t10dif: expose CRC-T10DIF function through lib
  x86/crc-t10dif: expose CRC-T10DIF function through lib
  crypto: crct10dif - expose arch-optimized lib function
  lib/crc-t10dif: add support for arch overrides
  lib/crc-t10dif: stop wrapping the crypto API
  scsi: target: iscsi: switch to using the crc32c library
  f2fs: switch to using the crc32 library
  jbd2: switch to using the crc32c library
  ext4: switch to using the crc32c library
  lib/crc32: make crc32c() go directly to lib
  bcachefs: Explicitly select CRYPTO from BCACHEFS_FS
  x86/crc32: expose CRC32 functions through lib
  x86/crc32: update prototype for crc32_pclmul_le_16()
  ...
2025-01-22 19:55:08 -08:00
Linus Torvalds
7004a2e46d Merge tag 'linux_kselftest-nolibc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull nolibc updates from Shuah Khan:

 - add support for waitid()

 - use waitid() over waitpid()

 - use a pipe in vfprintf tests

 - skip tests for unimplemented syscalls

 - rename riscv to riscv64

 - add configurations for riscv32

 - add detecting missing toolchain to run-tests.sh

* tag 'linux_kselftest-nolibc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/nolibc: add configurations for riscv32
  selftests/nolibc: rename riscv to riscv64
  selftests/nolibc: skip tests for unimplemented syscalls
  selftests/nolibc: use a pipe to in vfprintf tests
  selftests/nolibc: use waitid() over waitpid()
  tools/nolibc: add support for waitid()
  selftests/nolibc: run-tests.sh: detect missing toolchain
2025-01-22 12:36:16 -08:00
Linus Torvalds
e8f17cb6f5 Merge tag 'linux_kselftest-kunit-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:

 - fix struct completion warning

 - introduce autorun option

 - add fallback for os.sched_getaffinity

 - enable hardware acceleration when available

* tag 'linux_kselftest-kunit-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: Introduce autorun option
  kunit: enable hardware acceleration when available
  kunit: add fallback for os.sched_getaffinity
  kunit: platform: Resolve 'struct completion' warning
2025-01-22 12:32:39 -08:00
Linus Torvalds
8fb1e2eed1 Merge tag 'linux_kselftest-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:

 - fixes, reporting improvements, and cleanup changes to several tests

 - add support for DT_GNU_HASH to selftests/vDSO

* tag 'linux_kselftest-next-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/rseq: Fix handling of glibc without rseq support
  selftests/resctrl: Discover SNC kernel support and adjust messages
  selftests/resctrl: Adjust effective L3 cache size with SNC enabled
  selftests/ftrace: Make uprobe test more robust against binary name
  selftests/ftrace: Fix to use remount when testing mount GID option
  selftests: tmpfs: Add kselftest support to tmpfs
  selftests: tmpfs: Add Test-skip if not run as root
  selftests: harness: fix printing of mismatch values in __EXPECT()
  selftests/ring-buffer: Add test for out-of-bound pgoff mapping
  selftests/run_kselftest.sh: Fix help string for --per-test-log
  selftests: acct: Add ksft_exit_skip if not running as root
  selftests: kselftest: Fix the wrong format specifier
  selftests: timers: clocksource-switch: Adapt progress to kselftest framework
  selftests/zram: gitignore output file
  selftests/filesystems: Add missing gitignore file
  selftests: Warn about skipped tests in result summary
  selftests: kselftest: Add ksft_test_result_xpass
  selftests/vDSO: support DT_GNU_HASH
  selftests/ipc: Remove unused variables
  selftest: media_tests: fix trivial UAF typo
2025-01-22 12:30:20 -08:00
Linus Torvalds
27c0278477 Merge tag 'hid-for-linus-2025012001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:

 - newly added support for Intel Touch Host Controller (Even Xu, Xinpeng
   Sun)

 - hid-core fix for long-standing syzbot-reported cornercase of
   Resolution Multiplier not being present in any of the Logical
   Collections in the device HID report descriptor (Alan Stern)

 - improvement of behavior for non-standard LED brightness values for
   Wacom driver (Jason Gerecke)

 - PCI Wacom device support (depends on Intel THC support) (Even Xu)

 - SteelSeries Arctis 9 support (Christian Mayer)

 - constification of 'struct bin_attribute' in various HID driver
   (Thomas Weißschuh)

 - other assorted code cleanups / fixes and device ID additions

* tag 'hid-for-linus-2025012001' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (63 commits)
  HID: hid-asus: Disable OOBE mode on the ProArt P16
  HID: steelseries: remove unnecessary return
  HID: steelseries: export model and manufacturer
  HID: steelseries: export charging state for the SteelSeries Arctis 9 headset
  HID: steelseries: add SteelSeries Arctis 9 support
  HID: steelseries: preparation for adding SteelSeries Arctis 9 support
  HID: intel-thc-hid: fix build errors in um mode
  HID: intel-thc-hid: intel-quicki2c: fix potential memory corruption
  HID: intel-thc-hid: intel-thc: Fix error code in thc_i2c_subip_init()
  HID: lenovo: Fix undefined platform_profile_cycle in ThinkPad X12 keyboard patch
  HID: uclogic: make const read-only array touch_ring_model_params_buf static
  HID: hid-steam: Make sure rumble work is canceled on removal
  HID: Wacom: Add PCI Wacom device support
  HID: intel-thc-hid: intel-quicki2c: Add PM implementation
  HID: intel-thc-hid: intel-quicki2c: Complete THC QuickI2C driver
  HID: intel-thc-hid: intel-quicki2c: Add HIDI2C protocol implementation
  HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C ACPI interfaces
  HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C driver hid layer
  HID: intel-thc-hid: intel-quicki2c: Add THC QuickI2C driver skeleton
  HID: intel-thc-hid: intel-quickspi: Add PM implementation
  ...
2025-01-22 11:56:39 -08:00
Linus Torvalds
f4b9d3bf44 Merge tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "The majority of changes here are cpufreq updates which are dominated
  by amd-pstate driver changes, like in the previous cycle. Moreover,
  changes related to amd-pstate are also the majority of cpupower
  utility updates.

  Included are some pieces of new hardware support, like the addition of
  Clearwater Forest processors support to intel_idle, new cpufreq driver
  for Airoha SoCs, and Apple cpufreq driver extensions to support more
  SoCs. The intel_pstate driver is also extended to be able to support
  new platforms by using ACPI CPPC to compute scaling factors between
  HWP performance states and frequency.

  The rest is mostly fixes and cleanups in assorted pieces of power
  management code.

  Specifics:

   - Use str_enable_disable()-like helpers in cpufreq (Krzysztof
     Kozlowski)

   - Extend the Apple cpufreq driver to support more SoCs (Hector
     Martin, Nick Chan)

   - Add new cpufreq driver for Airoha SoCs (Christian Marangi)

   - Fix using cpufreq-dt as module (Andreas Kemnade)

   - Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
     Edwards, Sibi Sankar, Manivannan Sadhasivam)

   - Fix the maximum supported frequency computation in the ACPI cpufreq
     driver to avoid relying on unfounded assumptions (Gautham Shenoy)

   - Fix an amd-pstate driver regression with preferred core rankings
     not being used (Mario Limonciello)

   - Fix a precision issue with frequency calculation in the amd-pstate
     driver (Naresh Solanki)

   - Add ftrace event to the amd-pstate driver for active mode (Mario
     Limonciello)

   - Set default EPP policy on Ryzen processors in amd-pstate (Mario
     Limonciello)

   - Clean up the amd-pstate cpufreq driver and optimize it to increase
     code reuse (Mario Limonciello, Dhananjay Ugwekar)

   - Use CPPC to get scaling factors between HWP performance levels and
     frequency in the intel_pstate driver and make it stop using a
     built-in scaling factor for Arrow Lake processors (Rafael Wysocki)

   - Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN
     for consistency with CPU offline (Christian Loehle)

   - Fix superfluous updates caused by need_freq_update in the schedutil
     cpufreq governor (Sultan Alsawaf)

   - Allow configuring the system suspend-resume (DPM) watchdog to warn
     earlier than panic (Douglas Anderson)

   - Implement devm_device_init_wakeup() helper and introduce a device-
     managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan)

   - Remove direct inclusions of 'pm_wakeup.h' which should be only
     included via 'device.h' (Wolfram Sang)

   - Clean up two comments in the core system-wide PM code (Rafael
     Wysocki, Randy Dunlap)

   - Add Clearwater Forest processor support to the intel_idle cpuidle
     driver (Artem Bityutskiy)

   - Clean up the Exynos devfreq driver and devfreq core (Markus
     Elfring, Jeongjun Park)

   - Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong,
     Joe Hattori)

   - Implement dev_pm_opp_get_bw() (Neil Armstrong)

   - Expose OPP reference counting helpers for Rust (Viresh Kumar)

   - Fix TSC MHz calculation in cpupower (He Rongguang)

   - Add install and uninstall options to bindings Makefile and add
     header changes for cpufreq.h to SWIG bindings in cpupower (John B.
     Wyatt IV)

   - Add missing residency header changes in cpuidle.h to SWIG bindings
     in cpupower (John B. Wyatt IV)

   - Add output files to .gitignore and clean them up in "make clean" in
     selftests/cpufreq (Li Zhijian)

   - Fix cross-compilation in cpupower Makefile (Peng Fan)

   - Revise the is_valid flag handling for idle_monitor in the cpupower
     utility (wangfushuai)

   - Extend and clean up AMD processors support in cpupower (Mario
     Limonciello)"

* tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
  PM / OPP: Add reference counting helpers for Rust implementation
  PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
  cpufreq: Use str_enable_disable()-like helpers
  cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
  PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
  PM: sleep: convert comment from kernel-doc to plain comment
  cpufreq: ACPI: Fix max-frequency computation
  pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
  PM / devfreq: exynos: remove unused function parameter
  OPP: OF: Fix an OF node leak in _opp_add_static_v2()
  cpufreq/amd-pstate: Refactor max frequency calculation
  cpufreq/amd-pstate: Fix prefcore rankings
  pm: cpupower: Add header changes for cpufreq.h to SWIG bindings
  cpufreq: sparc: change kzalloc to kcalloc
  cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
  cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
  cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
  cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
  cpufreq: apple-soc: Increase cluster switch timeout to 400us
  cpufreq: apple-soc: Use 32-bit read for status register
  ...
2025-01-22 11:16:14 -08:00
Linus Torvalds
0ad9617c78 Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
 "This is slightly smaller than usual, with the most interesting work
  being still around RTNL scope reduction.

  Core:

   - More core refactoring to reduce the RTNL lock contention, including
     preparatory work for the per-network namespace RTNL lock, replacing
     RTNL lock with a per device-one to protect NAPI-related net device
     data and moving synchronize_net() calls outside such lock.

   - Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
     and more specific TCP coverage.

   - Reduce network namespace tear-down time by removing per-subsystems
     synchronize_net() in tipc and sched.

   - Add flow label selector support for fib rules, allowing traffic
     redirection based on such header field.

  Netfilter:

   - Do not remove netdev basechain when last device is gone, allowing
     netdev basechains without devices.

   - Revisit the flowtable teardown strategy, dealing better with fin,
     reset and re-open events.

   - Scale-up IP-vs connection dumping by avoiding linear search on each
     restart.

  Protocols:

   - A significant XDP socket refactor, consolidating and optimizing
     several helpers into the core

   - Better scaling of ICMP rate-limiting, by removing false-sharing in
     inet peers handling.

   - Introduces netlink notifications for multicast IPv4 and IPv6
     address changes.

   - Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
     aggregation and fragmentation of the inner IP.

   - Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
     avoid local port exhaustion issues when the average connection
     lifetime is very short.

   - Support updating keys (re-keying) for connections using kernel TLS
     (for TLS 1.3 only).

   - Support ipv4-mapped ipv6 address clients in smc-r v2.

   - Add support for jumbo data packet transmission in RxRPC sockets,
     gluing multiple data packets in a single UDP packet.

   - Support RxRPC RACK-TLP to manage packet loss and retransmission in
     conjunction with the congestion control algorithm.

  Driver API:

   - Introduce a unified and structured interface for reporting PHY
     statistics, exposing consistent data across different H/W via
     ethtool.

   - Make timestamping selectable, allow the user to select the desired
     hwtstamp provider (PHY or MAC) administratively.

   - Add support for configuring a header-data-split threshold (HDS)
     value via ethtool, to deal with partial or buggy H/W
     implementation.

   - Consolidate DSA drivers Energy Efficiency Ethernet support.

   - Add EEE management to phylink, making use of the phylib
     implementation.

   - Add phylib support for in-band capabilities negotiation.

   - Simplify how phylib-enabled mac drivers expose the supported
     interfaces.

  Tests and tooling:

   - Make the YNL tool package-friendly to make it easier to deploy it
     separately from the kernel.

   - Increase TCP selftest coverage importing several packetdrill
     test-cases.

   - Regenerate the ethtool uapi header from the YNL spec, to ease
     maintenance and future development.

   - Add YNL support for decoding the link types used in net self-tests,
     allowing a single build to run both net and drivers/net.

  Drivers:

   - Ethernet high-speed NICs:
      - nVidia/Mellanox (mlx5):
         - add cross E-Switch QoS support
         - add SW Steering support for ConnectX-8
         - implement support for HW-Managed Flow Steering, improving the
           rule deletion/insertion rate
         - support for multi-host LAG
      - Intel (ixgbe, ice, igb):
         - ice: add support for devlink health events
         - ixgbe: add initial support for E610 chipset variant
         - igb: add support for AF_XDP zero-copy
      - Meta:
         - add support for basic RSS config
         - allow changing the number of channels
         - add hardware monitoring support
      - Broadcom (bnxt):
         - implement TCP data split and HDS threshold ethtool support,
           enabling Device Memory TCP.
      - Marvell Octeon:
         - implement egress ipsec offload support for the cn10k family
      - Hisilicon (HIBMC):
         - implement unicast MAC filtering

   - Ethernet NICs embedded and virtual:
      - Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
        contented atomic operations for drop counters
      - Freescale:
         - quicc: phylink conversion
         - enetc: support Tx and Rx checksum offload and improve TSO
           performances
      - MediaTek:
         - airoha: introduce support for ETS and HTB Qdisc offload
      - Microchip:
         - lan78XX USB: preparation work for phylink conversion
      - Synopsys (stmmac):
         - support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
         - refactor EEE support to leverage the new driver API
         - optimize DMA and cache access to increase raw RX performances
           by 40%
      - TI:
         - icssg-prueth: add multicast filtering support for VLAN
           interface
      - netkit:
         - add ability to configure head/tailroom
      - VXLAN:
         - accepts packets with user-defined reserved bit

   - Ethernet switches:
      - Microchip:
         - lan969x: add RGMII support
         - lan969x: improve TX and RX performance using the FDMA engine
      - nVidia/Mellanox:
         - move Tx header handling to PCI driver, to ease XDP support

   - Ethernet PHYs:
      - Texas Instruments DP83822:
         - add support for GPIO2 clock output
      - Realtek:
         - 8169: add support for RTL8125D rev.b
         - rtl822x: add hwmon support for the temperature sensor
      - Microchip:
         - add support for RDS PTP hardware
         - consolidate periodic output signal generation

   - CAN:
      - several DT-bindings to DT schema conversions
      - tcan4x5x:
         - add HW standby support
         - support nWKRQ voltage selection
      - kvaser:
         - allowing Bus Error Reporting runtime configuration

   - WiFi:
      - the on-going Multi-Link Operation (MLO) effort continues,
        affecting both the stack and in drivers
      - mac80211/cfg80211:
         - Emergency Preparedness Communication Services (EPCS) station
           mode support
         - support for adding and removing station links for MLO
         - add support for WiFi 7/EHT mesh over 320 MHz channels
         - report Tx power info for each link
      - RealTek (rtw88):
         - enable USB Rx aggregation and USB 3 to improve performance
         - LED support
      - RealTek (rtw89):
         - refactor power save to support Multi-Link Operations
         - add support for RTL8922AE-VS variant
      - MediaTek (mt76):
         - single wiphy multiband support (preparation for MLO)
         - p2p device support
         - add TP-Link TXE50UH USB adapter support
      - Qualcomm (ath10k):
         - support for the QCA6698AQ IP core
      - Qualcomm (ath12k):
         - enable MLO for QCN9274

   - Bluetooth:
      - Allow sysfs to trigger hdev reset, to allow recovering devices
        not responsive from user-space
      - MediaTek: add support for MT7922, MT7925, MT7921e devices
      - Realtek: add support for RTL8851BE devices
      - Qualcomm: add support for WCN785x devices
      - ISO: allow BIG re-sync"

* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
  net/rose: prevent integer overflows in rose_setsockopt()
  net: phylink: fix regression when binding a PHY
  net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
  net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
  net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
  ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
  ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
  ipv6: Move lifetime validation to inet6_rtm_newaddr().
  ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
  ipv6: Pass dev to inet6_addr_add().
  ipv6: Convert inet6_ioctl() to per-netns RTNL.
  ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
  ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
  ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
  ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
  ipv6: Add __in6_dev_get_rtnl_net().
  net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
  net: mii: Fix the Speed display when the network cable is not connected
  sysctl net: Remove macro checks for CONFIG_SYSCTL
  eth: bnxt: update header sizing defaults
  ...
2025-01-22 08:28:57 -08:00
Linus Torvalds
f96a974170 Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:

 - Improved handling of LSM "secctx" strings through lsm_context struct

   The LSM secctx string interface is from an older time when only one
   LSM was supported, migrate over to the lsm_context struct to better
   support the different LSMs we now have and make it easier to support
   new LSMs in the future.

   These changes explain the Rust, VFS, and networking changes in the
   diffstat.

 - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
   enabled

   Small tweak to be a bit smarter about when we build the LSM's common
   audit helpers.

 - Check for absurdly large policies from userspace in SafeSetID

   SafeSetID policies rules are fairly small, basically just "UID:UID",
   it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
   helps quiet a number of syzbot related issues. While work is being
   done to address the syzbot issues through other mechanisms, this is a
   trivial and relatively safe fix that we can do now.

 - Various minor improvements and cleanups

   A collection of improvements to the kernel selftests, constification
   of some function parameters, removing redundant assignments, and
   local variable renames to improve readability.

* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: initialize local array before use to quiet static analysis
  safesetid: check size of policy writes
  net: corrections for security_secid_to_secctx returns
  lsm: rename variable to avoid shadowing
  lsm: constify function parameters
  security: remove redundant assignment to return variable
  lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
  selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
  binder: initialize lsm_context structure
  rust: replace lsm context+len with lsm_context
  lsm: secctx provider check on release
  lsm: lsm_context in security_dentry_init_security
  lsm: use lsm_context in security_inode_getsecctx
  lsm: replace context+len with lsm_context
  lsm: ensure the correct LSM context releaser
2025-01-21 20:03:04 -08:00
Linus Torvalds
2e04247f7c Merge tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:

 - Have fprobes built on top of function graph infrastructure

   The fprobe logic is an optimized kprobe that uses ftrace to attach to
   functions when a probe is needed at the start or end of the function.
   The fprobe and kretprobe logic implements a similar method as the
   function graph tracer to trace the end of the function. That is to
   hijack the return address and jump to a trampoline to do the trace
   when the function exits. To do this, a shadow stack needs to be
   created to store the original return address. Fprobes and function
   graph do this slightly differently. Fprobes (and kretprobes) has
   slots per callsite that are reserved to save the return address. This
   is fine when just a few points are traced. But users of fprobes, such
   as BPF programs, are starting to add many more locations, and this
   method does not scale.

   The function graph tracer was created to trace all functions in the
   kernel. In order to do this, when function graph tracing is started,
   every task gets its own shadow stack to hold the return address that
   is going to be traced. The function graph tracer has been updated to
   allow multiple users to use its infrastructure. Now have fprobes be
   one of those users. This will also allow for the fprobe and kretprobe
   methods to trace the return address to become obsolete. With new
   technologies like CFI that need to know about these methods of
   hijacking the return address, going toward a solution that has only
   one method of doing this will make the kernel less complex.

 - Cleanup with guard() and free() helpers

   There were several places in the code that had a lot of "goto out" in
   the error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.

 - Remove disabling of interrupts in the function graph tracer

   When function graph tracer was first introduced, it could race with
   interrupts and NMIs. To prevent that race, it would disable
   interrupts and not trace NMIs. But the code has changed to allow NMIs
   and also interrupts. This change was done a long time ago, but the
   disabling of interrupts was never removed. Remove the disabling of
   interrupts in the function graph tracer is it is not needed. This
   greatly improves its performance.

 - Allow the :mod: command to enable tracing module functions on the
   kernel command line.

   The function tracer already has a way to enable functions to be
   traced in modules by writing ":mod:<module>" into set_ftrace_filter.
   That will enable either all the functions for the module if it is
   loaded, or if it is not, it will cache that command, and when the
   module is loaded that matches <module>, its functions will be
   enabled. This also allows init functions to be traced. But currently
   events do not have that feature.

   Because enabling function tracing can be done very early at boot up
   (before scheduling is enabled), the commands that can be done when
   function tracing is started is limited. Having the ":mod:" command to
   trace module functions as they are loaded is very useful. Update the
   kernel command line function filtering to allow it.

* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  ftrace: Implement :mod: cache filtering on kernel command line
  tracing: Adopt __free() and guard() for trace_fprobe.c
  bpf: Use ftrace_get_symaddr() for kprobe_multi probes
  ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
  Documentation: probes: Update fprobe on function-graph tracer
  selftests/ftrace: Add a test case for repeating register/unregister fprobe
  selftests: ftrace: Remove obsolate maxactive syntax check
  tracing/fprobe: Remove nr_maxactive from fprobe
  fprobe: Add fprobe_header encoding feature
  fprobe: Rewrite fprobe on function-graph tracer
  s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
  ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
  bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
  tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
  tracing: Add ftrace_fill_perf_regs() for perf event
  tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
  fprobe: Use ftrace_regs in fprobe exit handler
  fprobe: Use ftrace_regs in fprobe entry handler
  fgraph: Pass ftrace_regs to retfunc
  fgraph: Replace fgraph_ret_regs with ftrace_regs
  ...
2025-01-21 15:15:28 -08:00
Linus Torvalds
9f3ee94e70 Merge tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Uladzislau Rezki:
 "Misc fixes:
   - check if IRQs are disabled in rcu_exp_need_qs()
   - instrument KCSAN exclusive-writer assertions
   - add extra WARN_ON_ONCE() check
   - set the cpu_no_qs.b.exp under lock
   - warn if callback enqueued on offline CPU

  Torture-test updates:
   - add rcutorture.preempt_duration kernel module parameter
   - make the TREE03 scenario do preemption
   - improve pooling timeouts for rcu_torture_writer()
   - improve output of "Failure/close-call rcutorture reader segments"
   - add some reader-state debugging checks
   - update doc of polled APIs
   - add extra diagnostics for per-reader-segment preemption
   - add an extra test for sched_clock()
   - improve testing on unresponsive systems

  SRCU updates:
   - improve doc for srcu_read_lock() in terms of return value
   - fix typo in comments
   - remove redundant GP sequence checks in the srcu_funnel_gp_start"

* tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start
  srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/
  srcu: Guarantee non-negative return value from srcu_read_lock()
  MAINTAINERS: Update RCU git tree
  rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
  rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
  rcu: Make preemptible rcu_exp_handler() check idempotency
  rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
  rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
  rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
  rcu: Report callbacks enqueued on offline CPU blind spot
  rcutorture: Use symbols for SRCU reader flavors
  rcutorture: Add per-reader-segment preemption diagnostics
  rcutorture: Read CPU ID for decoration protected by both reader types
  rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics
  rcutorture: Add parameters to control polled/conditional wait interval
  rcutorture: Add documentation for recent conditional and polled APIs
  rcutorture: Ignore attempts to test preemption and forward progress
  rcutorture: Make rcutorture_one_extend() check reader state
  rcutorture: Pretty-print rcutorture reader segments
  ...
2025-01-21 14:39:21 -08:00
Linus Torvalds
336088234e Merge tag 'livepatching-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:

 - Add a sysfs attribute showing the livepatch ordering

 - Some code clean up

* tag 'livepatching-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  selftests: livepatch: add test cases of stack_order sysfs interface
  livepatch: Add stack_order sysfs attribute
  selftests/livepatch: Replace hardcoded module name with variable in test-callbacks.sh
2025-01-21 13:11:26 -08:00
Linus Torvalds
6c4aa896eb Merge tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
 "Seqlock optimizations that arose in a perf context and were merged
  into the perf tree:

   - seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan)
   - mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan)
   - mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren
     Baghdasaryan)
   - mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra)

  Core perf enhancements:

   - Reduce 'struct page' footprint of perf by mapping pages in advance
     (Lorenzo Stoakes)
   - Save raw sample data conditionally based on sample type (Yabin Cui)
   - Reduce sampling overhead by checking sample_type in
     perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin
     Cui)
   - Export perf_exclude_event() (Namhyung Kim)

  Uprobes scalability enhancements: (Andrii Nakryiko)

   - Simplify find_active_uprobe_rcu() VMA checks
   - Add speculative lockless VMA-to-inode-to-uprobe resolution
   - Simplify session consumer tracking
   - Decouple return_instance list traversal and freeing
   - Ensure return_instance is detached from the list before freeing
   - Reuse return_instances between multiple uretprobes within task
   - Guard against kmemdup() failing in dup_return_instance()

  AMD core PMU driver enhancements:

   - Relax privilege filter restriction on AMD IBS (Namhyung Kim)

  AMD RAPL energy counters support: (Dhananjay Ugwekar)

   - Introduce topology_logical_core_id() (K Prateek Nayak)
   - Remove the unused get_rapl_pmu_cpumask() function
   - Remove the cpu_to_rapl_pmu() function
   - Rename rapl_pmu variables
   - Make rapl_model struct global
   - Add arguments to the init and cleanup functions
   - Modify the generic variable names to *_pkg*
   - Remove the global variable rapl_msrs
   - Move the cntr_mask to rapl_pmus struct
   - Add core energy counter support for AMD CPUs

  Intel core PMU driver enhancements:

   - Support RDPMC 'metrics clear mode' feature (Kan Liang)
   - Clarify adaptive PEBS processing (Kan Liang)
   - Factor out functions for PEBS records processing (Kan Liang)
   - Simplify the PEBS records processing for adaptive PEBS (Kan Liang)

  Intel uncore driver enhancements: (Kan Liang)

   - Convert buggy pmu->func_id use to pmu->registered
   - Support more units on Granite Rapids"

* tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  perf: map pages in advance
  perf/x86/intel/uncore: Support more units on Granite Rapids
  perf/x86/intel/uncore: Clean up func_id
  perf/x86/intel: Support RDPMC metrics clear mode
  uprobes: Guard against kmemdup() failing in dup_return_instance()
  perf/x86: Relax privilege filter restriction on AMD IBS
  perf/core: Export perf_exclude_event()
  uprobes: Reuse return_instances between multiple uretprobes within task
  uprobes: Ensure return_instance is detached from the list before freeing
  uprobes: Decouple return_instance list traversal and freeing
  uprobes: Simplify session consumer tracking
  uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
  uprobes: simplify find_active_uprobe_rcu() VMA checks
  mm: introduce mmap_lock_speculate_{try_begin|retry}
  mm: convert mm_lock_seq to a proper seqcount
  mm/gup: Use raw_seqcount_try_begin()
  seqlock: add raw_seqcount_try_begin
  perf/x86/rapl: Add core energy counter support for AMD CPUs
  perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
  perf/x86/rapl: Remove the global variable rapl_msrs
  ...
2025-01-21 10:52:03 -08:00
James Bottomley
87e6cd7cdb selftests/efivarfs: add concurrent update tests
The delete on last close functionality can now only be tested properly
by using multiple threads to hold open the variable files and testing
what happens as they complete.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-21 16:34:41 +01:00
Linus Torvalds
95ec54a420 Merge tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:

 - Add preempt lazy support

 - Deprecate cxl and cxl flash driver

 - Fix a possible IOMMU related OOPS at boot on pSeries

 - Optimize sched_clock() in ppc32 by replacing mulhdu() by
   mul_u64_u64_shr()

Thanks to Andrew Donnellan, Andy Shevchenko, Ankur Arora, Christophe
Leroy, Frederic Barrat, Gaurav Batra, Luis Felipe Hernandez, Michael
Ellerman, Nilay Shroff, Ricardo B.  Marliere, Ritesh Harjani (IBM),
Sebastian Andrzej Siewior, Shrikanth Hegde, Sourabh Jain, Thorsten Blum,
and Zhu Jun.

* tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix argument order to timer_sub()
  powerpc/prom_init: Use IS_ENABLED()
  powerpc/pseries/iommu: IOMMU incorrectly marks MMIO range in DDW
  powerpc: Use str_on_off() helper in check_cache_coherency()
  powerpc: Large user copy aware of full:rt:lazy preemption
  powerpc: Add preempt lazy support
  powerpc/book3s64/hugetlb: Fix disabling hugetlb when fadump is active
  powerpc/vdso: Mark the vDSO code read-only after init
  powerpc/64: Use get_user() in start_thread()
  macintosh: declare ctl_table as const
  selftest/powerpc/ptrace: Cleanup duplicate macro definitions
  selftest/powerpc/ptrace/ptrace-pkey: Remove duplicate macros
  selftest/powerpc/ptrace/core-pkey: Remove duplicate macros
  powerpc/8xx: Drop legacy-of-mm-gpiochip.h header
  scsi/cxlflash: Deprecate driver
  cxl: Deprecate driver
  selftests/powerpc: Fix typo in test-vphn.c
  powerpc/xmon: Use str_yes_no() helper in dump_one_paca()
  powerpc/32: Replace mulhdu() by mul_u64_u64_shr()
2025-01-20 21:40:19 -08:00
Linus Torvalds
9ad09c4f28 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "We've got a little less than normal thanks to the holidays in
  December, but there's the usual summary below. The highlight is
  probably the 52-bit physical addressing (LPA2) clean-up from Ard.

  Confidential Computing:

   - Register a platform device when running in CCA realm mode to enable
     automatic loading of dependent modules

  CPU Features:

   - Update a bunch of system register definitions to pick up new field
     encodings from the architectural documentation

   - Add hwcaps and selftests for the new (2024) dpISA extensions

  Documentation:

   - Update EL3 (firmware) requirements for booting Linux on modern
     arm64 designs

   - Remove stale information about the kernel virtual memory map

  Miscellaneous:

   - Minor cleanups and typo fixes

  Memory management:

   - Fix vmemmap_check_pmd() to look at the PMD type bits

   - LPA2 (52-bit physical addressing) cleanups and minor fixes

   - Adjust physical address space depending upon whether or not LPA2 is
     enabled

  Perf and PMUs:

   - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU

   - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs

   - Fix Designware PCIe PMU event numbering

   - Add generic branch events for the Apple M1 CPU PMU

   - Add support for Marvell Odyssey DDR and LLC-TAD PMUs

   - Cleanups to the Hisilicon DDRC and Uncore PMU code

   - Advertise discard mode for the SPE PMU

   - Add the perf users mailing list to our MAINTAINERS entry"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  Documentation: arm64: Remove stale and redundant virtual memory diagrams
  perf docs: arm_spe: Document new discard mode
  perf: arm_spe: Add format option for discard mode
  MAINTAINERS: Add perf list for drivers/perf/
  arm64: Remove duplicate included header
  drivers/perf: apple_m1: Map generic branch events
  arm64: rsi: Add automatic arm-cca-guest module loading
  kselftest/arm64: Add 2024 dpISA extensions to hwcap test
  KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
  arm64/hwcap: Describe 2024 dpISA extensions to userspace
  arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
  arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
  drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
  arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
  arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
  arm64/mm: Replace open encodings with PXD_TABLE_BIT
  arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
  arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
  ...
2025-01-20 21:21:49 -08:00
Linus Torvalds
fadc3ed9ce Merge tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:

 - fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case (Tycho
   Andersen, Kees Cook)

 - binfmt_misc: Fix comment typos (Christophe JAILLET)

 - move empty argv[0] warning closer to actual logic (Nir Lichtman)

 - remove legacy custom binfmt modules autoloading (Nir Lichtman)

 - Make sure set_task_comm() always NUL-terminates

 - binfmt_flat: Fix integer overflow bug on 32 bit systems (Dan
   Carpenter)

 - coredump: Do not lock when copying "comm"

 - MAINTAINERS: add auxvec.h and set myself as maintainer

* tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_flat: Fix integer overflow bug on 32 bit systems
  selftests/exec: add a test for execveat()'s comm
  exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  exec: Make sure task->comm is always NUL-terminated
  exec: remove legacy custom binfmt modules autoloading
  exec: move warning of null argv to be next to the relevant code
  fs: binfmt: Fix a typo
  MAINTAINERS: exec: Mark Kees as maintainer
  MAINTAINERS: exec: Add auxvec.h UAPI
  coredump: Do not lock during 'comm' reporting
2025-01-20 13:27:58 -08:00
Rafael J. Wysocki
1c91c99075 Merge branch 'pm-tools'
Merge cpupower utility updates for 6.14:

 - Fix TSC MHz calculation in cpupower (He Rongguang).

 - Add install and uninstall options to bindings Makefile and add header
   changes for cpufreq.h to SWIG bindings in cpupower (John B. Wyatt IV).

 - Add missing residency header changes in cpuidle.h to SWIG bindings in
   cpupower (John B. Wyatt IV).

 - Add output files to .gitignore and clean them up in "make clean" in
   selftests/cpufreq (Li Zhijian).

 - Fix cross-compilation in cpupower Makefile (Peng Fan).

 - Revise the is_valid flag handling for idle_monitor in the cpupower
   utility (wangfushuai).

 - Extend and clean up AMD processors support in cpupower (Mario
   Limonciello).

* pm-tools:
  pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
  pm: cpupower: Add header changes for cpufreq.h to SWIG bindings
  pm: cpupower: Add install and uninstall options to bindings makefile
  cpupower: Adjust whitespace for amd-pstate specific prints
  cpupower: Don't fetch maximum latency when EPP is enabled
  cpupower: Add support for showing energy performance preference
  cpupower: Don't try to read frequency from hardware when kernel uses aperfmperf
  cpupower: Add support for amd-pstate preferred core rankings
  cpupower: Add support for parsing 'enabled' or 'disabled' strings from table
  cpupower: Remove spurious return statement
  cpupower: fix TSC MHz calculation
  cpupower: revise is_valid flag handling for idle_monitor
  pm: cpupower: Makefile: Fix cross compilation
  selftests/cpufreq: gitignore output files and clean them in make clean
2025-01-20 21:01:50 +01:00
Liu Ye
3a0b7fa095 selftests/net/ipsec: Fix Null pointer dereference in rtattr_pack()
Address Null pointer dereference / undefined behavior in rtattr_pack
(note that size is 0 in the bad case).

Flagged by cppcheck as:
    tools/testing/selftests/net/ipsec.c:230:25: warning: Possible null pointer
    dereference: payload [nullPointer]
    memcpy(RTA_DATA(attr), payload, size);
                           ^
    tools/testing/selftests/net/ipsec.c:1618:54: note: Calling function 'rtattr_pack',
    4th argument 'NULL' value is 0
    if (rtattr_pack(&req.nh, sizeof(req), XFRMA_IF_ID, NULL, 0)) {
                                                       ^
    tools/testing/selftests/net/ipsec.c:230:25: note: Null pointer dereference
    memcpy(RTA_DATA(attr), payload, size);
                           ^
Signed-off-by: Liu Ye <liuye@kylinos.cn>

Link: https://patch.msgid.link/20250116013037.29470-1-liuye@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:25:25 -08:00
Linus Torvalds
100ceb4817 Merge tag 'vfs-6.14-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:

 - Add a mountinfo program to demonstrate statmount()/listmount()

   Add a new "mountinfo" sample userland program that demonstrates how
   to use statmount() and listmount() to get at the same info that
   /proc/pid/mountinfo provides

 - Remove pointless nospec.h include

 - Prepend statmount.mnt_opts string with security_sb_mnt_opts()

   Currently these mount options aren't accessible via statmount()

 - Add new mount namespaces to mount namespace rbtree outside of the
   namespace semaphore

 - Lockless mount namespace lookup

   Currently we take the read lock when looking for a mount namespace to
   list mounts in. We can make this lockless. The simple search case can
   just use a sequence counter to detect concurrent changes to the
   rbtree

   For walking the list of mount namespaces sequentially via nsfs we
   keep a separate rcu list as rb_prev() and rb_next() aren't usable
   safely with rcu. Currently there is no primitive for retrieving the
   previous list member. To do this we need a new deletion primitive
   that doesn't poison the prev pointer and a corresponding retrieval
   helper

   Since creating mount namespaces is a relatively rare event compared
   with querying mounts in a foreign mount namespace this is worth it.
   Once libmount and systemd pick up this mechanism to list mounts in
   foreign mount namespaces this will be used very frequently

     - Add extended selftests for lockless mount namespace iteration

     - Add a sample program to list all mounts on the system, i.e., in
       all mount namespaces

 - Improve mount namespace iteration performance

   Make finding the last or first mount to start iterating the mount
   namespace from an O(1) operation and add selftests for iterating the
   mount table starting from the first and last mount

 - Use an xarray for the old mount id

   While the ida does use the xarray internally we can use it explicitly
   which allows us to increment the unique mount id under the xa lock.
   This allows us to remove the atomic as we're now allocating both ids
   in one go

 - Use a shared header for vfs sample programs

 - Fix build warnings for new sample program to list all mounts

* tag 'vfs-6.14-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  samples/vfs: fix build warnings
  samples/vfs: use shared header
  samples/vfs/mountinfo: Use __u64 instead of uint64_t
  fs: remove useless lockdep assertion
  fs: use xarray for old mount id
  selftests: add listmount() iteration tests
  fs: cache first and last mount
  samples: add test-list-all-mounts
  selftests: remove unneeded include
  selftests: add tests for mntns iteration
  seltests: move nsfs into filesystems subfolder
  fs: simplify rwlock to spinlock
  fs: lockless mntns lookup for nsfs
  rculist: add list_bidir_{del,prev}_rcu()
  fs: lockless mntns rbtree lookup
  fs: add mount namespace to rbtree late
  fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()
  mount: remove inlude/nospec.h include
  samples: add a mountinfo program to demonstrate statmount()/listmount()
2025-01-20 10:44:51 -08:00
Linus Torvalds
1a89a6924b Merge tag 'kernel-6.14-rc1.pid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pid_max namespacing update from Christian Brauner:
 "The pid_max sysctl is a global value. For a long time the default
  value has been 65535 and during the pidfd dicussions Linus proposed to
  bump pid_max by default. Based on this discussion systemd started
  bumping pid_max to 2^22. So all new systems now run with a very high
  pid_max limit with some distros having also backported that change.

  The decision to bump pid_max is obviously correct. It just doesn't
  make a lot of sense nowadays to enforce such a low pid number. There's
  sufficient tooling to make selecting specific processes without typing
  really large pid numbers available.

  In any case, there are workloads that have expections about how large
  pid numbers they accept. Either for historical reasons or
  architectural reasons. One concreate example is the 32-bit version of
  Android's bionic libc which requires pid numbers less than 65536.
  There are workloads where it is run in a 32-bit container on a 64-bit
  kernel. If the host has a pid_max value greater than 65535 the libc
  will abort thread creation because of size assumptions of
  pthread_mutex_t.

  That's a fairly specific use-case however, in general specific
  workloads that are moved into containers running on a host with a new
  kernel and a new systemd can run into issues with large pid_max
  values. Obviously making assumptions about the size of the allocated
  pid is suboptimal but we have userspace that does it.

  Of course, giving containers the ability to restrict the number of
  processes in their respective pid namespace indepent of the global
  limit through pid_max is something desirable in itself and comes in
  handy in general.

  Independent of motivating use-cases the existence of pid namespaces
  makes this also a good semantical extension and there have been prior
  proposals pushing in a similar direction. The trick here is to
  minimize the risk of regressions which I think is doable. The fact
  that pid namespaces are hierarchical will help us here.

  What we mostly care about is that when the host sets a low pid_max
  limit, say (crazy number) 100 that no descendant pid namespace can
  allocate a higher pid number in its namespace. Since pid allocation is
  hierarchial this can be ensured by checking each pid allocation
  against the pid namespace's pid_max limit. This means if the
  allocation in the descendant pid namespace succeeds, the ancestor pid
  namespace can reject it. If the ancestor pid namespace has a higher
  limit than the descendant pid namespace the descendant pid namespace
  will reject the pid allocation. The ancestor pid namespace will
  obviously not care about this.

  All in all this means pid_max continues to enforce a system wide limit
  on the number of processes but allows pid namespaces sufficient leeway
  in handling workloads with assumptions about pid values and allows
  containers to restrict the number of processes in a pid namespace
  through the pid_max interface"

* tag 'kernel-6.14-rc1.pid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  tests/pid_namespace: add pid_max tests
  pid: allow pid_max to be set per pid namespace
2025-01-20 10:29:11 -08:00
Linus Torvalds
5f85bd6aec Merge tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:

 - Rework inode number allocation

   Recently we received a patchset that aims to enable file handle
   encoding and decoding via name_to_handle_at(2) and
   open_by_handle_at(2).

   A crucical step in the patch series is how to go from inode number to
   struct pid without leaking information into unprivileged contexts.
   The issue is that in order to find a struct pid the pid number in the
   initial pid namespace must be encoded into the file handle via
   name_to_handle_at(2).

   This can be used by containers using a separate pid namespace to
   learn what the pid number of a given process in the initial pid
   namespace is. While this is a weak information leak it could be used
   in various exploits and in general is an ugly wart in the design.

   To solve this problem a new way is needed to lookup a struct pid
   based on the inode number allocated for that struct pid. The other
   part is to remove the custom inode number allocation on 32bit systems
   that is also an ugly wart that should go away.

   Allocate unique identifiers for struct pid by simply incrementing a
   64 bit counter and insert each struct pid into the rbtree so it can
   be looked up to decode file handles avoiding to leak actual pids
   across pid namespaces in file handles.

   On both 64 bit and 32 bit the same 64 bit identifier is used to
   lookup struct pid in the rbtree. On 64 bit the unique identifier for
   struct pid simply becomes the inode number. Comparing two pidfds
   continues to be as simple as comparing inode numbers.

   On 32 bit the 64 bit number assigned to struct pid is split into two
   32 bit numbers. The lower 32 bits are used as the inode number and
   the upper 32 bits are used as the inode generation number. Whenever a
   wraparound happens on 32 bit the 64 bit number will be incremented by
   2 so inode numbering starts at 2 again.

   When a wraparound happens on 32 bit multiple pidfds with the same
   inode number are likely to exist. This isn't a problem since before
   pidfs pidfds used the anonymous inode meaning all pidfds had the same
   inode number. On 32 bit sserspace can thus reconstruct the 64 bit
   identifier by retrieving both the inode number and the inode
   generation number to compare, or use file handles. This gives the
   same guarantees on both 32 bit and 64 bit.

 - Implement file handle support

   This is based on custom export operation methods which allows pidfs
   to implement permission checking and opening of pidfs file handles
   cleanly without hacking around in the core file handle code too much.

 - Support bind-mounts

   Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts
   for pidfds. This allows pidfds to be safely recovered and checked for
   process recycling.

   Instead of checking d_ops for both nsfs and pidfs we could in a
   follow-up patch add a flag argument to struct dentry_operations that
   functions similar to file_operations->fop_flags.

* tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: add pidfd bind-mount tests
  pidfs: allow bind-mounts
  pidfs: lookup pid through rbtree
  selftests/pidfd: add pidfs file handle selftests
  pidfs: check for valid ioctl commands
  pidfs: implement file handle support
  exportfs: add permission method
  fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
  exportfs: add open method
  fhandle: simplify error handling
  pseudofs: add support for export_ops
  pidfs: support FS_IOC_GETVERSION
  pidfs: remove 32bit inode number handling
  pidfs: rework inode number allocation
2025-01-20 09:59:00 -08:00
Yonghong Song
14a627fe79 selftests/bpf: Add some tests related to 'may_goto 0' insns
Add both asm-based and C-based tests which have 'may_goto 0' insns.

For the following code in C-based test,
   int i, tmp[3];
   for (i = 0; i < 3 && can_loop; i++)
       tmp[i] = 0;

The clang compiler (clang 19 and 20) generates
   may_goto 2
   may_goto 1
   may_goto 0
   r1 = 0
   r2 = 0
   r3 = 0

The above asm codes are due to llvm pass SROAPass. This ensures the
successful verification since tmp[0-2] are initialized.  Otherwise,
the code without SROAPass like
   may_goto 5
   r1 = 0
   may_goto 3
   r2 = 0
   may_goto 1
   r3 = 0
will have verification failure.

Although from the source code C-based test should have verification
failure, clang compiler optimization generates code with successful
verification. If gcc generates different asm codes than clang, the
following code can be used for gcc:
   int i, tmp[3];
   for (i = 0; i < 3; i++)
       tmp[i] = 0;

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250118192034.2124952-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20 09:47:14 -08:00
Linus Torvalds
4b84a4c8d4 Merge tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "Features:

   - Support caching symlink lengths in inodes

     The size is stored in a new union utilizing the same space as
     i_devices, thus avoiding growing the struct or taking up any more
     space

     When utilized it dodges strlen() in vfs_readlink(), giving about
     1.5% speed up when issuing readlink on /initrd.img on ext4

   - Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag

     If a file system supports uncached buffered IO, it may set
     FOP_DONTCACHE and enable support for RWF_DONTCACHE.

     If RWF_DONTCACHE is attempted without the file system supporting
     it, it'll get errored with -EOPNOTSUPP

   - Enable VBOXGUEST and VBOXSF_FS on ARM64

     Now that VirtualBox is able to run as a host on arm64 (e.g. the
     Apple M3 processors) we can enable VBOXSF_FS (and in turn
     VBOXGUEST) for this architecture.

     Tested with various runs of bonnie++ and dbench on an Apple MacBook
     Pro with the latest Virtualbox 7.1.4 r165100 installed

  Cleanups:

   - Delay sysctl_nr_open check in expand_files()

   - Use kernel-doc includes in fiemap docbook

   - Use page->private instead of page->index in watch_queue

   - Use a consume fence in mnt_idmap() as it's heavily used in
     link_path_walk()

   - Replace magic number 7 with ARRAY_SIZE() in fc_log

   - Sort out a stale comment about races between fd alloc and dup2()

   - Fix return type of do_mount() from long to int

   - Various cosmetic cleanups for the lockref code

  Fixes:

   - Annotate spinning as unlikely() in __read_seqcount_begin

     The annotation already used to be there, but got lost in commit
     52ac39e5db ("seqlock: seqcount_t: Implement all read APIs as
     statement expressions")

   - Fix proc_handler for sysctl_nr_open

   - Flush delayed work in delayed fput()

   - Fix grammar and spelling in propagate_umount()

   - Fix ESP not readable during coredump

     In /proc/PID/stat, there is the kstkesp field which is the stack
     pointer of a thread. While the thread is active, this field reads
     zero. But during a coredump, it should have a valid value

     However, at the moment, kstkesp is zero even during coredump

   - Don't wake up the writer if the pipe is still full

   - Fix unbalanced user_access_end() in select code"

* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  gfs2: use lockref_init for qd_lockref
  erofs: use lockref_init for pcl->lockref
  dcache: use lockref_init for d_lockref
  lockref: add a lockref_init helper
  lockref: drop superfluous externs
  lockref: use bool for false/true returns
  lockref: improve the lockref_get_not_zero description
  lockref: remove lockref_put_not_zero
  fs: Fix return type of do_mount() from long to int
  select: Fix unbalanced user_access_end()
  vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
  pipe_read: don't wake up the writer if the pipe is still full
  selftests: coredump: Add stackdump test
  fs/proc: do_task_stat: Fix ESP not readable during coredump
  fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
  fs: sort out a stale comment about races between fd alloc and dup2
  fs: Fix grammar and spelling in propagate_umount()
  fs: fc_log replace magic number 7 with ARRAY_SIZE()
  fs: use a consume fence in mnt_idmap()
  file: flush delayed work in delayed fput()
  ...
2025-01-20 09:40:49 -08:00
Hou Tao
0a5d2efa38 selftests/bpf: Add test case for the freeing of bpf_timer
The main purpose of the test is to demonstrate the lock problem for the
free of bpf_timer under PREEMPT_RT. When freeing a bpf_timer which is
running on other CPU in bpf_timer_cancel_and_free(), hrtimer_cancel()
will try to acquire a spin-lock (namely softirq_expiry_lock), however
the freeing procedure has already held a raw-spin-lock.

The test first creates two threads: one to start timers and the other to
free timers. The start-timers thread will start the timer and then wake
up the free-timers thread to free these timers when the starts complete.
After freeing, the free-timer thread will wake up the start-timer thread
to complete the current iteration. A loop of 10 iterations is used.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20 09:09:02 -08:00
Petr Mladek
49dcb50d6c Merge branch 'for-6.14/selftests-trivial' into for-linus 2025-01-20 14:25:01 +01:00
Paolo Bonzini
43f640f4b9 Merge tag 'kvm-riscv-6.14-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.14

- Svvptc, Zabha, and Ziccrse extension support for Guest/VM
- Virtualize SBI system suspend extension for Guest/VM
- Trap related exit statstics as SBI PMU firmware counters for Guest/VM
2025-01-20 07:01:17 -05:00
Paolo Bonzini
4f7ff70c05 Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.14:

 - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities
   instead of just those where KVM needs to manage state and/or explicitly
   enable the feature in hardware.  Along the way, refactor the code to make
   it easier to add features, and to make it more self-documenting how KVM
   is handling each feature.

 - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes
   where KVM unintentionally puts the vCPU into infinite loops in some scenarios
   (e.g. if emulation is triggered by the exit), and brings parity between VMX
   and SVM.

 - Add pending request and interrupt injection information to the kvm_exit and
   kvm_entry tracepoints respectively.

 - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when
   loading guest/host PKRU, due to a refactoring of the kernel helpers that
   didn't account for KVM's pre-checking of the need to do WRPKRU.
2025-01-20 06:49:39 -05:00
Jiri Kosina
670af65d2a Merge branch 'for-6.14/constify-bin-attribute' into for-linus
- constification of 'struct bin_attribute' in various HID driver (Thomas Weißschuh)
2025-01-20 09:58:12 +01:00
James Bottomley
fd3aa3d5e5 selftests/efivarfs: fix tests for failed write removal
The current self tests expect the zero size remnants that failed
variable creation leaves.  Update the tests to verify these are now
absent.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:51:16 +01:00
James Bottomley
8a32d46b20 selftests/efivarfs: add check for disallowing file truncation
Now that the ability of arbitrary writes to set the inode size is
fixed, verify that a variable file accepts a truncation operation but
does not change the stat size because of it.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19 17:49:10 +01:00
Jakub Kicinski
54ea680b75 selftests: net: give up on the cmsg_time accuracy on slow machines
Commit b9d5f5711d ("selftests: net: increase the delay for relative
cmsg_time.sh test") widened the accepted value range 8x but we still
see flakes (at a rate of around 7%).

Return XFAIL for the most timing sensitive test on slow machines.

Before:

  # ./cmsg_time.sh
    Case UDPv4  - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
  FAIL - 1/36 cases failed

After:

  # ./cmsg_time.sh
    Case UDPv4  - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
    Case UDPv6  - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
  OK

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250116020105.931338-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17 18:36:14 -08:00
Mickaël Salaün
2a794ee613 selftests/landlock: Add layout1.umount_sandboxer tests
Check that a domain is not tied to the executable file that created it.
For instance, that could happen if a Landlock domain took a reference to
a struct path.

Move global path names to common.h and replace copy_binary() with a more
generic copy_file() helper.

Test coverage for security/landlock is 92.7% of 1133 lines according to
gcc/gcov-14.

Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-23-mic@digikod.net
[mic: Update date and add test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17 19:05:38 +01:00
Mickaël Salaün
5147779d5e selftests/landlock: Add wrappers.h
Extract syscall wrappers to make them usable by standalone binaries (see
next commit).

Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-22-mic@digikod.net
[mic: Fix comments]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17 19:05:38 +01:00
Mickaël Salaün
2107c35128 selftests/landlock: Fix error message
The global variable errno may not be set in test_execute().  Do not use
it in related error message.

Cc: Günther Noack <gnoack@google.com>
Fixes: e1199815b4 ("selftests/landlock: Add user space tests")
Link: https://lore.kernel.org/r/20250108154338.1129069-21-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17 19:05:37 +01:00
Mickaël Salaün
12264f721f selftests/landlock: Add test to check partial access in a mount tree
Add layout1.refer_part_mount_tree_is_allowed to test the masked logical
issue regarding collect_domain_accesses() calls followed by the
is_access_to_paths_allowed() check in current_check_refer_path().  See
previous commit.

This test should work without the previous fix as well, but it enables
us to make sure future changes will not have impact regarding this
behavior.

Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250108154338.1129069-13-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17 19:05:36 +01:00
Mickaël Salaün
0e4db4f843 selftests/landlock: Fix build with non-default pthread linking
Old toolchains require explicit -lpthread (e.g. on Debian 11).

Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Fixes: c899496501 ("selftests/landlock: Test signal scoping for threads")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20250115145409.312226-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17 19:05:31 +01:00
Daniel Xu
f932a8e482 bpf: selftests: verifier: Add nullness elision tests
Test that nullness elision works for common use cases. For example, we
want to check that both constant scalar spills and STACK_ZERO functions.
As well as when there's both const and non-const values of R2 leading up
to a lookup. And obviously some bound checks.

Particularly tricky are spills both smaller or larger than key size. For
smaller, we need to ensure verifier doesn't let through a potential read
into unchecked bytes. For larger, endianness comes into play, as the
native endian value tracked in the verifier may not be the bytes the
kernel would have read out of the key pointer. So check that we disallow
both.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/f1dacaa777d4516a5476162e0ea549f7c3354d73.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16 17:51:10 -08:00
Daniel Xu
d2102f2f5d bpf: verifier: Support eliding map lookup nullness
This commit allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.

This is useful for two reasons:

1. Large numbers of nullness checks (especially when they cannot fail)
   unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.

For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors.  These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.

For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.

Tests also have to be updated in sync with these changes, as the
verifier is more efficient with this change. Notable, iters.c tests had
to be changed to use a map type that still requires null checks, as it's
exercising verifier tracking logic w.r.t iterators.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/68f3ea96ff3809a87e502a11a4bd30177fc5823e.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16 17:51:10 -08:00
Daniel Xu
37cce22dbd bpf: verifier: Refactor helper access type tracking
Previously, the verifier was treating all PTR_TO_STACK registers passed
to a helper call as potentially written to by the helper. However, all
calls to check_stack_range_initialized() already have precise access type
information available.

Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass
enum bpf_access_type to check_stack_range_initialized() to more
precisely track helper arguments.

One benefit from this precision is that registers tracked as valid
spills and passed as a read-only helper argument remain tracked after
the call.  Rather than being marked STACK_MISC afterwards.

An additional benefit is the verifier logs are also more precise. For
this particular error, users will enjoy a slightly clearer message. See
included selftest updates for examples.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ff885c0e5859e0cd12077c3148ff0754cad4f7ed.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16 17:51:10 -08:00
Jakub Kicinski
3030e3d57b selftests/net: packetdrill: make tcp buf limited timing tests benign
The following tests are failing on debug kernels:

  tcp_tcp_info_tcp-info-rwnd-limited.pkt
  tcp_tcp_info_tcp-info-sndbuf-limited.pkt

with reports like:

      assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
  AssertionError: 18000

and:

      assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
  AssertionError: 362000

Extend commit 912d6f6697 ("selftests/net: packetdrill: report benign
debug flakes as xfail") to cover them.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250115232129.845884-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16 17:23:22 -08:00
John Daley
8d20dcda40 selftests: drv-net-hw: inject pp_alloc_fail errors in the right place
The tool pp_alloc_fail.py tested error recovery by injecting errors
into the function page_pool_alloc_pages(). The page pool allocation
function page_pool_dev_alloc() does not end up calling
page_pool_alloc_pages(). page_pool_alloc_netmems() seems to be the
function that is called by all of the page pool alloc functions in
the API, so move error injection to that function instead.

Signed-off-by: John Daley <johndale@cisco.com>
Link: https://patch.msgid.link/20250115181312.3544-2-johndale@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16 17:18:53 -08:00