Commit Graph

46384 Commits

Author SHA1 Message Date
Linus Torvalds
a49a879f0a Merge tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad
  Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo"

* tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/coco: Replace 'static const cc_mask' with the newly introduced cc_get_mask() function
  x86/delay: Fix inconsistent whitespace
  selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE()
  x86/platform: Fix missing declaration of 'x86_apple_machine'
  x86/irq: Fix missing declaration of 'io_apic_irqs'
  x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description
  x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()
2025-03-24 22:39:53 -07:00
Linus Torvalds
71b639af06 Merge tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/fpu updates from Ingo Molnar:

 - Improve crypto performance by making kernel-mode FPU reliably usable
   in softirqs ((Eric Biggers)

 - Fully optimize out WARN_ON_FPU() (Eric Biggers)

 - Initial steps to support Support Intel APX (Advanced Performance
   Extensions) (Chang S. Bae)

 - Fix KASAN for arch_dup_task_struct() (Benjamin Berg)

 - Refine and simplify the FPU magic number check during signal return
   (Chang S. Bae)

 - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav
   Spassov)

 - selftests/x86/xstate: Introduce common code for testing extended
   states (Chang S. Bae)

 - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros
   Bizjak)

* tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Fix inconsistencies in guest FPU xfeatures
  x86/fpu: Clarify the "xa" symbolic name used in the XSTATE* macros
  x86/fpu: Use XSAVE{,OPT,C,S} and XRSTOR{,S} mnemonics in xstate.h
  x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs
  x86/fpu/xstate: Simplify print_xstate_features()
  x86/fpu: Refine and simplify the magic number check during signal return
  selftests/x86/xstate: Fix spelling mistake "hader" -> "header"
  x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct()
  vmlinux.lds.h: Remove entry to place init_task onto init_stack
  selftests/x86/avx: Add AVX tests
  selftests/x86/xstate: Clarify supported xstates
  selftests/x86/xstate: Consolidate test invocations into a single entry
  selftests/x86/xstate: Introduce signal ABI test
  selftests/x86/xstate: Refactor ptrace ABI test
  selftests/x86/xstate: Refactor context switching test
  selftests/x86/xstate: Enumerate and name xstate components
  selftests/x86/xstate: Refactor XSAVE helpers for general use
  selftests/x86: Consolidate redundant signal helper functions
  x86/fpu: Fix guest FPU state buffer allocation size
  x86/fpu: Fully optimize out WARN_ON_FPU()
2025-03-24 22:27:18 -07:00
Linus Torvalds
e34c38057a Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds
327ecdbc0f Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
 "Core:
   - Move perf_event sysctls into kernel/events/ (Joel Granados)
   - Use POLLHUP for pinned events in error (Namhyung Kim)
   - Avoid the read if the count is already updated (Peter Zijlstra)
   - Allow the EPOLLRDNORM flag for poll (Tao Chen)
   - locking/percpu-rwsem: Add guard support [ NOTE: this got
     (mis-)merged into the perf tree due to related work ] (Peter
     Zijlstra)

  perf_pmu_unregister() related improvements: (Peter Zijlstra)
   - Simplify the perf_event_alloc() error path
   - Simplify the perf_pmu_register() error path
   - Simplify perf_pmu_register()
   - Simplify perf_init_event()
   - Simplify perf_event_alloc()
   - Merge struct pmu::pmu_disable_count into struct
     perf_cpu_pmu_context::pmu_disable_count
   - Add this_cpc() helper
   - Introduce perf_free_addr_filters()
   - Robustify perf_event_free_bpf_prog()
   - Simplify the perf_mmap() control flow
   - Further simplify perf_mmap()
   - Remove retry loop from perf_mmap()
   - Lift event->mmap_mutex in perf_mmap()
   - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
   - Fix perf_mmap() failure path

  Uprobes:
   - Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
   - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
   - Remove the spinlock within handle_singlestep() (Liao Chang)

  x86 Intel PMU enhancements:
   - Support PEBS counters snapshotting (Kan Liang)
   - Fix intel_pmu_read_event() (Kan Liang)
   - Extend per event callchain limit to branch stack (Kan Liang)
   - Fix system-wide LBR profiling (Kan Liang)
   - Allocate bts_ctx only if necessary (Li RongQing)
   - Apply static call for drain_pebs (Peter Zijlstra)

  x86 AMD PMU enhancements: (Ravi Bangoria)
   - Remove pointless sample period check
   - Fix ->config to sample period calculation for OP PMU
   - Fix perf_ibs_op.cnt_mask for CurCnt
   - Don't allow freq mode event creation through ->config interface
   - Add PMU specific minimum period
   - Add ->check_period() callback
   - Ceil sample_period to min_period
   - Add support for OP Load Latency Filtering
   - Update DTLB/PageSize decode logic

  Hardware breakpoints:
   - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
     Bhaskar)

  Hardlockup detector improvements: (Li Huafei)
   - perf_event memory leak
   - Warn if watchdog_ev is leaked

  Fixes and cleanups:
   - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
     Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"

* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  perf: Fix __percpu annotation
  perf: Clean up pmu specific data
  perf/x86: Remove swap_task_ctx()
  perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
  perf: Supply task information to sched_task()
  perf: attach/detach PMU specific data
  locking/percpu-rwsem: Add guard support
  perf: Save PMU specific data in task_struct
  perf: Extend per event callchain limit to branch stack
  perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
  perf/core: Use POLLHUP for pinned events in error
  perf/core: Use sysfs_emit() instead of scnprintf()
  perf/core: Remove optional 'size' arguments from strscpy() calls
  perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
  uprobes/x86: Harden uretprobe syscall trampoline check
  watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
  watchdog/hardlockup/perf: Fix perf_event memory leak
  perf/x86: Annotate struct bts_buffer::buf with __counted_by()
  perf/core: Clean up perf_try_init_event()
  perf/core: Fix perf_mmap() failure path
  ...
2025-03-24 21:46:36 -07:00
Linus Torvalds
32b22538be Merge tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Cancel the slice protection of the idle entity (Zihan Zhou)
   - Reduce the default slice to avoid tasks getting an extra tick
     (Zihan Zhou)
   - Force propagating min_slice of cfs_rq when {en,de}queue tasks
     (Tianchen Ding)
   - Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
   - Add unlikey branch hints to several system calls (Colin Ian King)
   - Optimize current_clr_polling() on certain architectures (Yujun
     Dong)

  Deadline scheduler: (Juri Lelli)
   - Remove redundant dl_clear_root_domain call
   - Move dl_rebuild_rd_accounting to cpuset.h

  Uclamp:
   - Use the uclamp_is_used() helper instead of open-coding it (Xuewen
     Yan)
   - Optimize sched_uclamp_used static key enabling (Xuewen Yan)

  Scheduler topology support: (Juri Lelli)
   - Ignore special tasks when rebuilding domains
   - Add wrappers for sched_domains_mutex
   - Generalize unique visiting of root domains
   - Rebuild root domain accounting after every update
   - Remove partition_and_rebuild_sched_domains
   - Stop exposing partition_sched_domains_locked

  RSEQ: (Michael Jeanson)
   - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
   - Fix segfault on registration when rseq_cs is non-zero
   - selftests: Add rseq syscall errors test
   - selftests: Ensure the rseq ABI TLS is actually 1024 bytes

  Membarriers:
   - Fix redundant load of membarrier_state (Nysal Jan K.A.)

  Scheduler debugging:
   - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
   - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)

  Fixes and cleanups:
   - Always save/restore x86 TSC sched_clock() on suspend/resume
     (Guilherme G. Piccoli)
   - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian
     Andrzej Siewior)"

* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
  sched/debug: Remove CONFIG_SCHED_DEBUG
  sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  rseq/selftests: Fix namespace collision with rseq UAPI header
  include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
  sched/topology: Stop exposing partition_sched_domains_locked
  cgroup/cpuset: Remove partition_and_rebuild_sched_domains
  sched/topology: Remove redundant dl_clear_root_domain call
  sched/deadline: Rebuild root domain accounting after every update
  sched/deadline: Generalize unique visiting of root domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Ignore special tasks when rebuilding domains
  tracing: Use preempt_model_str()
  xtensa: Rely on generic printing of preemption model
  x86: Rely on generic printing of preemption model
  s390: Rely on generic printing of preemption model
  ...
2025-03-24 21:28:12 -07:00
Linus Torvalds
5a658afd46 Merge tag 'objtool-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:

 - The biggest change is the new option to automatically fail the build
   on objtool warnings: CONFIG_OBJTOOL_WERROR.

   While there are no currently known unfixed false positives left, such
   an expansion in the severity of objtool warnings inevitably creates a
   risk of build failures, so it's disabled by default and depends on
   !COMPILE_TEST, so it shouldn't be enabled on
   allyesconfig/allmodconfig builds and won't be forced on people who
   just accept build-time defaults in 'make oldconfig'.

   While the option is strongly recommended, only people who enable it
   explicitly should see it.

   (Josh Poimboeuf)

 - Disable branch profiling in noinstr code with a broad brush that
   includes all of arch/x86/ and kernel/sched/. (Josh Poimboeuf)

 - Create backup object files on objtool errors and print exact objtool
   arguments to make failure analysis easier (Josh Poimboeuf)

 - Improve noreturn handling (Josh Poimboeuf)

 - Improve rodata handling (Tiezhu Yang)

 - Support jump tables, switch tables and goto tables on LoongArch
   (Tiezhu Yang)

 - Misc cleanups and fixes (Josh Poimboeuf, David Engraf, Ingo Molnar)

* tag 'objtool-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  tracing: Disable branch profiling in noinstr code
  objtool: Use O_CREAT with explicit mode mask
  objtool: Add CONFIG_OBJTOOL_WERROR
  objtool: Create backup on error and print args
  objtool: Change "warning:" to "error:" for --Werror
  objtool: Add --Werror option
  objtool: Add --output option
  objtool: Upgrade "Linked object detected" warning to error
  objtool: Consolidate option validation
  objtool: Remove --unret dependency on --rethunk
  objtool: Increase per-function WARN_FUNC() rate limit
  objtool: Update documentation
  objtool: Improve __noreturn annotation warning
  objtool: Fix error handling inconsistencies in check()
  x86/traps: Make exc_double_fault() consistently noreturn
  LoongArch: Enable jump table for objtool
  objtool/LoongArch: Add support for goto table
  objtool/LoongArch: Add support for switch table
  objtool: Handle PC relative relocation type
  objtool: Handle different entry size of rodata
  ...
2025-03-24 21:18:05 -07:00
Linus Torvalds
3ba7dfb8da Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Boqun Feng:
 "Documentation:
   - Add broken-timing possibility to stallwarn.rst
   - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
   - Document self-propagating callbacks
   - Point call_srcu() to call_rcu() for detailed memory ordering
   - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
   - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
   - Remove references to old grace-period-wait primitives

  srcu:
   - Introduce srcu_read_{un,}lock_fast(), which is similar to
     srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
     at the cost of calling synchronize_rcu() in synchronize_srcu()

     Moreover, by returning the percpu offset of the counter at
     srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
     extra pointer dereferencing, which makes it faster than
     srcu_read_{un,}lock_lite()

     srcu_read_{un,}lock_fast() are intended to replace
     rcu_read_{un,}lock_trace() if possible

  RCU torture:
   - Add get_torture_init_jiffies() to return the start time of the test
   - Add a test_boost_holdoff module parameter to allow delaying
     boosting tests when building rcutorture as built-in
   - Add grace period sequence number logging at the beginning and end
     of failure/close-call results
   - Switch to hexadecimal for the expedited grace period sequence
     number in the rcu_exp_grace_period trace point
   - Make cur_ops->format_gp_seqs take buffer length
   - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
   - Complain when invalid SRCU reader_flavor is specified
   - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
     uses atomics even when percpu ops are NMI safe, and use the Kconfig
     for SRCU lockdep testing

  Misc:
   - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
   - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
   - Fix get_state_synchronize_rcu_full() GP-start detection
   - Move RCU Tasks self-tests to core_initcall()
   - Print segment lengths in show_rcu_nocb_gp_state()
   - Make RCU watch ct_kernel_exit_state() warning
   - Flush console log from kernel_power_off()
   - rcutorture: Allow a negative value for nfakewriters
   - rcu: Update TREE05.boot to test normal synchronize_rcu()
   - rcu: Use _full() API to debug synchronize_rcu()

  Make RCU handle PREEMPT_LAZY better:
   - Fix header guard for rcu_all_qs()
   - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
   - Update __cond_resched comment about RCU quiescent states
   - Handle unstable rdp in rcu_read_unlock_strict()
   - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
   - osnoise: Provide quiescent states
   - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
     combination
   - Limit PREEMPT_RCU configurations
   - Make rcutorture senario TREE07 and senario TREE10 use
     PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
  rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
  rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
  rcu: limit PREEMPT_RCU configurations
  rcutorture: Update ->extendables check for lazy preemption
  rcutorture: Update rcutorture_one_extend_check() for lazy preemption
  osnoise: provide quiescent states
  rcu: Use _full() API to debug synchronize_rcu()
  rcu: Update TREE05.boot to test normal synchronize_rcu()
  rcutorture: Allow a negative value for nfakewriters
  Flush console log from kernel_power_off()
  context_tracking: Make RCU watch ct_kernel_exit_state() warning
  rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
  rcu-tasks: Move RCU Tasks self-tests to core_initcall()
  rcu: Fix get_state_synchronize_rcu_full() GP-start detection
  torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
  srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
  rcutorture: Complain when invalid SRCU reader_flavor is specified
  rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
  rcutorture: Make cur_ops->format_gp_seqs take buffer length
  rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
  ...
2025-03-24 19:41:37 -07:00
Linus Torvalds
2f2d529458 Merge tag 'bitmap-for-6.15' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:

 - cpumask_next_wrap() rework (me)

 - GENMASK() simplification (I Hsin)

 - rust bindings for cpumasks (Viresh and me)

 - scattered cleanups (Andy, Tamir, Vincent, Ignacio and Joel)

* tag 'bitmap-for-6.15' of https://github.com/norov/linux: (22 commits)
  cpumask: align text in comment
  riscv: fix test_and_{set,clear}_bit ordering documentation
  treewide: fix typo 'unsigned __init128' -> 'unsigned __int128'
  MAINTAINERS: add rust bindings entry for bitmap API
  rust: Add cpumask helpers
  uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"
  cpumask: drop cpumask_next_wrap_old()
  PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap()
  scsi: lpfc: rework lpfc_next_{online,present}_cpu()
  scsi: lpfc: switch lpfc_irq_rebalance() to using cpumask_next_wrap()
  s390: switch stop_machine_yield() to using cpumask_next_wrap()
  padata: switch padata_find_next() to using cpumask_next_wrap()
  cpumask: use cpumask_next_wrap() where appropriate
  cpumask: re-introduce cpumask_next{,_and}_wrap()
  cpumask: deprecate cpumask_next_wrap()
  powerpc/xmon: simplify xmon_batch_next_cpu()
  ibmvnic: simplify ibmvnic_set_queue_affinity()
  virtio_net: simplify virtnet_set_affinity()
  objpool: rework objpool_pop()
  cpumask: add for_each_{possible,online}_cpu_wrap
  ...
2025-03-24 19:11:58 -07:00
Linus Torvalds
72b40807d2 Merge tag 'lkmm.2025.03.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull kernel memory model updates from Paul McKenney:
 "Add more atomic operations, rework tags, and update documentation:

   - Add additional atomic operations (Puranjay Mohan)

   - Make better use of herd7 tags (Jonas Oberhauser)

   - Update documentation (Akira Yokosawa)

  These changes require v7.58 of the herd7 and klitmus tools, up from
  v7.52"

* tag 'lkmm.2025.03.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/memory-model: glossary.txt: Fix indents
  tools/memory-model/README: Fix typo
  tools/memory-model: Distinguish between syntactic and semantic tags
  tools/memory-model: Switch to softcoded herd7 tags
  tools/memory-model: Define effect of Mb tags on RMWs in tools/...
  tools/memory-model: Define applicable tags on operation in tools/...
  tools/memory-model: Legitimize current use of tags in LKMM macros
  tools/memory-model: Add atomic_andnot() with its variants
  tools/memory-model: Add atomic_and()/or()/xor() and add_negative
2025-03-24 18:24:11 -07:00
Linus Torvalds
418becac37 Merge tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
 - 32bit s390 support
 - opendir() and friends
 - openat() support
 - sscanf() support
 - various cleanups

[ Paul has just forwarded the pull request from Thomas Weißschuh, so
  the tag signature is from Thomas, not Paul   - Linus ]

* tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits)
  tools/nolibc: don't use asm/ UAPI headers
  selftests/nolibc: stop testing constructor order
  selftests/nolibc: use O_RDONLY flag instead of 0
  tools/nolibc: drop outdated example from overview comment
  tools/nolibc: process open() vararg as mode_t
  tools/nolibc: always use openat(2) instead of open(2)
  tools/nolibc: add support for openat(2)
  selftests/nolibc: add armthumb configuration
  selftests/nolibc: explicitly enable ARM mode
  Revert "selftests: kselftest: Fix build failure with NOLIBC"
  tools/nolibc: add support for [v]sscanf()
  tools/nolibc: add support for 32-bit s390
  selftests/nolibc: rename s390 to s390x
  selftests/nolibc: only run constructor tests on nolibc
  selftests/nolibc: split up architecture list in run-tests.sh
  tools/nolibc: add support for directory access
  tools/nolibc: add support for sys_llseek()
  selftests/nolibc: always keep test kernel configuration up to date
  selftests/nolibc: execute defconfig before other targets
  selftests/nolibc: drop call to mrproper target
  ...
2025-03-24 17:59:29 -07:00
Namhyung Kim
35d13f841a perf bpf-filter: Fix a parsing error with comma
The previous change to support cgroup filters introduced a bug that
pathname can include commas.  It confused the lexer to treat an item and
the trailing comma as a single token.  And it resulted in a parse error:

  $ sudo perf record -e cycles:P --filter 'period > 0, ip > 64' -- true
  perf_bpf_filter: Error: Unexpected item: 0,
  perf_bpf_filter: syntax error, unexpected BFT_ERROR, expecting BFT_NUM

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

          --filter <filter>
                            event filter

It should get "0" and "," separately.

An easiest fix would be to remove "," from the possible pathname
characters.  As it's for cgroup names, probably ok to assume it won't
have commas in the pathname.

I found that the existing BPF filtering test didn't have any complex
filter condition with commas.  Let's update the group filter test which
is supposed to test filter combinations like this.

Link: https://lore.kernel.org/r/20250307220922.434319-1-namhyung@kernel.org
Fixes: 91e88437d5 ("perf bpf-filter: Support filtering on cgroups")
Reported-by: Sally Shi <sshii@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 17:29:58 -07:00
Linus Torvalds
bcb044256d Merge tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.

 - The default idle CPU selection logic is revamped and improved in
   multiple ways including being made topology aware.

 - sched_ext was disabling ttwu_queue for simplicity, which can be
   costly when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.

 - tools/sched_ext updates to improve compatibility among others.

 - Other misc updates and fixes.

 - sched_ext/for-6.14-fixes were pulled a few times to receive
   prerequisite fixes and resolve conflicts.

* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
  sched_ext: idle: Refactor scx_select_cpu_dfl()
  sched_ext: idle: Honor idle flags in the built-in idle selection policy
  sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
  sched_ext: Add trace point to track sched_ext core events
  sched_ext: Change the event type from u64 to s64
  sched_ext: Documentation: add task lifecycle summary
  tools/sched_ext: Provide a compatible helper for scx_bpf_events()
  selftests/sched_ext: Add NUMA-aware scheduler test
  tools/sched_ext: Provide consistent access to scx flags
  sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
  sched_ext: idle: Introduce scx_bpf_nr_node_ids()
  sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
  sched_ext: idle: Per-node idle cpumasks
  sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
  sched_ext: idle: Make idle static keys private
  sched/topology: Introduce for_each_node_numadist() iterator
  mm/numa: Introduce nearest_node_nodemask()
  nodemask: numa: reorganize inclusion path
  nodemask: add nodes_copy()
  tools/sched_ext: Sync with scx repo
  ...
2025-03-24 17:23:48 -07:00
Namhyung Kim
9daa05c84a perf report: Fix a memory leak for perf_env on AMD
The env.pmu_mapping can be leaked when it reads data from a pipe on AMD.
For a pipe data, it reads the header data including pmu_mapping from
PERF_RECORD_HEADER_FEATURE runtime.  But it's already set in:

  perf_session__new()
    __perf_session__new()
      evlist__init_trace_event_sample_raw()
        evlist__has_amd_ibs()
          perf_env__nr_pmu_mappings()

Then it'll overwrite that when it processes the HEADER_FEATURE record.
Here's a report from address sanitizer.

  Direct leak of 2689 byte(s) in 1 object(s) allocated from:
    #0 0x7fed8f814596 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
    #1 0x5595a7d416b1 in strbuf_grow util/strbuf.c:64
    #2 0x5595a7d414ef in strbuf_init util/strbuf.c:25
    #3 0x5595a7d0f4b7 in perf_env__read_pmu_mappings util/env.c:362
    #4 0x5595a7d12ab7 in perf_env__nr_pmu_mappings util/env.c:517
    #5 0x5595a7d89d2f in evlist__has_amd_ibs util/amd-sample-raw.c:315
    #6 0x5595a7d87fb2 in evlist__init_trace_event_sample_raw util/sample-raw.c:23
    #7 0x5595a7d7f893 in __perf_session__new util/session.c:179
    #8 0x5595a7b79572 in perf_session__new util/session.h:115
    #9 0x5595a7b7e9dc in cmd_report builtin-report.c:1603
    #10 0x5595a7c019eb in run_builtin perf.c:351
    #11 0x5595a7c01c92 in handle_internal_command perf.c:404
    #12 0x5595a7c01deb in run_argv perf.c:448
    #13 0x5595a7c02134 in main perf.c:556
    #14 0x7fed85833d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Let's free the existing pmu_mapping data if any.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250311000416.817631-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 16:22:06 -07:00
Linus Torvalds
11c2b2e332 Merge tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:

 - avoid the lock trip seccomp_filter_release in common case (Mateusz
   Guzik)

 - remove unused 'sd' argument through-out (Oleg Nesterov)

 - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64

* tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: avoid the lock trip seccomp_filter_release in common case
  seccomp: remove the 'sd' argument from __seccomp_filter()
  seccomp: remove the 'sd' argument from __secure_computing()
  seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
  seccomp/mips: change syscall_trace_enter() to use secure_computing()
  selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
2025-03-24 15:34:38 -07:00
Linus Torvalds
06961fbbbd Merge tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull lib kunit selftest move from Kees Cook:
 "This is a one-off tree to coordinate the move of selftests out of lib/
  and into lib/tests/. A separate tree was used for this to keep the
  paths sane with all the work in the same place.

   - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela
     Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir
     Duberstein)

   - lib/math: Add int_log test suite (Bruno Sobreira França)

   - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin)

   - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego
     Vieira)

   - unicode: refactor selftests into KUnit (Gabriela Bittencourt)

   - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein)

   - printf: convert self-test to KUnit (Tamir Duberstein)

   - scanf: convert self-test to KUnit (Tamir Duberstein)"

* tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits)
  scanf: break kunit into test cases
  scanf: convert self-test to KUnit
  scanf: remove redundant debug logs
  scanf: implicate test line in failure messages
  printf: implicate test line in failure messages
  printf: break kunit into test cases
  printf: convert self-test to KUnit
  kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR()
  kunit/fortify: Expand testing of __compiletime_strlen()
  kunit/stackinit: Use fill byte different from Clang i386 pattern
  kunit/overflow: Fix DEFINE_FLEX tests for counted_by
  selftests: remove reference to prime_numbers.sh
  MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING
  lib/prime_numbers: convert self-test to KUnit
  lib/math: Add Kunit test suite for gcd()
  unicode: kunit: change tests filename and path
  unicode: kunit: refactor selftest to kunit tests
  lib/tests/kfifo_kunit.c: add tests for the kfifo structure
  lib: Move KUnit tests into tests/ subdirectory
  lib/math: Add int_log test suite
  ...
2025-03-24 15:15:11 -07:00
Amit Cohen
36ed81bcad selftests: vxlan_bridge: Test flood with unresolved FDB entry
Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.

Without the previous patch which handles such scenario in mlxsw, the
tests fail:

$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood                                                  [ OK ]
TEST: VXLAN: flood, unresolved FDB entry                            [FAIL]
        vx2 ns2: Expected to capture 10 packets, got 20.

$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10                                          [ OK ]
TEST: VXLAN: flood vlan 20                                          [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry                    [FAIL]
        vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry                    [FAIL]
        vx20 ns2: Expected to capture 10 packets, got 20.

With the previous patch, the tests pass.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 15:09:31 -07:00
Gabriele Monaco
4ffef9579f tools/rv: Allow rv list to filter for container
Add possibility to supply the container name to rv list:

  # rv list sched
  mon1
  mon2
  mon3

This lists only monitors in sched, without indentation.
Supplying -h, any option (string starting with -) or more than 1
argument will still print the usage.
Passing a non-existent container prints nothing and passing no container
continues to print all monitors, showing indentation for nested
monitors, reported after their container.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:40 -04:00
Gabriele Monaco
2334cf7d09 verification/dot2k: Add support for nested monitors
RV now supports nested monitors, this functionality requires a container
monitor, which has virtually no functionality besides holding other
monitors, and nested monitors, that have a container as parent.

Add the -p flag to pass a parent to a monitor, this sets it up while
registering the monitor and adds necessary includes and configurations.
Add the -c flag to create a container, since containers are empty, we
don't allow supplying a dot model or a monitor type, the template is
also different since functions to enable and disable the monitor are not
defined, nor any tracepoint. The generated header file only allows to
include the rv_monitor structure in children monitors.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:40 -04:00
Gabriele Monaco
eba321a16f tools/rv: Add support for nested monitors
RV now supports nested monitors, this functionality requires a container
monitor, which has virtually no functionality besides holding other
monitors, and nested monitors, that have a container as parent.

Nested monitors' sysfs folders are physically nested in the container's
folder, and they are listed in the available_monitors file with the
notation container:monitor.
These changes go against the assumption that each line in the
available_monitors file correspond to a folder in the rv directory,
breaking the functionality of the rv tool.

Add support for nested containers in the rv userspace tool, indenting
nested monitors while listed and allowing both the notation with and
without container name, which are equivalent:

  # rv list
  mon1
  mon2
  container:
   - nested1
   - nested2

  ## notation with container name
  # rv mon container:nested1

  ## notation without container name
  # rv mon nested1

Either way, enabling a nested monitor is the same as enabling any other
non-nested monitor.
Selecting the container with rv mon enables all the nested monitors, if
-t is passed, the trace also includes the monitor name next to the
event:

  # rv mon nested1 -t
  <idle>-0        [004] event   state1 x event    -> state2
  <idle>-0        [004] error   event not expected in state2

  # rv mon sched -t
  <idle>-0        [004] event_nested1   state1 x event    -> state2
  <idle>-0        [004] error_nested1   event not expected in state2

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:40 -04:00
Gabriele Monaco
fbe6c09b7e rv: Add scpd, snep and sncid per-cpu monitors
Add 3 per-cpu monitors as part of the sched model:

* scpd: schedule called with preemption disabled
    Monitor to ensure schedule is called with preemption disabled
* snep: schedule does not enable preempt
    Monitor to ensure schedule does not enable preempt
* sncid: schedule not called with interrupt disabled
    Monitor to ensure schedule is not called with interrupt disabled

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:39 -04:00
Gabriele Monaco
93bac9cf35 rv: Add snroc per-task monitor
Add a per-task monitor as part of the sched model:

* snroc: set non runnable on its own context
    Monitor to ensure set_state happens only in the respective task's context

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-5-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:39 -04:00
Gabriele Monaco
9fd420abc4 rv: Add sco and tss per-cpu monitors
Add 2 per-cpu monitors as part of the sched model:

* sco: scheduling context operations
    Monitor to ensure sched_set_state happens only in thread context
* tss: task switch while scheduling
    Monitor to ensure sched_switch happens only in scheduling context

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-4-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:39 -04:00
Gabriele Monaco
26f80681a0 sched: Add sched tracepoints for RV task model
Add the following tracepoints:
* sched_entry(bool preempt, ip)
    Called while entering __schedule
* sched_exit(bool is_switch, ip)
    Called while exiting __schedule
* sched_set_state(task, curr_state, state)
    Called when a task changes its state (to and from running)

These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/20250305140406.350227-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-24 17:27:39 -04:00
Thomas Richter
216d567610 perf trace: Fix wrong size to bpf_map__update_elem call
In linux-next
commit c760174401 ("perf cpumap: Reduce cpu size from int to int16_t")
causes the perf tests 100 126 to fail on s390:

Output before:
 # ./perf test 100
 100: perf trace BTF general tests         : FAILED!
 #

The root cause is the change from int to int16_t for the
cpu maps. The size of the CPU key value pair changes from
four bytes to two bytes. However a two byte key size is
not supported for bpf_map__update_elem().
Note: validate_map_op() in libbpf.c emits warning
 libbpf: map '__augmented_syscalls__': \
	 unexpected key size 2 provided, expected 4
when key size is set to int16_t.

Therefore change to variable size back to 4 bytes for
invocation of bpf_map__update_elem().

Output after:
 # ./perf test 100
 100: perf trace BTF general tests         : Ok
 #

Fixes: c760174401 ("perf cpumap: Reduce cpu size from int to int16_t")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Cc: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250324152756.3879571-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 13:55:26 -07:00
Gal Pressman
c61209eeb0 selftests: drv-net: rss_ctx: Don't assume indirection table is present
The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.

Skip the check if 'indir' is not present.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:22:37 -07:00
Peter Seiderer
3099f9e156 selftest: net: update proc_net_pktgen (add more imix_weights test cases)
Add more imix_weights test cases (for incomplete input).

Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 12:01:38 -07:00
Linus Torvalds
130e696aa6 Merge tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount namespace updates from Christian Brauner:
 "This expands the ability of anonymous mount namespaces:

   - Creating detached mounts from detached mounts

     Currently, detached mounts can only be created from attached
     mounts. This limitaton prevents various use-cases. For example, the
     ability to mount a subdirectory without ever having to make the
     whole filesystem visible first.

     The current permission modelis:

      (1) Check that the caller is privileged over the owning user
          namespace of it's current mount namespace.

      (2) Check that the caller is located in the mount namespace of the
          mount it wants to create a detached copy of.

     While it is not strictly necessary to do it this way it is
     consistently applied in the new mount api. This model will also be
     used when allowing the creation of detached mount from another
     detached mount.

     The (1) requirement can simply be met by performing the same check
     as for the non-detached case, i.e., verify that the caller is
     privileged over its current mount namespace.

     To meet the (2) requirement it must be possible to infer the origin
     mount namespace that the anonymous mount namespace of the detached
     mount was created from.

     The origin mount namespace of an anonymous mount is the mount
     namespace that the mounts that were copied into the anonymous mount
     namespace originate from.

     In order to check the origin mount namespace of an anonymous mount
     namespace the sequence number of the original mount namespace is
     recorded in the anonymous mount namespace.

     With this in place it is possible to perform an equivalent check
     (2') to (2). The origin mount namespace of the anonymous mount
     namespace must be the same as the caller's mount namespace. To
     establish this the sequence number of the caller's mount namespace
     and the origin sequence number of the anonymous mount namespace are
     compared.

     The caller is always located in a non-anonymous mount namespace
     since anonymous mount namespaces cannot be setns()ed into. The
     caller's mount namespace will thus always have a valid sequence
     number.

     The owning namespace of any mount namespace, anonymous or
     non-anonymous, can never change. A mount attached to a
     non-anonymous mount namespace can never change mount namespace.

     If the sequence number of the non-anonymous mount namespace and the
     origin sequence number of the anonymous mount namespace match, the
     owning namespaces must match as well.

     Hence, the capability check on the owning namespace of the caller's
     mount namespace ensures that the caller has the ability to copy the
     mount tree.

   - Allow mount detached mounts on detached mounts

     Currently, detached mounts can only be mounted onto attached
     mounts. This limitation makes it impossible to assemble a new
     private rootfs and move it into place. Instead, a detached tree
     must be created, attached, then mounted open and then either moved
     or detached again. Lift this restriction.

     In order to allow mounting detached mounts onto other detached
     mounts the same permission model used for creating detached mounts
     from detached mounts can be used (cf. above).

     Allowing to mount detached mounts onto detached mounts leaves three
     cases to consider:

      (1) The source mount is an attached mount and the target mount is
          a detached mount. This would be equivalent to moving a mount
          between different mount namespaces. A caller could move an
          attached mount to a detached mount. The detached mount can now
          be freely attached to any mount namespace. This changes the
          current delegatioh model significantly for no good reason. So
          this will fail.

      (2) Anonymous mount namespaces are always attached fully, i.e., it
          is not possible to only attach a subtree of an anoymous mount
          namespace. This simplifies the implementation and reasoning.

          Consequently, if the anonymous mount namespace of the source
          detached mount and the target detached mount are the identical
          the mount request will fail.

      (3) The source mount's anonymous mount namespace is different from
          the target mount's anonymous mount namespace.

          In this case the source anonymous mount namespace of the
          source mount tree must be freed after its mounts have been
          moved to the target anonymous mount namespace. The source
          anonymous mount namespace must be empty afterwards.

     By allowing to mount detached mounts onto detached mounts a caller
     may do the following:

       fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
       fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)

     fd_tree1 and fd_tree2 refer to two different detached mount trees
     that belong to two different anonymous mount namespace.

     It is important to note that fd_tree1 and fd_tree2 both refer to
     the root of their respective anonymous mount namespaces.

     By allowing to mount detached mounts onto detached mounts the
     caller may now do:

         move_mount(fd_tree1, "", fd_tree2, "",
                    MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)

     This will cause the detached mount referred to by fd_tree1 to be
     mounted on top of the detached mount referred to by fd_tree2.

     Thus, the detached mount fd_tree1 is moved from its separate
     anonymous mount namespace into fd_tree2's anonymous mount
     namespace.

     It also means that while fd_tree2 continues to refer to the root of
     its respective anonymous mount namespace fd_tree1 doesn't anymore.

     This has the consequence that only fd_tree2 can be moved to another
     anonymous or non-anonymous mount namespace. Moving fd_tree1 will
     now fail as fd_tree1 doesn't refer to the root of an anoymous mount
     namespace anymore.

     Now fd_tree1 and fd_tree2 refer to separate detached mount trees
     referring to the same anonymous mount namespace.

     This is conceptually fine. The new mount api does allow for this to
     happen already via:

       mount -t tmpfs tmpfs /mnt
       mkdir -p /mnt/A
       mount -t tmpfs tmpfs /mnt/A

       fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
       fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)

     Both fd_tree3 and fd_tree4 refer to two different detached mount
     trees but both detached mount trees refer to the same anonymous
     mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3
     may be moved another mount namespace as fd_tree3 refers to the root
     of the anonymous mount namespace just while fd_tree4 doesn't.

     However, there's an important difference between the
     fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example.

     Closing fd_tree4 and releasing the respective struct file will have
     no further effect on fd_tree3's detached mount tree.

     However, closing fd_tree3 will cause the mount tree and the
     respective anonymous mount namespace to be destroyed causing the
     detached mount tree of fd_tree4 to be invalid for further mounting.

     By allowing to mount detached mounts on detached mounts as in the
     fd_tree1/fd_tree2 example both struct files will affect each other.

     Both fd_tree1 and fd_tree2 refer to struct files that have
     FMODE_NEED_UNMOUNT set.

     To handle this we use the fact that @fd_tree1 will have a parent
     mount once it has been attached to @fd_tree2.

     When dissolve_on_fput() is called the mount that has been passed in
     will refer to the root of the anonymous mount namespace. If it
     doesn't it would mean that mounts are leaked. So before allowing to
     mount detached mounts onto detached mounts this would be a bug.

     Now that detached mounts can be mounted onto detached mounts it
     just means that the mount has been attached to another anonymous
     mount namespace and thus dissolve_on_fput() must not unmount the
     mount tree or free the anonymous mount namespace as the file
     referring to the root of the namespace hasn't been closed yet.

     If it had been closed yet it would be obvious because the mount
     namespace would be NULL, i.e., the @fd_tree1 would have already
     been unmounted. If @fd_tree1 hasn't been unmounted yet and has a
     parent mount it is safe to skip any cleanup as closing @fd_tree2
     will take care of all cleanup operations.

   - Allow mount propagation for detached mount trees

     In commit ee2e3f5062 ("mount: fix mounting of detached mounts
     onto targets that reside on shared mounts") I fixed a bug where
     propagating the source mount tree of an anonymous mount namespace
     into a target mount tree of a non-anonymous mount namespace could
     be used to trigger an integer overflow in the non-anonymous mount
     namespace causing any new mounts to fail.

     The cause of this was that the propagation algorithm was unable to
     recognize mounts from the source mount tree that were already
     propagated into the target mount tree and then reappeared as
     propagation targets when walking the destination propagation mount
     tree.

     When fixing this I disabled mount propagation into anonymous mount
     namespaces. Make it possible for anonymous mount namespace to
     receive mount propagation events correctly. This is now also a
     correctness issue now that we allow mounting detached mount trees
     onto detached mount trees.

     Mark the source anonymous mount namespace with MNTNS_PROPAGATING
     indicating that all mounts belonging to this mount namespace are
     currently in the process of being propagated and make the
     propagation algorithm discard those if they appear as propagation
     targets"

* tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  selftests: test subdirectory mounting
  selftests: add test for detached mount tree propagation
  fs: namespace: fix uninitialized variable use
  mount: handle mount propagation for detached mount trees
  fs: allow creating detached mounts from fsmount() file descriptors
  selftests: seventh test for mounting detached mounts onto detached mounts
  selftests: sixth test for mounting detached mounts onto detached mounts
  selftests: fifth test for mounting detached mounts onto detached mounts
  selftests: fourth test for mounting detached mounts onto detached mounts
  selftests: third test for mounting detached mounts onto detached mounts
  selftests: second test for mounting detached mounts onto detached mounts
  selftests: first test for mounting detached mounts onto detached mounts
  fs: mount detached mounts onto detached mounts
  fs: support getname_maybe_null() in move_mount()
  selftests: create detached mounts from detached mounts
  fs: create detached mounts from detached mounts
  fs: add may_copy_tree()
  fs: add fastpath for dissolve_on_fput()
  fs: add assert for move_mount()
  fs: add mnt_ns_empty() helper
  ...
2025-03-24 11:41:41 -07:00
Linus Torvalds
74adf9e353 Merge tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nsfs updates from Christian Brauner:
 "This contains non-urgent fixes for nsfs to validate ioctls before
  performing any relevant operations.

  We alredy did this for a few other filesystems last cycle"

* tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/nsfs: add ioctl validation tests
  nsfs: validate ioctls
2025-03-24 11:38:12 -07:00
Linus Torvalds
804382d59b Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs overlayfs updates from Christian Brauner:
 "Currently overlayfs uses the mounter's credentials for its
  override_creds() calls. That provides a consistent permission model.

  This patches allows a caller to instruct overlayfs to use its
  credentials instead. The caller must be located in the same user
  namespace hierarchy as the user namespace the overlayfs instance will
  be mounted in. This provides a consistent and simple security model.

  With this it is possible to e.g., mount an overlayfs instance where
  the mounter must have CAP_SYS_ADMIN but the credentials used for
  override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
  of custom fs{g,u}id different from the callers and other tweaks"

* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/ovl: add third selftest for "override_creds"
  selftests/ovl: add second selftest for "override_creds"
  selftests/filesystems: add utils.{c,h}
  selftests/ovl: add first selftest for "override_creds"
  ovl: allow to specify override credentials
2025-03-24 10:37:40 -07:00
Linus Torvalds
df00ded23a Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Marcus Meissner
9a352a90e8 perf tools: annotate asm_pure_loop.S
Annotate so it is built with non-executable stack.

Fixes: 8b97519711 ("perf test: Add asm pureloop test tool")
Signed-off-by: Marcus Meissner <meissner@suse.de>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250323085410.23751-1-meissner@suse.de
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:39:54 -07:00
Ian Rogers
ba3b0861ed perf python: Fix setup.py mypy errors
getenv may return None, so assert it isn't None for CC and srctree
environmental variables required for the script.
Disable an optional warning related to Popen.

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Ian Rogers
21944462d5 perf test: Address attr.py mypy error
ConfigParser existed in python2 but not in python3 causing mypy to
fail.
Whilst removing a python2 workaround remove reference to __future__.

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Ian Rogers
8a54784e70 perf build: Add pylint build tests
If PYLINT=1 is passed to the build then run pylint over python code in
perf. Unlike shellcheck this isn't default on as there are currently
too many errors.

An example of an error:
```
************* Module setup
util/setup.py:19:0: C0301: Line too long (127/100) (line-too-long)
util/setup.py:20:0: C0301: Line too long (138/100) (line-too-long)
util/setup.py:63:0: C0301: Line too long (106/100) (line-too-long)
util/setup.py:1:0: C0114: Missing module docstring (missing-module-docstring)
util/setup.py:24:4: W0622: Redefining built-in 'vars' (redefined-builtin)
util/setup.py:11:4: C0103: Constant name "cc_options" doesn't conform to UPPER_CASE naming style (invalid-name)
util/setup.py:13:4: C0103: Constant name "cc_options" doesn't conform to UPPER_CASE naming style (invalid-name)
util/setup.py:15:34: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
util/setup.py:18:0: C0116: Missing function or method docstring (missing-function-docstring)
util/setup.py:19:16: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
util/setup.py:44:0: C0413: Import "from setuptools import setup, Extension" should be placed at the top of the module (wrong-import-position)
util/setup.py:46:0: C0413: Import "from setuptools.command.build_ext import build_ext as _build_ext" should be placed at the top of the module (wrong-import-position)
util/setup.py:47:0: C0413: Import "from setuptools.command.install_lib import install_lib as _install_lib" should be placed at the top of the module (wrong-import-position)
util/setup.py:49:0: C0115: Missing class docstring (missing-class-docstring)
util/setup.py:49:0: C0103: Class name "build_ext" doesn't conform to PascalCase naming style (invalid-name)
util/setup.py:52:8: W0201: Attribute 'build_lib' defined outside __init__ (attribute-defined-outside-init)
util/setup.py:53:8: W0201: Attribute 'build_temp' defined outside __init__ (attribute-defined-outside-init)
util/setup.py:55:0: C0115: Missing class docstring (missing-class-docstring)
util/setup.py:55:0: C0103: Class name "install_lib" doesn't conform to PascalCase naming style (invalid-name)
util/setup.py:58:8: W0201: Attribute 'build_dir' defined outside __init__ (attribute-defined-outside-init)

*-----------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 6.51/10, +0.16)

make[4]: *** [util/Build:442: util/setup.py.pylint_log] Error 1
```

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Ian Rogers
168910d0f9 perf build: Add mypy build tests
If MYPY=1 is passed to the build then run mypy over python code in
perf. Unlike shellcheck this isn't default on as there are currently
too many errors.

An example of an error:
```
util/setup.py:8: error: Item "None" of "str | None" has no attribute "split"  [union-attr]
util/setup.py:15: error: Item "None" of "IO[bytes] | None" has no attribute "readline"  [union-attr]
util/setup.py:15: error: List item 0 has incompatible type "str | None"; expected "str | bytes | PathLike[str] | PathLike[bytes]"  [list-item]
util/setup.py:16: error: Unsupported left operand type for + ("None")  [operator]
util/setup.py:16: note: Left operand is of type "str | None"
util/setup.py:74: error: Unsupported left operand type for + ("None")  [operator]
util/setup.py:74: note: Left operand is of type "str | None"
Found 5 errors in 1 file (checked 1 source file)
make[4]: *** [util/Build:430: util/setup.py.mypy_log] Error 1
```

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Ian Rogers
ef238109a3 perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
Rename TEST_LOGS to SHELL_TEST_LOGS as later changes will add more
kinds of test logs.
Minor comment tweak in Makefile.perf as more than just test shell
tests are checked.

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Ian Rogers
935e7cb5bb tools/build: Don't pass test log files to linker
Separate test log files from object files. Depend on test log output
but don't pass to the linker.

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311213628.569562-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24 09:38:20 -07:00
Linus Torvalds
fd101da676 Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:

 - Mount notifications

   The day has come where we finally provide a new api to listen for
   mount topology changes outside of /proc/<pid>/mountinfo. A mount
   namespace file descriptor can be supplied and registered with
   fanotify to listen for mount topology changes.

   Currently notifications for mount, umount and moving mounts are
   generated. The generated notification record contains the unique
   mount id of the mount.

   The listmount() and statmount() api can be used to query detailed
   information about the mount using the received unique mount id.

   This allows userspace to figure out exactly how the mount topology
   changed without having to generating diffs of /proc/<pid>/mountinfo
   in userspace.

 - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
   api

 - Support detached mounts in overlayfs

   Since last cycle we support specifying overlayfs layers via file
   descriptors. However, we don't allow detached mounts which means
   userspace cannot user file descriptors received via
   open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
   attach them to a mount namespace via move_mount() first.

   This is cumbersome and means they have to undo mounts via umount().
   Allow them to directly use detached mounts.

 - Allow to retrieve idmappings with statmount

   Currently it isn't possible to figure out what idmapping has been
   attached to an idmapped mount. Add an extension to statmount() which
   allows to read the idmapping from the mount.

 - Allow creating idmapped mounts from mounts that are already idmapped

   So far it isn't possible to allow the creation of idmapped mounts
   from already idmapped mounts as this has significant lifetime
   implications. Make the creation of idmapped mounts atomic by allow to
   pass struct mount_attr together with the open_tree_attr() system call
   allowing to solve these issues without complicating VFS lookup in any
   way.

   The system call has in general the benefit that creating a detached
   mount and applying mount attributes to it becomes an atomic operation
   for userspace.

 - Add a way to query statmount() for supported options

   Allow userspace to query which mount information can be retrieved
   through statmount().

 - Allow superblock owners to force unmount

* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
  umount: Allow superblock owners to force umount
  selftests: add tests for mount notification
  selinux: add FILE__WATCH_MOUNTNS
  samples/vfs: fix printf format string for size_t
  fs: allow changing idmappings
  fs: add kflags member to struct mount_kattr
  fs: add open_tree_attr()
  fs: add copy_mount_setattr() helper
  fs: add vfs_open_tree() helper
  statmount: add a new supported_mask field
  samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
  selftests: add tests for using detached mount with overlayfs
  samples/vfs: check whether flag was raised
  statmount: allow to retrieve idmappings
  uidgid: add map_id_range_up()
  fs: allow detached mounts in clone_private_mount()
  selftests/overlayfs: test specifying layers as O_PATH file descriptors
  fs: support O_PATH fds with FSCONFIG_SET_FD
  vfs: add notifications for mount attach and detach
  fanotify: notify on mount attach and detach
  ...
2025-03-24 09:34:10 -07:00
Dirk Gouders
99476fa085 perf bench sched pipe: fix enforced blocking reads in worker_thread
The function worker_thread() is programmed in a way that roughly
doubles the number of expectable context switches, because it enforces
blocking reads:

 Performance counter stats for 'perf bench sched pipe':

         2,000,004      context-switches

      11.859548321 seconds time elapsed

       0.674871000 seconds user
       8.076890000 seconds sys

The result of this behavior is that the blocking reads by far dominate
the performance analysis of 'perf bench sched pipe':

Samples: 78K of event 'cycles:P', Event count (approx.): 27964965844
Overhead  Command     Shared Object         Symbol
  25.28%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   8.11%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   2.82%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

From the code, it is unclear if that behavior is wanted but the log
says that at least Ingo Molnar aims to mimic lmbench's lat_ctx, that
doesn't handle the pipe ends that way
(https://sourceforge.net/p/lmbench/code/HEAD/tree/trunk/lmbench2/src/lat_ctx.c)

Fix worker_thread() by always first feeding the write ends of the pipes
and then trying to read.

This roughly halves the context switches and runtime of pure
'perf bench sched pipe':

 Performance counter stats for 'perf bench sched pipe':

         1,005,770      context-switches

       6.033448041 seconds time elapsed

       0.423142000 seconds user
       4.519829000 seconds sys

And the blocking reads do no longer dominate the analysis at the above
extreme:

Samples: 40K of event 'cycles:P', Event count (approx.): 14309364879
Overhead  Command     Shared Object         Symbol
  12.20%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   9.23%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   3.68%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

Signed-off-by: Dirk Gouders <dirk@gouders.net>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250323140316.19027-2-dirk@gouders.net
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23 23:20:37 -07:00
Likhitha Korrapati
7e442be701 perf tools: Fix is_compat_mode build break in ppc64
Commit 54f9aa1092 ("tools/perf/powerpc/util: Add support to
handle compatible mode PVR for perf json events") introduced
to select proper JSON events in case of compat mode using
auxiliary vector. But this caused a compilation error in ppc64
Big Endian.

arch/powerpc/util/header.c: In function 'is_compat_mode':
arch/powerpc/util/header.c:20:21: error: cast to pointer from
integer of different size [-Werror=int-to-pointer-cast]
   20 |         if (!strcmp((char *)platform, (char *)base_platform))
      |                     ^
arch/powerpc/util/header.c:20:39: error: cast to pointer from
integer of different size [-Werror=int-to-pointer-cast]
   20 |         if (!strcmp((char *)platform, (char *)base_platform))
      |

Commit saved the getauxval(AT_BASE_PLATFORM) and getauxval(AT_PLATFORM)
return values in u64 which causes the compilation error.

Patch fixes this issue by changing u64 to "unsigned long".

Fixes: 54f9aa1092 ("tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events")
Signed-off-by: Likhitha Korrapati <likhitha@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250321100726.699956-1-likhitha@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23 23:14:19 -07:00
Holger Hoffstätte
9480cc14a9 perf build: filter all combinations of -flto for libperl
When enabling the libperl feature the build uses perl's build flags
(ccopts) but filters out various flags, e.g. for LTO.
While this is conceptually correct, it is insufficient in practice,
since only "-flto=auto" is filtered out. When perl itself is built with
"-flto" this can cause parts of perf being built with LTO and others
without, giving exciting build errors like e.g.:

   ../tools/perf/pmu-events/pmu-events.c:72851:(.text+0xb79): undefined
   reference to `strcmp_cpuid_str' collect2: error: ld returned 1 exit status

Fix this by filtering all matching flag values of -flto{=n,auto,..}.

Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Link: https://lore.kernel.org/r/20250321082038.27901-2-holger@applied-asynchrony.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23 23:12:10 -07:00
Ming Lei
0f3ebf2d4b selftests: ublk: add stripe target
Add ublk stripe target which can take 1~4 underlying backing files
or block device, with stripe size 4k ~ 512K.

Add two basic tests(write verify & mkfs/mount/umount) over ublk/stripe.

This target is helpful to cover multiple IOs aiming at same
fixed/registered IO kernel buffer.

It is also capable of verifying vectored registered (kernel)buffers
in future for zero copy, so far it isn't supported yet.

Todo: support vectored registered kernel buffer for ublk/zc.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
263846eb43 selftests: ublk: simplify loop io completion
Use the added target io handling helpers for simplifying loop io
completion.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
8cb9b971e2 selftests: ublk: enable zero copy for null target
Enable zero copy for null target so that we can evaluate performance
from zero copy or not.

Also this should be the simplest ublk zero copy implementation, which
can be served as zc example.

Add test for covering 'add -t null -z'.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
8842b72a82 selftests: ublk: prepare for supporting stripe target
- pass 'truct dev_ctx *ctx' to target init function

- add 'private_data' to 'struct ublk_dev' for storing target specific data

- add 'private_data' to 'struct ublk_io' for storing per-IO data

- add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios
for handling the current io command

- add helper ublk_get_io() for supporting stripe target

- add two helpers for simplifying target io handling

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
10d962dae2 selftests: ublk: move common code into common.c
Move two functions for initializing & de-initializing backing file
into common.c.

Also move one common helper into kublk.h.

Prepare for supporting ublk-stripe.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
9413c0ca8e selftests: ublk: increase max buffer size to 1MB
Increase max buffer size to 1MB, and 64KB is too small to evaluate
performance with builtin ublk server implementation.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
f2639ed11e selftests: ublk: add single sqe allocator helper
Unify the sqe allocator helper, and we will use it for supporting
more cases, such as ublk stripe, in which variable sqe allocation
is required.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Ming Lei
723977cab4 selftests: ublk: add generic_01 for verifying sequential IO order
block layer, ublk and io_uring might re-order IO in the past

- plug

- queue ublk io command via task work

Add one test for verifying if sequential WRITE IO is dispatched in order.

- null target is taken, so we can just observe io order from
`tracepoint:block:block_rq_complete` which represents the dispatch order

- WRITE IO is taken because READ may come from system-wide utility

Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-22 08:35:08 -06:00
Kohei Enju
5f3077d7fc selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
syzbot reported out-of-bounds read in check_atomic_load/store() when the
register number is invalid in this context:
    https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c

To avoid the issue from now on, let's add tests where the register number
is invalid for load-acquire/store-release.

After discussion with Eduard, I decided to use R15 as invalid register
because the actual slab-out-of-bounds read issue occurs when the register
number is R12 or larger.

Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-6-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-22 06:19:09 -07:00