Commit Graph

47836 Commits

Author SHA1 Message Date
Peter Zijlstra
c5b9678957 perf/bpf: Robustify perf_event_free_bpf_prog()
Ensure perf_event_free_bpf_prog() is safe to call a second time;
notably without making any references to event->pmu when there is no
prog left.

Note: perf_event_detach_bpf_prog() might leave a stale event->prog

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.978956692@infradead.org
2025-03-04 09:42:59 +01:00
Peter Zijlstra
adc38b4ca1 perf/core: Introduce perf_free_addr_filters()
Replace _free_event()'s use of perf_addr_filters_splice()s use with an
explicit perf_free_addr_filters() with the explicit propery that it is
able to be called a second time without ill effect.

Most notable, referencing event->pmu must be avoided when there are no
filters left (from eg a previous call).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.868460518@infradead.org
2025-03-04 09:42:55 +01:00
Peter Zijlstra
b2996f5655 perf/core: Add this_cpc() helper
As a preparation for adding yet another indirection.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.650051565@infradead.org
2025-03-04 09:42:51 +01:00
Peter Zijlstra
4baeb0687a perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
Because it makes no sense to have two per-cpu allocations per pmu.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.518730578@infradead.org
2025-03-04 09:42:47 +01:00
Peter Zijlstra
8f2221f52e perf/core: Simplify perf_event_alloc()
Using the previous simplifications, transition perf_event_alloc() to
the cleanup way of things -- reducing error path magic.

[ mingo: Ported it to recent kernels. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.410755241@infradead.org
2025-03-04 09:42:40 +01:00
Peter Zijlstra
caf8b765d4 perf/core: Simplify perf_init_event()
Use the <linux/cleanup.h> guard() and scoped_guard() infrastructure
to simplify the control flow.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241104135518.302444446@infradead.org
2025-03-04 09:42:32 +01:00
Peter Zijlstra
6c8b0b835f perf/core: Simplify perf_pmu_register()
Using the previously introduced perf_pmu_free() and a new IDR helper,
simplify the perf_pmu_register error paths.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.198937277@infradead.org
2025-03-04 09:42:29 +01:00
Peter Zijlstra
8f4c4963d2 perf/core: Simplify the perf_pmu_register() error path
The error path of perf_pmu_register() is of course very similar to a
subset of perf_pmu_unregister(). Extract this common part in
perf_pmu_free() and simplify things.

[ mingo: Forward ported it ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.090915501@infradead.org
2025-03-04 09:42:26 +01:00
Peter Zijlstra
c70ca29803 perf/core: Simplify the perf_event_alloc() error path
The error cleanup sequence in perf_event_alloc() is a subset of the
existing _free_event() function (it must of course be).

Split this out into __free_event() and simplify the error path.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135517.967889521@infradead.org
2025-03-04 09:42:14 +01:00
Saket Kumar Bhaskar
061c991697 perf/hw_breakpoint: Return EOPNOTSUPP for unsupported breakpoint type
Currently, __reserve_bp_slot() returns -ENOSPC for unsupported
breakpoint types on the architecture. For example, powerpc
does not support hardware instruction breakpoints. This causes
the perf_skip BPF selftest to fail, as neither ENOENT nor
EOPNOTSUPP is returned by perf_event_open for unsupported
breakpoint types. As a result, the test that should be skipped
for this arch is not correctly identified.

To resolve this, hw_breakpoint_event_init() should exit early by
checking for unsupported breakpoint types using
hw_breakpoint_slots_cached() and return the appropriate error
(-EOPNOTSUPP).

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lore.kernel.org/r/20250303092451.1862862-1-skb99@linux.ibm.com
2025-03-04 09:42:13 +01:00
Hans Zhang
01499ae673 genirq/msi: Expose MSI message data in debugfs
When debugging MSI-related hardware issues (e.g. interrupt delivery
failures), developers currently need to either:

  1. Recompile the kernel with dynamic debug for tracing msi_desc.
  2. Manually read device registers through low-level tools.

Both approaches become challenging in production environments where
dynamic debugging is often disabled.

The interrupt core provides a debugfs interface for inspection of interrupt
related data, which contains the per interrupt information in the view of
the hierarchical interrupt domains. Though this interface does not expose
the MSI address/data pair, which is important information to:

  - Verify whether the MSI configuration matches the hardware expectations
  - Diagnose interrupt routing errors (e.g., mismatched destination ID)
  - Validate remapping behavior in virtualized environments

Implement the debug_show() callback for the generic MSI interrupt domains,
and use it to expose the MSI address/data pair in the per interrupt
diagnostics.

Sample output:
  address_hi: 0x00000000
  address_lo: 0xfe670040
  msg_data:   0x00000001

[ tglx: Massaged change log. Use irq_data_get_msi_desc() to avoid pointless
  	lookup. ]

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250303121008.309265-1-18255117159@163.com
2025-03-03 22:16:03 +01:00
Ingo Molnar
1fff9f8730 Merge tag 'v6.14-rc5' into x86/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-03 21:05:45 +01:00
Tejun Heo
8a9b1585e2 sched_ext: Merge branch 'for-6.14-fixes' into for-6.15
Pull for-6.14-fixes to receive:

  9360dfe4cb ("sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()")

which conflicts with:

  337d1b354a ("sched_ext: Move built-in idle CPU selection policy to a separate file")

Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-03 08:02:22 -10:00
Andrea Righi
9360dfe4cb sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.

To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-03 07:55:48 -10:00
Linus Torvalds
26edad06d5 Merge tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probe events fixes from Masami Hiramatsu:

 - probe-events: Remove unused MAX_ARG_BUF_LEN macro - it is not used

 - fprobe-events: Log error for exceeding the number of entry args.

   Since the max number of entry args is limited, it should be checked
   and rejected when the parser detects it.

 - tprobe-events: Reject invalid tracepoint name

   If a user specifies an invalid tracepoint name (e.g. including '/')
   then the new event is not defined correctly in the eventfs.

 - tprobe-events: Fix a memory leak when tprobe defined with $retval

   There is a memory leak if tprobe is defined with $retval.

* tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro
  tracing: fprobe-events: Log error for exceeding the number of entry args
  tracing: tprobe-events: Reject invalid tracepoint name
  tracing: tprobe-events: Fix a memory leak when tprobe with $retval
2025-03-03 07:28:15 -10:00
Lizhi Xu
52323ed144 PM: hibernate: Avoid deadlock in hibernate_compressor_param_set()
syzbot reported a deadlock in lock_system_sleep() (see below).

The write operation to "/sys/module/hibernate/parameters/compressor"
conflicts with the registration of ieee80211 device, resulting in a deadlock
when attempting to acquire system_transition_mutex under param_lock.

To avoid this deadlock, change hibernate_compressor_param_set() to use
mutex_trylock() for attempting to acquire system_transition_mutex and
return -EBUSY when it fails.

Task flags need not be saved or adjusted before calling
mutex_trylock(&system_transition_mutex) because the caller is not going
to end up waiting for this mutex and if it runs concurrently with system
suspend in progress, it will be frozen properly when it returns to user
space.

syzbot report:

syz-executor895/5833 is trying to acquire lock:
ffffffff8e0828c8 (system_transition_mutex){+.+.}-{4:4}, at: lock_system_sleep+0x87/0xa0 kernel/power/main.c:56

but task is already holding lock:
ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: kernel_param_lock kernel/params.c:607 [inline]
ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: param_attr_store+0xe6/0x300 kernel/params.c:586

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (param_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
       __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
       ieee80211_rate_control_ops_get net/mac80211/rate.c:220 [inline]
       rate_control_alloc net/mac80211/rate.c:266 [inline]
       ieee80211_init_rate_ctrl_alg+0x18d/0x6b0 net/mac80211/rate.c:1015
       ieee80211_register_hw+0x20cd/0x4060 net/mac80211/main.c:1531
       mac80211_hwsim_new_radio+0x304e/0x54e0 drivers/net/wireless/virtual/mac80211_hwsim.c:5558
       init_mac80211_hwsim+0x432/0x8c0 drivers/net/wireless/virtual/mac80211_hwsim.c:6910
       do_one_initcall+0x128/0x700 init/main.c:1257
       do_initcall_level init/main.c:1319 [inline]
       do_initcalls init/main.c:1335 [inline]
       do_basic_setup init/main.c:1354 [inline]
       kernel_init_freeable+0x5c7/0x900 init/main.c:1568
       kernel_init+0x1c/0x2b0 init/main.c:1457
       ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #2 (rtnl_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
       __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
       wg_pm_notification drivers/net/wireguard/device.c:80 [inline]
       wg_pm_notification+0x49/0x180 drivers/net/wireguard/device.c:64
       notifier_call_chain+0xb7/0x410 kernel/notifier.c:85
       notifier_call_chain_robust kernel/notifier.c:120 [inline]
       blocking_notifier_call_chain_robust kernel/notifier.c:345 [inline]
       blocking_notifier_call_chain_robust+0xc9/0x170 kernel/notifier.c:333
       pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102
       snapshot_open+0x189/0x2b0 kernel/power/user.c:77
       misc_open+0x35a/0x420 drivers/char/misc.c:179
       chrdev_open+0x237/0x6a0 fs/char_dev.c:414
       do_dentry_open+0x735/0x1c40 fs/open.c:956
       vfs_open+0x82/0x3f0 fs/open.c:1086
       do_open fs/namei.c:3830 [inline]
       path_openat+0x1e88/0x2d80 fs/namei.c:3989
       do_filp_open+0x20c/0x470 fs/namei.c:4016
       do_sys_openat2+0x17a/0x1e0 fs/open.c:1428
       do_sys_open fs/open.c:1443 [inline]
       __do_sys_openat fs/open.c:1459 [inline]
       __se_sys_openat fs/open.c:1454 [inline]
       __x64_sys_openat+0x175/0x210 fs/open.c:1454
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 ((pm_chain_head).rwsem){++++}-{4:4}:
       down_read+0x9a/0x330 kernel/locking/rwsem.c:1524
       blocking_notifier_call_chain_robust kernel/notifier.c:344 [inline]
       blocking_notifier_call_chain_robust+0xa9/0x170 kernel/notifier.c:333
       pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102
       snapshot_open+0x189/0x2b0 kernel/power/user.c:77
       misc_open+0x35a/0x420 drivers/char/misc.c:179
       chrdev_open+0x237/0x6a0 fs/char_dev.c:414
       do_dentry_open+0x735/0x1c40 fs/open.c:956
       vfs_open+0x82/0x3f0 fs/open.c:1086
       do_open fs/namei.c:3830 [inline]
       path_openat+0x1e88/0x2d80 fs/namei.c:3989
       do_filp_open+0x20c/0x470 fs/namei.c:4016
       do_sys_openat2+0x17a/0x1e0 fs/open.c:1428
       do_sys_open fs/open.c:1443 [inline]
       __do_sys_openat fs/open.c:1459 [inline]
       __se_sys_openat fs/open.c:1454 [inline]
       __x64_sys_openat+0x175/0x210 fs/open.c:1454
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (system_transition_mutex){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3163 [inline]
       check_prevs_add kernel/locking/lockdep.c:3282 [inline]
       validate_chain kernel/locking/lockdep.c:3906 [inline]
       __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5228
       lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5851
       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
       __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
       lock_system_sleep+0x87/0xa0 kernel/power/main.c:56
       hibernate_compressor_param_set+0x1c/0x210 kernel/power/hibernate.c:1452
       param_attr_store+0x18f/0x300 kernel/params.c:588
       module_attr_store+0x55/0x80 kernel/params.c:924
       sysfs_kf_write+0x117/0x170 fs/sysfs/file.c:139
       kernfs_fop_write_iter+0x33d/0x500 fs/kernfs/file.c:334
       new_sync_write fs/read_write.c:586 [inline]
       vfs_write+0x5ae/0x1150 fs/read_write.c:679
       ksys_write+0x12b/0x250 fs/read_write.c:731
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  system_transition_mutex --> rtnl_mutex --> param_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(param_lock);
                               lock(rtnl_mutex);
                               lock(param_lock);
  lock(system_transition_mutex);

 *** DEADLOCK ***

Reported-by: syzbot+ace60642828c074eb913@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ace60642828c074eb913
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Link: https://patch.msgid.link/20250224013139.3994500-1-lizhi.xu@windriver.com
[ rjw: New subject matching the code changes, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-03 13:29:21 +01:00
Masami Hiramatsu (Google)
fd5ba38390 tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro
Commit 18b1e870a4 ("tracing/probes: Add $arg* meta argument for all
function args") introduced MAX_ARG_BUF_LEN but it is not used.
Remove it.

Link: https://lore.kernel.org/all/174055075876.4079315.8805416872155957588.stgit@mhiramat.tok.corp.google.com/

Fixes: 18b1e870a4 ("tracing/probes: Add $arg* meta argument for all function args")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-03 11:17:54 +09:00
Ingo Molnar
ef2f798600 Merge branch 'perf/urgent' into perf/core, to pick up dependent patches and fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-01 19:41:14 +01:00
Peter Zijlstra
003659fec9 perf/core: Fix perf_pmu_register() vs. perf_init_event()
There is a fairly obvious race between perf_init_event() doing
idr_find() and perf_pmu_register() doing idr_alloc() with an
incompletely initialized PMU pointer.

Avoid by doing idr_alloc() on a NULL pointer to register the id, and
swizzling the real struct pmu pointer at the end using idr_replace().

Also making sure to not set struct pmu members after publishing
the struct pmu, duh.

[ introduce idr_cmpxchg() in order to better handle the idr_replace()
  error case -- if it were to return an unexpected pointer, it will
  already have replaced the value and there is no going back. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135517.858805880@infradead.org
2025-03-01 19:38:42 +01:00
Peter Zijlstra
2565e42539 perf/core: Fix pmus_lock vs. pmus_srcu ordering
Commit a63fbed776 ("perf/tracing/cpuhotplug: Fix locking order")
placed pmus_lock inside pmus_srcu, this makes perf_pmu_unregister()
trip lockdep.

Move the locking about such that only pmu_idr and pmus (list) are
modified while holding pmus_lock. This avoids doing synchronize_srcu()
while holding pmus_lock and all is well again.

Fixes: a63fbed776 ("perf/tracing/cpuhotplug: Fix locking order")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135517.679556858@infradead.org
2025-03-01 19:38:42 +01:00
Linus Torvalds
209cd6f2ca Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2 and not
     mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.

   - Bring back the VMID allocation to the vcpu_load phase, ensuring
     that we only setup VTTBR_EL2 once on VHE. This cures an ugly race
     that would lead to running with an unallocated VMID.

  RISC-V:

   - Fix hart status check in SBI HSM extension

   - Fix hart suspend_type usage in SBI HSM extension

   - Fix error returned by SBI IPI and TIME extensions for unsupported
     function IDs

   - Fix suspend_type usage in SBI SUSP extension

   - Remove unnecessary vcpu kick after injecting interrupt via IMSIC
     guest file

  x86:

   - Fix an nVMX bug where KVM fails to detect that, after nested
     VM-Exit, L1 has a pending IRQ (or NMI).

   - To avoid freeing the PIC while vCPUs are still around, which would
     cause a NULL pointer access with the previous patch, destroy vCPUs
     before any VM-level destruction.

   - Handle failures to create vhost_tasks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: retry nx_huge_page_recovery_thread creation
  vhost: return task creation error instead of NULL
  KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending
  KVM: x86: Free vCPUs before freeing VM state
  riscv: KVM: Remove unnecessary vcpu kick
  KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2
  KVM: arm64: Fix tcr_el2 initialisation in hVHE mode
  riscv: KVM: Fix SBI sleep_type use
  riscv: KVM: Fix SBI TIME error generation
  riscv: KVM: Fix SBI IPI error generation
  riscv: KVM: Fix hart suspend_type use
  riscv: KVM: Fix hart suspend status check
2025-03-01 08:48:53 -08:00
Eric Sandeen
f13abc1e8e watch_queue: fix pipe accounting mismatch
Currently, watch_queue_set_size() modifies the pipe buffers charged to
user->pipe_bufs without updating the pipe->nr_accounted on the pipe
itself, due to the if (!pipe_has_watch_queue()) test in
pipe_resize_ring(). This means that when the pipe is ultimately freed,
we decrement user->pipe_bufs by something other than what than we had
charged to it, potentially leading to an underflow. This in turn can
cause subsequent too_many_pipe_buffers_soft() tests to fail with -EPERM.

To remedy this, explicitly account for the pipe usage in
watch_queue_set_size() to match the number set via account_pipe_buffers()

(It's unclear why watch_queue_set_size() does not update nr_accounted;
it may be due to intentional overprovisioning in watch_queue_set_size()?)

Fixes: e95aada4cb ("pipe: wakeup wr_wait after setting max_usage")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/206682a8-0604-49e5-8224-fdbe0c12b460@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-01 11:02:35 +01:00
Keith Busch
cb380909ae vhost: return task creation error instead of NULL
Lets callers distinguish why the vhost task creation failed. No one
currently cares why it failed, so no real runtime change from this
patch, but that will not be the case for long.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Message-ID: <20250227230631.303431-2-kbusch@meta.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-01 02:52:52 -05:00
Linus Torvalds
d203484f25 Merge tag 'sched-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Prevent cond_resched() based preemption when interrupts are disabled,
  on PREEMPT_NONE and PREEMPT_VOLUNTARY kernels"

* tag 'sched-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Prevent rescheduling when interrupts are disabled
2025-02-28 17:00:16 -08:00
Linus Torvalds
766331f286 Merge tag 'perf-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
 "Miscellaneous perf events fixes and a minor HW enablement change:

   - Fix missing RCU protection in perf_iterate_ctx()

   - Fix pmu_ctx_list ordering bug

   - Reject the zero page in uprobes

   - Fix a family of bugs related to low frequency sampling

   - Add Intel Arrow Lake U CPUs to the generic Arrow Lake RAPL support
     table

   - Fix a lockdep-assert false positive in uretprobes"

* tag 'perf-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  uprobes: Remove too strict lockdep_assert() condition in hprobe_expire()
  perf/x86/rapl: Add support for Intel Arrow Lake U
  perf/x86/intel: Use better start period for frequency mode
  perf/core: Fix low freq setting via IOC_PERIOD
  perf/x86: Fix low freqency setting issue
  uprobes: Reject the shared zeropage in uprobe_write_opcode()
  perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
  perf/core: Add RCU read lock protection to perf_iterate_ctx()
2025-02-28 16:52:10 -08:00
Linus Torvalds
5c44ddaf7d Merge tag 'trace-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix crash from bad histogram entry

   An error path in the histogram creation could leave an entry in a
   link list that gets freed. Then when a new entry is added it can
   cause a u-a-f bug. This is fixed by restructuring the code so that
   the histogram is consistent on failure and everything is cleaned up
   appropriately.

 - Fix fprobe self test

   The fprobe self test relies on no function being attached by ftrace.
   BPF programs can attach to functions via ftrace and systemd now does
   so. This causes those functions to appear in the enabled_functions
   list which holds all functions attached by ftrace. The selftest also
   uses that file to see if functions are being connected correctly. It
   counts the functions in the file, but if there's already functions in
   the file, it fails. Instead, add the number of functions in the file
   at the start of the test to all the calculations during the test.

 - Fix potential division by zero of the function profiler stddev

   The calculated divisor that calculates the standard deviation of the
   function times can overflow. If the overflow happens to land on zero,
   that can cause a division by zero. Check for zero from the
   calculation before doing the division.

   TODO: Catch when it ever overflows and report it accordingly. For
   now, just prevent the system from crashing.

* tag 'trace-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Avoid potential division by zero in function_stat_show()
  selftests/ftrace: Let fprobe test consider already enabled functions
  tracing: Fix bad hist from corrupting named_triggers list
2025-02-28 15:43:32 -08:00
Joerg Roedel
59b6c3469e Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core
iommu shared branch with iommufd

The three dependent series on a shared branch:

- Change the iommufd fault handle into an always present hwpt handle in
  the domain

- Give iommufd its own SW_MSI implementation along with some IRQ layer
  rework

- Improvements to the handle attach API
2025-02-28 16:35:19 +01:00
Nikolay Kuratov
a1a7eb89ca ftrace: Avoid potential division by zero in function_stat_show()
Check whether denominator expression x * (x - 1) * 1000 mod {2^32, 2^64}
produce zero and skip stddev computation in that case.

For now don't care about rec->counter * rec->counter overflow because
rec->time * rec->time overflow will likely happen earlier.

Cc: stable@vger.kernel.org
Cc: Wen Yang <wenyang@linux.alibaba.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250206090156.1561783-1-kniv@yandex-team.ru
Fixes: e31f7939c1 ("ftrace: Avoid potential division by zero in function profiler")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27 21:02:10 -05:00
Steven Rostedt
6f86bdeab6 tracing: Fix bad hist from corrupting named_triggers list
The following commands causes a crash:

 ~# cd /sys/kernel/tracing/events/rcu/rcu_callback
 ~# echo 'hist:name=bad:keys=common_pid:onmax(bogus).save(common_pid)' > trigger
 bash: echo: write error: Invalid argument
 ~# echo 'hist:name=bad:keys=common_pid' > trigger

Because the following occurs:

event_trigger_write() {
  trigger_process_regex() {
    event_hist_trigger_parse() {

      data = event_trigger_alloc(..);

      event_trigger_register(.., data) {
        cmd_ops->reg(.., data, ..) [hist_register_trigger()] {
          data->ops->init() [event_hist_trigger_init()] {
            save_named_trigger(name, data) {
              list_add(&data->named_list, &named_triggers);
            }
          }
        }
      }

      ret = create_actions(); (return -EINVAL)
      if (ret)
        goto out_unreg;
[..]
      ret = hist_trigger_enable(data, ...) {
        list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!)
[..]
 out_unreg:
      event_hist_unregister(.., data) {
        cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] {
          list_for_each_entry(iter, &file->triggers, list) {
            if (!hist_trigger_match(data, iter, named_data, false))   <- never matches
                continue;
            [..]
            test = iter;
          }
          if (test && test->ops->free) <<<-- test is NULL

            test->ops->free(test) [event_hist_trigger_free()] {
              [..]
              if (data->name)
                del_named_trigger(data) {
                  list_del(&data->named_list);  <<<<-- NEVER gets removed!
                }
              }
           }
         }

         [..]
         kfree(data); <<<-- frees item but it is still on list

The next time a hist with name is registered, it causes an u-a-f bug and
the kernel can crash.

Move the code around such that if event_trigger_register() succeeds, the
next thing called is hist_trigger_enable() which adds it to the list.

A bunch of actions is called if get_named_trigger_data() returns false.
But that doesn't need to be called after event_trigger_register(), so it
can be moved up, allowing event_trigger_register() to be called just
before hist_trigger_enable() keeping them together and allowing the
file->triggers to be properly populated.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250227163944.1c37f85f@gandalf.local.home
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reported-by: Tomas Glozar <tglozar@redhat.com>
Tested-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27 21:01:34 -05:00
Thomas Gleixner
82c387ef75 sched/core: Prevent rescheduling when interrupts are disabled
David reported a warning observed while loop testing kexec jump:

  Interrupts enabled after irqrouter_resume+0x0/0x50
  WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
   kernel_kexec+0xf6/0x180
   __do_sys_reboot+0x206/0x250
   do_syscall_64+0x95/0x180

The corresponding interrupt flag trace:

  hardirqs last  enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
  hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90

That means __up_console_sem() was invoked with interrupts enabled. Further
instrumentation revealed that in the interrupt disabled section of kexec
jump one of the syscore_suspend() callbacks woke up a task, which set the
NEED_RESCHED flag. A later callback in the resume path invoked
cond_resched() which in turn led to the invocation of the scheduler:

  __cond_resched+0x21/0x60
  down_timeout+0x18/0x60
  acpi_os_wait_semaphore+0x4c/0x80
  acpi_ut_acquire_mutex+0x3d/0x100
  acpi_ns_get_node+0x27/0x60
  acpi_ns_evaluate+0x1cb/0x2d0
  acpi_rs_set_srs_method_data+0x156/0x190
  acpi_pci_link_set+0x11c/0x290
  irqrouter_resume+0x54/0x60
  syscore_resume+0x6a/0x200
  kernel_kexec+0x145/0x1c0
  __do_sys_reboot+0xeb/0x240
  do_syscall_64+0x95/0x180

This is a long standing problem, which probably got more visible with
the recent printk changes. Something does a task wakeup and the
scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and
invokes schedule() from a completely bogus context. The scheduler
enables interrupts after context switching, which causes the above
warning at the end.

Quite some of the code paths in syscore_suspend()/resume() can result in
triggering a wakeup with the exactly same consequences. They might not
have done so yet, but as they share a lot of code with normal operations
it's just a question of time.

The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling
models. Full preemption is not affected as cond_resched() is disabled and
the preemption check preemptible() takes the interrupt disabled flag into
account.

Cure the problem by adding a corresponding check into cond_resched().

Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@infradead.org
2025-02-27 21:13:57 +01:00
Brian Gerst
18cdd90aba x86/bpf: Fix BPF percpu accesses
Due to this recent commit in the x86 tree:

  9d7de2aa8b ("Use relative percpu offsets")

percpu addresses went from positive offsets from the GSBASE to negative
kernel virtual addresses.  The BPF verifier has an optimization for
x86-64 that loads the address of cpu_number into a register, but was only
doing a 32-bit load which truncates negative addresses.

Change it to a 64-bit load so that the address is properly sign-extended.

Fixes: 9d7de2aa8b ("Use relative percpu offsets")
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250227195302.1667654-1-brgerst@gmail.com
2025-02-27 21:10:03 +01:00
NeilBrown
88d5baf690 Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in

This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some file-systems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 20:00:17 +01:00
Jakub Kicinski
357660d759 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.14-rc5).

Conflicts:

drivers/net/ethernet/cadence/macb_main.c
  fa52f15c74 ("net: cadence: macb: Synchronize stats calculations")
  75696dd0fd ("net: cadence: macb: Convert to get_stats64")
https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_sriov.c
  79990cf5e7 ("ice: Fix deinitializing VF in error path")
  a203163274 ("ice: simplify VF MSI-X managing")

net/ipv4/tcp.c
  18912c5206 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace")
  297d389e9e ("net: prefix devmem specific helpers")

net/mptcp/subflow.c
  8668860b0a ("mptcp: reset when MPTCP opts are dropped after join")
  c3349a22c2 ("mptcp: consolidate subflow cleanup")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27 10:20:58 -08:00
Alexei Starovoitov
c9eb8102e2 bpf: Use try_alloc_pages() to allocate pages for bpf needs.
Use try_alloc_pages() and free_pages_nolock() for BPF needs
when context doesn't allow using normal alloc_pages.
This is a prerequisite for further work.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20250222024427.30294-7-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-27 09:39:44 -08:00
Abel Wu
c4af66a95a cgroup/rstat: Fix forceidle time in cpu.stat
The commit b824766504 ("cgroup/rstat: add force idle show helper")
retrieves forceidle_time outside cgroup_rstat_lock for non-root cgroups
which can be potentially inconsistent with other stats.

Rather than reverting that commit, fix it in a way that retains the
effort of cleaning up the ifdef-messes.

Fixes: b824766504 ("cgroup/rstat: add force idle show helper")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-27 07:19:17 -10:00
Alexander Lobakin
ed16b8a4d1 bpf: cpumap: switch to napi_skb_cache_get_bulk()
Now that cpumap uses GRO, which drops unused skb heads to the NAPI
cache, use napi_skb_cache_get_bulk() to try to reuse cached entries
and lower MM layer pressure. Always disable the BH before checking and
running the cpumap-pinned XDP prog and don't re-enable it in between
that and allocating an skb bulk, as we can access the NAPI caches only
from the BH context.
The better GRO aggregates packets, the less new skbs will be allocated.
If an aggregated skb contains 16 frags, this means 15 skbs were returned
to the cache, so next 15 skbs will be built without allocating anything.

The same trafficgen UDP GRO test now shows:

                GRO off   GRO on
threaded GRO    2.3       4         Mpps
thr bulk GRO    2.4       4.7       Mpps

diff            +4        +17       %

Comparing to the baseline cpumap:

baseline        2.7       N/A       Mpps
thr bulk GRO    2.4       4.7       Mpps
diff            -11       +74       %

Tested-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27 14:03:39 +01:00
Alexander Lobakin
57efe762cd bpf: cpumap: reuse skb array instead of a linked list to chain skbs
cpumap still uses linked lists to store a list of skbs to pass to the
stack. Now that we don't use listified Rx in favor of
napi_gro_receive(), linked list is now an unneeded overhead.
Inside the polling loop, we already have an array of skbs. Let's reuse
it for skbs passed to cpumap (generic XDP) and keep there in case of
XDP_PASS when a program is installed to the map itself. Don't list
regular xdp_frames after converting them to skbs as well; store them
in the mentioned array (but *before* generic skbs as the latters have
lower priority) and call gro_receive_skb() for each array element after
they're done.

Tested-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27 14:03:14 +01:00
Alexander Lobakin
4f8ab26a03 bpf: cpumap: switch to GRO from netif_receive_skb_list()
cpumap has its own BH context based on kthread. It has a sane batch
size of 8 frames per one cycle.
GRO can be used here on its own. Adjust cpumap calls to the upper stack
to use GRO API instead of netif_receive_skb_list() which processes skbs
by batches, but doesn't involve GRO layer at all.
In plenty of tests, GRO performs better than listed receiving even
given that it has to calculate full frame checksums on the CPU.
As GRO passes the skbs to the upper stack in the batches of
@gro_normal_batch, i.e. 8 by default, and skb->dev points to the
device where the frame comes from, it is enough to disable GRO
netdev feature on it to completely restore the original behaviour:
untouched frames will be being bulked and passed to the upper stack
by 8, as it was with netif_receive_skb_list().

Tested-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27 14:03:14 +01:00
Christian Brauner
71628584df Merge patch series "prep patches for my mkdir series"
NeilBrown <neilb@suse.de> says:

These two patches are cleanup are dependencies for my mkdir changes and
subsequence directory locking changes.

* patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits)
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()

Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 09:25:34 +01:00
Tomas Glozar
a065bbf776 trace/osnoise: Add trace events for samples
Add trace events that fire at osnoise and timerlat sample generation, in
addition to the already existing noise and threshold events.

This allows processing the samples directly in the kernel, either with
ftrace triggers or with BPF.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250203090418.1458923-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Tested-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-26 19:44:30 -05:00
Masami Hiramatsu (Google)
db5e228611 tracing: fprobe-events: Log error for exceeding the number of entry args
Add error message when the number of entry argument exceeds the
maximum size of entry data.
This is currently checked when registering fprobe, but in this case
no error message is shown in the error_log file.

Link: https://lore.kernel.org/all/174055074269.4079315.17809232650360988538.stgit@mhiramat.tok.corp.google.com/

Fixes: 25f00e40ce ("tracing/probes: Support $argN in return probe (kprobe and fprobe)")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-27 09:11:51 +09:00
Masami Hiramatsu (Google)
d0453655b6 tracing: tprobe-events: Reject invalid tracepoint name
Commit 57a7e6de9e ("tracing/fprobe: Support raw tracepoints on
future loaded modules") allows user to set a tprobe on non-exist
tracepoint but it does not check the tracepoint name is acceptable.
So it leads tprobe has a wrong character for events (e.g. with
subsystem prefix). In this case, the event is not shown in the
events directory.

Reject such invalid tracepoint name.

The tracepoint name must consist of alphabet or digit or '_'.

Link: https://lore.kernel.org/all/174055073461.4079315.15875502830565214255.stgit@mhiramat.tok.corp.google.com/

Fixes: 57a7e6de9e ("tracing/fprobe: Support raw tracepoints on future loaded modules")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
2025-02-27 09:10:58 +09:00
Masami Hiramatsu (Google)
ac965d7d88 tracing: tprobe-events: Fix a memory leak when tprobe with $retval
Fix a memory leak when a tprobe is defined with $retval. This
combination is not allowed, but the parse_symbol_and_return() does
not free the *symbol which should not be used if it returns the error.
Thus, it leaks the *symbol memory in that error path.

Link: https://lore.kernel.org/all/174055072650.4079315.3063014346697447838.stgit@mhiramat.tok.corp.google.com/

Fixes: ce51e6153f ("tracing: fprobe-event: Fix to check tracepoint event and return")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
2025-02-27 09:10:21 +09:00
Linus Torvalds
f4ce1f3318 Merge tag 'wq-for-6.14-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "This contains a patch improve debug visibility.

  While it isn't a fix, the change carries virtually no risk and makes
  it substantially easier to chase down a class of problems"

* tag 'wq-for-6.14-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Log additional details when rejecting work
2025-02-26 14:22:47 -08:00
Linus Torvalds
e6d3c4e535 Merge tag 'sched_ext-for-6.14-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
 "pick_task_scx() has a workaround to avoid stalling when the fair
  class's balance() says yes but pick_task() says no.

  The workaround was incorrectly deciding to keep the prev taks running
  if the task is on SCX even when the task is in a sleeping state, which
  can lead to several confusing failure modes.

  Fix it by testing the prev task is currently queued on SCX instead"

* tag 'sched_ext-for-6.14-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix pick_task_scx() picking non-queued tasks when it's called without balance()
2025-02-26 14:13:11 -08:00
Luo Gengkun
9ec84f79c5 perf: Remove unnecessary parameter of security check
It seems that the attr parameter was never been used in security
checks since it was first introduced by:

commit da97e18458 ("perf_event: Add support for LSM and SELinux checks")

so remove it.

Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-26 14:13:58 -05:00
Alexei Starovoitov
4580f4e0eb bpf: Fix deadlock between rcu_tasks_trace and event_mutex.
Fix the following deadlock:
CPU A
_free_event()
  perf_kprobe_destroy()
    mutex_lock(&event_mutex)
      perf_trace_event_unreg()
        synchronize_rcu_tasks_trace()

There are several paths where _free_event() grabs event_mutex
and calls sync_rcu_tasks_trace. Above is one such case.

CPU B
bpf_prog_test_run_syscall()
  rcu_read_lock_trace()
    bpf_prog_run_pin_on_cpu()
      bpf_prog_load()
        bpf_tracing_func_proto()
          trace_set_clr_event()
            mutex_lock(&event_mutex)

Delegate trace_set_clr_event() to workqueue to avoid
such lock dependency.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250224221637.4780-1-alexei.starovoitov@gmail.com
2025-02-26 08:48:40 -08:00
Thomas Weißschuh
7a6b158e00 posix-clock: Remove duplicate compat ioctl() handler
The normal and compat ioctl handlers are identical,
which is fine as compat ioctls are detected and handled dynamically
inside the underlying clock implementation.
The duplicate definition however is unnecessary.

Just reuse the regular ioctl handler also for compat ioctls.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Link: https://lore.kernel.org/all/20250225-posix-clock-compat-cleanup-v2-1-30de86457a2b@weissschuh.net
2025-02-26 16:53:58 +01:00
Michael Ellerman
333e8eb3e0 genirq: Remove IRQ_EDGE_EOI_HANDLER
The powerpc Cell blade support, now removed, was the only user of
IRQ_EDGE_EOI_HANDLER, so remove it.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241218105523.416573-21-mpe@ellerman.id.au
2025-02-26 21:15:09 +05:30
Christophe Leroy
d856bc3ac7 static_call_inline: Provide trampoline address when updating sites
In preparation of support of inline static calls on powerpc, provide
trampoline address when updating sites, so that when the destination
function is too far for a direct function call, the call site is
patched with a call to the trampoline.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/5efe0cffc38d6f69b1ec13988a99f1acff551abf.1733245362.git.christophe.leroy@csgroup.eu
2025-02-26 21:09:43 +05:30