The non-pi5 variant of tc358743 needs to update the compatible of
csi0 rather than csi1 if the cam0 override is used, otherwise it
gets loaded in Media Controller mode.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
commit fdc93beead upstream.
[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:
amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes
[drm] Cannot find any crtc or sizes
Fixes: 489f0f600c ("drm/amd/display: Fix DVI-D/HDMI adapters")
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 678c901443a6d2e909e3b51331a20f9d8f84ce82)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38ab33dbea upstream.
Align the function headers for `amdgpu_max_hdmi_pixel_clock` and
`amdgpu_connector_dvi_mode_valid` with the function implementations so
they match the expected kdoc style.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Returns the maximum supported HDMI (TMDS) pixel clock in KHz.
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1212: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Validates the given display mode on DVI and HDMI connectors.
Fixes: 585b2f685c ("drm/amdgpu: Respect max pixel clock for HDMI and DVI-D (v2)")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 543d350040 upstream.
Commit 4d330fe541 ("ACPI: SPCR: Support Precise Baud Rate field")
added support to use the precise baud rate available since SPCR 1.09
(revision 4) but failed to check the version of the table provided by
the firmware.
Accessing an older version of SPCR table causes accesses beyond the
end of the table and can lead to garbage data to be used for the baud
rate.
Check the version of the firmware-provided SPCR to ensure that the
precise baud rate is valid before using it.
Fixes: 4d330fe541 ("ACPI: SPCR: Support Precise Baud Rate field")
Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Link: https://patch.msgid.link/20251024123125.1081612-1-punit.agrawal@oss.qualcomm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d34caa89a1 upstream.
ufshcd_link_startup() has a facility (link_startup_again) to issue
DME_LINKSTARTUP a 2nd time even though the 1st time was successful.
Some older hardware benefits from that, however the behaviour is
non-standard, and has been found to cause link startup to be unreliable
for some Intel Alder Lake based host controllers.
Add UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE to suppress
link_startup_again, in preparation for setting the quirk for affected
controllers.
Fixes: 7dc9fb47bc ("scsi: ufs: ufs-pci: Add support for Intel ADL")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251024085918.31825-3-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb44826c3b upstream.
Intel platforms with UFS can support Suspend-to-Idle (S0ix) and
Suspend-to-RAM (S3). For S0ix the link state should be HIBERNATE. For
S3, state is lost, so the link state must be OFF. Driver policy,
expressed by spm_lvl, can be 3 (link HIBERNATE, device SLEEP) for S0ix
but must be changed to 5 (link OFF, device POWEROFF) for S3.
Fix support for S0ix/S3 by switching spm_lvl as needed. During suspend
->prepare(), if the suspend target state is not Suspend-to-Idle, raise
the spm_lvl to at least 5 so that resume will be possible from deep
sleep states. During suspend ->complete(), restore the spm_lvl to its
original value that is suitable for S0ix.
This fix is first needed in Intel Alder Lake based controllers.
Fixes: 7dc9fb47bc ("scsi: ufs: ufs-pci: Add support for Intel ADL")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251024085918.31825-2-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c71670396 upstream.
Since commit 4959aebba8 ("virtio-net: use mtu size as buffer length
for big packets"), when guest gso is off, the allocated size for big
packets is not MAX_SKB_FRAGS * PAGE_SIZE anymore but depends on
negotiated MTU. The number of allocated frags for big packets is stored
in vi->big_packets_num_skbfrags.
Because the host announced buffer length can be malicious (e.g. the host
vhost_net driver's get_rx_bufs is modified to announce incorrect
length), we need a check in the virtio_net receive path. Currently, the
check is not adapted to the new change, which can lead to a NULL page
pointer dereference in the below while loop when the received length is
larger than the allocated one.
This commit fixes the received length check corresponding to the new
change.
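A rough sketch of the adjusted bound (the locals and the error label are hypothetical; vi->big_packets_num_skbfrags is the field named above):
	unsigned int max_len = vi->big_packets_num_skbfrags * PAGE_SIZE;

	/* Do not trust the host-announced length: it must fit within what
	 * was actually allocated for this big-packet buffer chain. */
	if (unlikely(len > max_len)) {
		pr_debug("%s: rx error: len %u over allocated %u\n",
			 dev->name, len, max_len);
		goto frame_err;		/* hypothetical error label */
	}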
Fixes: 4959aebba8 ("virtio-net: use mtu size as buffer length for big packets")
Cc: stable@vger.kernel.org
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://patch.msgid.link/20251030144438.7582-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 734e99623c upstream.
find_or_create_cached_dir() could grab a new reference after kref_put()
had seen the refcount drop to zero but before cfid_list_lock is acquired
in smb2_close_cached_fid(), leading to use-after-free.
Switch to kref_put_lock() so cfid_release() is called with
cfid_list_lock held, closing that gap.
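A minimal sketch of the kref_put_lock() pattern described above, with trimmed-down, partly hypothetical structure and lock names:
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct cached_fid {				/* trimmed down for the sketch */
	struct list_head entry;
	struct kref refcount;
};

static DEFINE_SPINLOCK(cfid_list_lock);		/* hypothetical global lock */

static void cfid_release(struct kref *kref)
{
	struct cached_fid *cfid = container_of(kref, struct cached_fid, refcount);

	/* kref_put_lock() calls this with cfid_list_lock already held, so
	 * no new reference can be grabbed between the refcount hitting
	 * zero and the entry being unlinked; the callback drops the lock. */
	list_del(&cfid->entry);
	spin_unlock(&cfid_list_lock);
	kfree(cfid);
}

static void close_cached_fid(struct cached_fid *cfid)
{
	/* Only takes cfid_list_lock if the refcount actually drops to zero. */
	kref_put_lock(&cfid->refcount, cfid_release, &cfid_list_lock);
}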
Fixes: ebe98f1447 ("cifs: enable caching of directories for which a lease is held")
Cc: stable@vger.kernel.org
Reported-by: Jay Shin <jaeshin@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4012abe8a7 upstream.
SMB2_change_notify called smb2_validate_iov() but ignored the return
code, then kmemdup()ed using server provided OutputBufferOffset/Length.
Check the return of smb2_validate_iov() and bail out on error.
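A rough sketch of the described check; the variable names, the label and the exact smb2_validate_iov() argument order are assumptions:
	rc = smb2_validate_iov(le16_to_cpu(smb_rsp->OutputBufferOffset),
			       le32_to_cpu(smb_rsp->OutputBufferLength),
			       &rsp_iov, 0 /* min_buf_size */);
	if (rc)
		goto cnotify_exit;	/* bail out instead of trusting the offsets */

	*out_data = kmemdup((char *)smb_rsp +
				le16_to_cpu(smb_rsp->OutputBufferOffset),
			    le32_to_cpu(smb_rsp->OutputBufferLength),
			    GFP_KERNEL);
	if (!*out_data)
		rc = -ENOMEM;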
Discovered with help from the ZeroPath security tooling.
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: stable@vger.kernel.org
Fixes: e3e9463414 ("smb3: improve SMB3 change notification support")
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd9f30d103 upstream.
Guenter Roeck reported this kernel crash on his emulated B160L machine:
Starting network: udhcpc: started, v1.36.1
Backtrace:
[<104320d4>] unwind_once+0x1c/0x5c
[<10434a00>] walk_stackframe.isra.0+0x74/0xb8
[<10434a6c>] arch_stack_walk+0x28/0x38
[<104e5efc>] stack_trace_save+0x48/0x5c
[<105d1bdc>] set_track_prepare+0x44/0x6c
[<105d9c80>] ___slab_alloc+0xfc4/0x1024
[<105d9d38>] __slab_alloc.isra.0+0x58/0x90
[<105dc80c>] kmem_cache_alloc_noprof+0x2ac/0x4a0
[<105b8e54>] __anon_vma_prepare+0x60/0x280
[<105a823c>] __vmf_anon_prepare+0x68/0x94
[<105a8b34>] do_wp_page+0x8cc/0xf10
[<105aad88>] handle_mm_fault+0x6c0/0xf08
[<10425568>] do_page_fault+0x110/0x440
[<10427938>] handle_interruption+0x184/0x748
[<11178398>] schedule+0x4c/0x190
BUG: spinlock recursion on CPU#0, ifconfig/2420
lock: terminate_lock.2+0x0/0x1c, .magic: dead4ead, .owner: ifconfig/2420, .owner_cpu: 0
While creating the stack trace, the unwinder uses the stack pointer to guess
the previous frame and to read the previous stack pointer from memory. The crash
happens because the unwinder tries to read from unaligned memory and as such
triggers the unalignment trap handler, which then leads to the spinlock
recursion and finally to a deadlock.
Fix it by checking the alignment before accessing the memory.
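A minimal sketch of the idea behind the fix (the helper name is hypothetical):
#include <linux/align.h>
#include <linux/types.h>

/* Reject candidate frame addresses that are not word aligned before
 * dereferencing them: an unaligned read inside the unwinder would invoke
 * the unalignment trap handler, which may then try to unwind the stack
 * again and recurse on the same spinlock. */
static bool stack_slot_aligned(unsigned long addr)
{
	return IS_ALIGNED(addr, sizeof(unsigned long));
}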
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d59fba493 upstream.
In the parse_adv_monitor_pattern() function, the value of
the 'length' variable is currently limited to HCI_MAX_EXT_AD_LENGTH (251).
The size of the 'value' array in the mgmt_adv_pattern structure is 31.
If the value of 'pattern[i].length' is set from user space
and exceeds 31, the 'patterns[i].value' array can be accessed
out of bounds when copied.
Increasing the size of the 'value' array in
the 'mgmt_adv_pattern' structure would break userspace.
Considering this, and to avoid OOB access, revert the limits for 'offset'
and 'length' back to the value of HCI_MAX_AD_LENGTH.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: db08722fc7 ("Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 487df8b698 upstream.
The Mesa issue referenced below pointed out a possible deadlock:
[ 1231.611031] Possible interrupt unsafe locking scenario:
[ 1231.611033] CPU0 CPU1
[ 1231.611034] ---- ----
[ 1231.611035] lock(&xa->xa_lock#17);
[ 1231.611038] local_irq_disable();
[ 1231.611039] lock(&fence->lock);
[ 1231.611041] lock(&xa->xa_lock#17);
[ 1231.611044] <Interrupt>
[ 1231.611045] lock(&fence->lock);
[ 1231.611047]
*** DEADLOCK ***
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (e.g.
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).
CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback, so in an interrupt context. It will deadlock when trying to
grab the xa_lock, which is already held by CPU0.
Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.
dma_fence_signal() // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
-> foreach dependencies
-> dma_fence_add_callback() // locks f2.lock
This will deadlock if f1 and f2 share the same spinlock.
To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().
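A rough sketch of the resulting shape (the struct member names are assumptions): the fence callback only queues work, and the dependency iteration and re-arming happen later in process context:
#include <linux/dma-fence.h>
#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

static void kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
						 finish_cb);	/* assumed member */

	/* Fence-signalling context, possibly hardirq: don't touch
	 * job->dependencies (xa_lock) or other fences' locks here. */
	schedule_work(&job->work);				/* assumed member */
}

static void kill_jobs_work(struct work_struct *wrk)
{
	/* Process context: safe to iterate job->dependencies, take the
	 * xarray lock and call dma_fence_add_callback() on each fence. */
}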
Cc: stable@vger.kernel.org # v6.2+
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
[phasta: commit message nits]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c6a743c69 upstream.
[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect a system with SST first and
then switch to an mst branch during suspend, we will fail to probe the
topology by calling the wrong API, since the mst manager is yet to
be initialized.
[How]
At dm_resume(), once an mst branch is detected as connected, check if
mst is already initialized. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst.
V2: Adjust the commit msg a bit
Fixes: bc068194f5 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62320fb8d91a0bddc44a228203cfa9bfbb5395bd)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 80f0d631dc ]
The function create_field_var() allocates memory for 'val' through
create_hist_field() inside parse_atom(), and for 'var' through
create_var(), which in turn allocates var->type and var->var.name
internally. Simply calling kfree() to release these structures will
result in memory leaks.
Use destroy_hist_field() to properly free 'val', and explicitly release
the memory of var->type and var->var.name before freeing 'var' itself.
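A sketch of the corrected error path, using the names mentioned above (the labels and the second destroy_hist_field() argument are assumptions):
	/* 'val' came from parse_atom()/create_hist_field(); 'var' came from
	 * create_var(), which also allocated var->type and var->var.name. */
 err_free_var:
	kfree(var->type);
	kfree(var->var.name);
	kfree(var);
 err_destroy_val:
	destroy_hist_field(val, 0);
	return ERR_PTR(ret);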
Link: https://patch.msgid.link/20251106120132.3639920-1-zilin@seu.edu.cn
Fixes: 02205a6752 ("tracing: Add support for 'field variables'")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8dca36978a ]
syzbot reported[1] a use-after-free when deleting an expired fdb. It is
due to a race condition between learning still happening and a port being
deleted, after all its fdbs have been flushed. The port's state has been
toggled to disabled, so no learning should happen at that time, but if we
have MST enabled, it will bypass the port's state; that, together with VLAN
filtering being disabled, can lead to fdb learning at a time when it shouldn't
happen, while the port is being deleted. VLAN filtering must be disabled
because we flush the port VLANs when it's being deleted, which will stop
learning. This fix adds a check for the port's vlan group, which is
set to NULL when the port is getting deleted, and that avoids the port
state bypass. When MST is enabled there would be minimal new overhead
in the fast path because the port's vlan group pointer is cache-hot.
[1] https://syzkaller.appspot.com/bug?extid=dd280197f0f7ab3917be
Fixes: ec7328b591 ("net: bridge: mst: Multiple Spanning Tree (MST) mode")
Reported-by: syzbot+dd280197f0f7ab3917be@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69088ffa.050a0220.29fc44.003d.GAE@google.com/
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20251105111919.1499702-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0216721ce7 ]
The following warning was seen when trying to connect to the device using ssh.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 104, name: dropbear
preempt_count: 1, expected: 0
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 104 Comm: dropbear Tainted: G W 6.18.0-rc2-00399-g6f1ab1b109b9-dirty #530 NONE
Tainted: [W]=WARN
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __might_resched+0x16c/0x2b0
__might_resched from __mutex_lock+0x64/0xd34
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from lan966x_stats_get+0x5c/0x558
lan966x_stats_get from dev_get_stats+0x40/0x43c
dev_get_stats from dev_seq_printf_stats+0x3c/0x184
dev_seq_printf_stats from dev_seq_show+0x10/0x30
dev_seq_show from seq_read_iter+0x350/0x4ec
seq_read_iter from seq_read+0xfc/0x194
seq_read from proc_reg_read+0xac/0x100
proc_reg_read from vfs_read+0xb0/0x2b0
vfs_read from ksys_read+0x6c/0xec
ksys_read from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b11fa8 to 0xf0b11ff0)
1fa0: 00000001 00001000 00000008 be9048d8 00001000 00000001
1fc0: 00000001 00001000 00000008 00000003 be905920 0000001e 00000000 00000001
1fe0: 0005404c be9048c0 00018684 b6ec2cd8
It seems that we are using a mutex in an atomic context, which is wrong.
Replace the mutex with a spinlock.
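A minimal sketch of the change, assuming a per-device stats lock (the structure and function below are trimmed and hypothetical):
#include <linux/spinlock.h>
#include <linux/types.h>

struct lan966x_stats {			/* hypothetical, trimmed down */
	spinlock_t lock;		/* was: struct mutex */
	u64 counters[64];
};

static void lan966x_stats_read(struct lan966x_stats *s, u64 *dst, int n)
{
	int i;

	/* dev_get_stats() can be reached from atomic context (e.g. the
	 * /proc/net/dev seq file under RCU), so a sleeping mutex must not
	 * be taken here; a spinlock is fine. */
	spin_lock(&s->lock);
	for (i = 0; i < n; i++)
		dst[i] = s->counters[i];
	spin_unlock(&s->lock);
}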
Fixes: 12c2d0a5b8 ("net: lan966x: add ethtool configuration and statistics")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251105074955.1766792-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96baf482ca ]
KSZ9477/KSZ9897 and LAN937X families of switches use a reserved multicast
address table for some specific forwarding with some multicast addresses,
like the one used in STP. The hardware assumes the host port is the last
port in KSZ9897 family and port 5 in LAN937X family. Most of the time
this assumption is correct but not in other cases like KSZ9477.
Originally the function just sets up the first entry, but the others still
need updating, especially for one common multicast address that is used by
PTP operation.
LAN937x also uses different register bits when accessing the reserved
table.
Fixes: 457c182af5 ("net: dsa: microchip: generic access to ksz9477 static and reserved table")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Tested-by: Łukasz Majewski <lukma@nabladev.com>
Link: https://patch.msgid.link/20251105033741.6455-1-Tristram.Ha@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d6ec3a793 ]
The driver calls mfd_add_devices() but fails to call mfd_remove_devices()
in error paths after successful MFD device registration and in the remove
function. This leads to resource leaks where MFD child devices are not
properly unregistered.
Replace mfd_add_devices with devm_mfd_add_devices to automatically
manage the device resources.
Fixes: c96e976d9a ("net: wan: framer: Add support for the Lantiq PEF2256 framer")
Suggested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bacd8d8018 ]
mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is
enough space in the skb frags to store more data. This formula is
incorrect for 64K page sizes and it triggers early GRO session
termination because the first fragment will blow up beyond
GRO_LEGACY_MAX_SIZE.
This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE
(64K) which uses the skb->len instead. Within this context,
the check is safe from fragment overflow because the hardware
will continuously fill the data up to the reservation size of 64K
and the driver will coalesce all data from the same page to the same
fragment. This means that the data will span one fragment or at most
two for such a large page size.
It is expected that the if statement will be optimized out as the
check is done with constants.
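A sketch of the described special case, assuming a helper of roughly this shape (the frag-based formula shown for the normal case is illustrative, not the exact driver code):
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static bool hw_gro_skb_has_enough_space(const struct sk_buff *skb, u32 data_bcnt)
{
	if (PAGE_SIZE >= GRO_LEGACY_MAX_SIZE)
		/* 64K pages: the data spans at most one or two frags, so
		 * bounding the total GRO'd length is sufficient. */
		return skb->len + data_bcnt <= GRO_LEGACY_MAX_SIZE;

	/* Smaller pages: bound by the remaining frag slots (illustrative). */
	return skb_shinfo(skb)->nr_frags + DIV_ROUND_UP(data_bcnt, PAGE_SIZE) <=
	       MAX_SKB_FRAGS;
}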
Fixes: 92552d3abd ("net/mlx5e: HW_GRO cqe handler implementation")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ae4789affd ]
The ICSSG driver does the initial FDB configuration which
includes setting the control registers. Other runtime
management, like learning, is handled by the PRUs. The default
FDB hash size used by the firmware is 512 slots, which is
currently missing in the driver. Update the driver
FDB config to include the FDB hash size as well.
Please refer to the TRM [1], section 6.4.14.12.17, on how the FDB config
register gets configured. From table 6-1404, there is a reset
value of 4 for the FDB_HAS_SIZE field, meaning 1024 slots. Currently
the driver is not updating this reset value from 4 (1024 slots) to
3 (512 slots). This patch fixes this by updating the value
to 512 slots.
[1]: https://www.ti.com/lit/pdf/spruim2
Fixes: abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c74619e760 ]
hwsim radios marked destroy_on_close are removed when the Netlink socket
that created them is closed. As the portid is not unique across network
namespaces, closing a socket in one namespace may remove radios in another
if it has the destroy_on_close flag set.
Instead of matching the network namespace, match the netgroup of the radio
to limit radio removal to those that have been created by the closing
Netlink socket. The netgroup of a radio identifies the network namespace
it was created in, and matching on it removes a destroy_on_close radio
even if it has been moved to another namespace.
Fixes: 100cb9ff40 ("mac80211_hwsim: Allow managing radios from non-initial namespaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Link: https://patch.msgid.link/20251103082436.30483-1-martin@strongswan.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46010d43ab ]
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.
This patch always passes true for the 'force' parameter so there is
no change in behavior. Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 5204943a4c ("bnxt_en: Fix warning in bnxt_dl_reload_down()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b350b4927 ]
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging. If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 5204943a4c ("bnxt_en: Fix warning in bnxt_dl_reload_down()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e120f46768 ]
Raw IP packets have no MAC header, leaving skb->mac_header uninitialized.
This can trigger kernel panics on ARM64 when xfrm or other subsystems
access the offset due to strict alignment checks.
Initialize the MAC header to prevent such crashes.
In particular, this can happen on ARM when running IPsec over the
qmimux0 interface.
Example trace:
Internal error: Oops: 000000009600004f [#1] SMP
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.34-gbe78e49cb433 #1
Hardware name: LS1028A RDB Board (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : xfrm_input+0xde8/0x1318
lr : xfrm_input+0x61c/0x1318
sp : ffff800080003b20
Call trace:
xfrm_input+0xde8/0x1318
xfrm6_rcv+0x38/0x44
xfrm6_esp_rcv+0x48/0xa8
ip6_protocol_deliver_rcu+0x94/0x4b0
ip6_input_finish+0x44/0x70
ip6_input+0x44/0xc0
ipv6_rcv+0x6c/0x114
__netif_receive_skb_one_core+0x5c/0x8c
__netif_receive_skb+0x18/0x60
process_backlog+0x78/0x17c
__napi_poll+0x38/0x180
net_rx_action+0x168/0x2f0
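A minimal sketch of the kind of initialization described above, in the path that hands the demuxed raw-IP skb to the stack (the ip_version local is hypothetical):
	/* Raw-IP frames carry no Ethernet header; give skb->mac_header a
	 * defined value so later users of skb_mac_header() (e.g. xfrm on
	 * strict-alignment architectures) do not act on garbage. */
	skb->protocol = (ip_version == 6) ? htons(ETH_P_IPV6) : htons(ETH_P_IP);
	skb_reset_mac_header(skb);
	netif_rx(skb);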
Fixes: c6adf77953 ("net: usb: qmi_wwan: add qmap mux protocol support")
Signed-off-by: Qendrim Maxhuni <qendrim.maxhuni@garderos.com>
Link: https://patch.msgid.link/20251029075744.105113-1-qendrim.maxhuni@garderos.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit de0337d641 ]
The TSO path called ionic_tx_map_skb() before preparing the TCP pseudo
checksum (ionic_tx_tcp_[inner_]pseudo_csum()), which may perform
skb_cow_head() and might modify bytes in the linear header area.
Mapping first and then mutating the header risks:
- Using a stale DMA address if skb_cow_head() relocates the head, and/or
- Device reading stale header bytes on weakly-ordered systems
(CPU writes after mapping are not guaranteed visible without an
explicit dma_sync_single_for_device()).
Reorder the TX path to perform all header mutations (including
skb_cow_head()) *before* DMA mapping. Mapping is now done only after the
skb layout and header contents are final. This removes the need for any
post-mapping dma_sync and prevents on-wire corruption observed under
VLAN+TSO load after repeated runs.
This change is purely an ordering fix; no functional behavior change
otherwise.
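A sketch of the corrected ordering (the helper names follow the commit text; their exact signatures and the surrounding locals are assumptions):
	/* 1) Header mutations first: these may call skb_cow_head() and
	 *    rewrite the TCP pseudo checksum in the linear header area. */
	if (encap)
		err = ionic_tx_tcp_inner_pseudo_csum(skb);
	else
		err = ionic_tx_tcp_pseudo_csum(skb);
	if (err)
		return err;

	/* 2) Only now is the skb layout and header content final, so the
	 *    DMA mapping can neither go stale nor miss later CPU writes. */
	err = ionic_tx_map_skb(q, skb, desc_info);
	if (err)
		return err;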
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20251031155203.203031-2-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d261f5b09c ]
The TX path currently writes descriptors and then immediately writes to
the MMIO doorbell register to notify the NIC. On weakly ordered
architectures, descriptor writes may still be pending in CPU or DMA
write buffers when the doorbell is issued, leading to the device
fetching stale or incomplete descriptors.
Add a dma_wmb() in ionic_txq_post() to ensure all descriptor writes are
visible to the device before the doorbell MMIO write.
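A minimal sketch of the posting order with the added barrier (the struct and field names are trimmed/hypothetical):
#include <asm/barrier.h>
#include <linux/io.h>
#include <linux/types.h>

struct txq {				/* hypothetical, trimmed down */
	void __iomem *dbell_reg;
	u64 dbell_val;
	u16 head_idx;
};

static void txq_ring_doorbell(struct txq *q)
{
	/* The caller has already written the descriptors; make those
	 * writes visible to the device before the MMIO doorbell write
	 * that tells it to fetch them. */
	dma_wmb();
	writeq(q->dbell_val | q->head_idx, q->dbell_reg);
}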
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Link: https://patch.msgid.link/20251031155203.203031-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0be04b5fa6 ]
The switch clears the ARL_SRCH_STDN bit when the search is done, i.e. it
finished traversing the ARL table.
This means that there will be no valid result, so we should not attempt
to read and process any further entries.
We only ever check the validity of the entries for 4 ARL bin chips, and
only after having passed the first entry to the b53_fdb_copy().
This means that we always pass an invalid entry at the end to the
b53_fdb_copy(). b53_fdb_copy() does check the validity though before
passing on the entry, so it never gets passed on.
On < 4 ARL bin chips, we will even continue reading invalid entries
until we reach the result limit.
Fixes: 1da6df85c6 ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251102100758.28352-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c264294624 ]
In the New Control register bit 1 is either reserved, or has a different
function:
Out of Range Error Discard
When enabled, the ingress port discards any frames
if the Length field is between 1500 and 1536
(excluding 1500 and 1536) and with good CRC.
The actual bit for enabling IP multicast is bit 0, which was only
explicitly enabled for BCM5325 so far.
For older switch chips, this bit defaults to 0, so we want to enable it
as well, while newer switch chips default to 1, and their documentation
says "It is illegal to set this bit to zero."
So drop the wrong B53_IPMC_FWD_EN define and enable the IP multicast bit
also for other switch chips. While at it, rename it to (B53_)IP_MC as
that is how it is called in Broadcom code.
Fixes: 63cc54a6f0 ("net: dsa: b53: Fix egress flooding settings")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251102100758.28352-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e4ebdc160 ]
BCM63XX's switch does not support MDIO scanning of external phys, so its
MACs need to be manually configured for autonegotiated link speeds.
So call b53_force_port_config() and b53_force_link() accordingly also when
the mode is MLO_AN_PHY for those ports.
This fixes lower speeds than 1000/full on rgmii ports 4 - 7.
This aligns the behaviour with the old bcm63xx_enetsw driver for those
ports.
Fixes: 967dd82ffc ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251101132807.50419-3-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6a8a5477f ]
There is no guarantee that the port state override registers have their
default values, as not all switches support being reset via register or
have a reset GPIO.
So when forcing port config, we need to make sure to clear all fields,
which we currently do not do for the speed and flow control
configuration. This can cause flow control to stay enabled, or, in the case
of speed, an illegal value: e.g. configured for 1G (0x2), then
setting 100M (0x1) results in 0x3, which is invalid.
For PORT_OVERRIDE_SPEED_2000M we need to make sure to only clear it on
supported chips, as the bit can have different meanings on other chips,
e.g. for BCM5389 this controls scanning PHYs for link/speed
configuration.
Fixes: 5e004460f8 ("net: dsa: b53: Add helper to set link parameters")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251101132807.50419-2-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c211f5d7cb ]
After registering a VLAN device and setting its feature flags, we need to
synchronize the VLAN features with the lower device. For example, if the VLAN
device does not have the NETIF_F_LRO flag, it should be synchronized with
the lower device based on the NETIF_F_UPPER_DISABLES definition.
As the dev->vlan_features has changed, we need to call
netdev_update_features(). The caller must run after netdev_upper_dev_link()
links the lower devices, so this patch adds the netdev_update_features()
call in register_vlan_dev().
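A sketch of where the call lands in register_vlan_dev(), per the description above (the error label is hypothetical and the context simplified):
	/* The upper/lower link must exist before the features update, so
	 * netdev_update_features() can propagate disabled
	 * NETIF_F_UPPER_DISABLES features (e.g. LRO) down to the lower
	 * device. */
	err = netdev_upper_dev_link(real_dev, dev, extack);
	if (err)
		goto out_unregister_netdev;	/* hypothetical label */

	netdev_update_features(dev);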
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251030073539.133779-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d01f8136d4 ]
The script "ethtool-common.sh" is not installed in INSTALL_PATH, and
triggers some errors when I try to run the test
'drivers/net/netdevsim/ethtool-coalesce.sh':
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
# ./ethtool-coalesce.sh: line 25: make_netdev: command not found
# ethtool: bad command line argument(s)
# ./ethtool-coalesce.sh: line 124: check: command not found
# ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
# FAILED /0 checks
not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1
Install this file to avoid this error. After this patch:
TAP version 13
1..1
# timeout set to 600
# selftests: drivers/net/netdevsim: ethtool-coalesce.sh
# PASSED all 22 checks
ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh
Fixes: fbb8531e58 ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20251030040340.3258110-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8e8486702 ]
The GRO self-test, gro.c, currently constructs IPv6 packets containing a
Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path
correctly handles IPv6 extension headers.
However, network elements may be configured to drop packets with the
Hop-by-Hop Options header (HBH). This causes the self-test to fail
in environments where such network elements are present.
To improve the robustness and reliability of this test in diverse
network environments, switch from using IPPROTO_HOPOPTS to
IPPROTO_DSTOPTS (Destination Options).
The Destination Options header is less likely to be dropped by
intermediate routers and still serves the core purpose of the test:
validating GRO's handling of an IPv6 extension header. This change
ensures the test can execute successfully without being incorrectly
failed by network policies outside the kernel's control.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030060436.1556664-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02d064de05 ]
Due to the gro_sender sending data packets and FIN packets
in very quick succession, these are received almost simultaneously
by the gro_receiver. FIN packets are sometimes processed before the
data packets leading to intermittent (~1/100) test failures.
This change adds a delay of 100ms before sending FIN packets
in gro:tcp test to avoid the out-of-order delivery. The same
mitigation already exists for the gro:ip test.
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh@google.com>
Link: https://patch.msgid.link/20251030062818.1562228-1-anubhavsinggh@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d18a84edd ]
The internal switch on BCM63XX SoCs will unconditionally add 802.1Q VLAN
tags on egress to CPU when 802.1Q mode is enabled. We do this
unconditionally since commit ed409f3bba ("net: dsa: b53: Configure
VLANs while not filtering").
This is fine for VLAN aware bridges, but for standalone ports and vlan
unaware bridges this means all packets are tagged with the default VID,
which is 0.
While the kernel will treat that like untagged, this can break userspace
applications processing raw packets, expecting untagged traffic, like
STP daemons.
This also breaks several bridge tests, where the tcpdump output then
does not match the expected output anymore.
Since 0 isn't a valid VID, just strip out the VLAN tag if we encounter
it, unless the priority field is set, since that would be a valid tag
again.
Fixes: 964dbf186e ("net: dsa: tag_brcm: add support for legacy tags")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20251027194621.133301-1-jonas.gorski@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1c21cf89a6 ]
The memory allocated for ptr using kvmalloc() is not freed on the last
error path. Fix that by freeing it on that error path.
Fixes: 9a24ce5e29 ("Bluetooth: btrtl: Firmware format v2 support")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a74f038fa5 ]
The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of
seq_puts(). This is both a performance issue and conceptually wrong,
as the macro name suggests plain string output (puts) but the
implementation uses formatted output (printf).
The macro is used in ptdump.c:301 to output a newline character. Using
seq_printf() adds unnecessary overhead for format string parsing when
outputting this constant string.
This bug was introduced in commit 59c4da8640 ("riscv: Add support to
dump the kernel page tables") in 2020, which copied the implementation
pattern from other architectures that had the same bug.
Fixes: 59c4da8640 ("riscv: Add support to dump the kernel page tables")
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Link: https://lore.kernel.org/r/20251018170451.3355496-1-hi@josie.lol
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 060ea84a48 ]
When unwinding the stack of a task other than current, KASAN would report
"BUG: KASAN: out-of-bounds in walk_stackframe+0x41c/0x460"
The same issue exists on x86 and has been resolved by commit
84936118bd ("x86/unwind: Disable KASAN checks for non-current tasks").
The solution can be applied to RISC-V too.
This patch also can solve the issue:
https://seclists.org/oss-sec/2025/q4/23
Fixes: 5d8544e2d0 ("RISC-V: Generic library routines and assembly")
Co-developed-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Link: https://lore.kernel.org/r/20251022072608.743484-1-zhangchunyan@iscas.ac.cn
[pjw@kernel.org: clean up checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2618849f31 upstream.
[BUG]
During development of a minor feature (making sure all btrfs_bio::end_io()
callbacks are called in task context), I noticed a crash in generic/388, where
metadata writes triggered new work items after btrfs_stop_all_workers().
It turns out that it can even happen without any code modification: just
using RAID5 for metadata with the same workload from generic/388 is going
to trigger the use-after-free.
[CAUSE]
If btrfs hits an error, the fs is marked as error, no new
transaction is allowed thus metadata is in a frozen state.
But there are some metadata modifications before that error, and they are
still in the btree inode page cache.
Since there will be no real transaction commit, all those dirty folios
are just kept as is in the page cache, and they can not be invalidated
by invalidate_inode_pages2() call inside close_ctree(), because they are
dirty.
And finally after btrfs_stop_all_workers(), we call iput() on btree
inode, which triggers writeback of those dirty metadata.
And if the fs is using RAID56 metadata, this will trigger RMW and queue
new works into rmw_workers, which is already stopped, causing warning
from queue_work() and use-after-free.
[FIX]
Add special handling to write_one_eb(): if the fs is already in
an error state, immediately mark the bbio as failed, instead of really
submitting it.
Then during close_ctree(), iput() will just discard all those dirty
tree blocks without really writing them back, thus no more new jobs for
already stopped-and-freed workqueues.
The extra discard in write_one_eb() also acts as an extra safety net.
E.g. the transaction abort is triggered by some extent/free space
tree corruptions, and since the extent/free space tree is already corrupted,
some tree blocks may be allocated where they shouldn't be (overwriting
existing tree blocks). In that case writing them back will further
corrupt the fs.
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 284922f4c5 ]
The runtime-const infrastructure was never designed to handle the
modular case, because the constant fixup is only done at boot time for
core kernel code.
But by the time I used it for the x86-64 user space limit handling in
commit 86e6b1547b ("x86: fix user address masking non-canonical
speculation issue"), I had completely repressed that fact.
And it all happens to work because the only code that currently actually
gets inlined by modules is for the access_ok() limit check, where the
default constant value works even when not fixed up. Because at least I
had intentionally made it be something that is in the non-canonical
address space region.
But it's technically very wrong, and it does mean that at least in
theory, the use of 'access_ok()' + '__get_user()' can trigger the same
speculation issue with non-canonical addresses that the original commit
was all about.
The pattern is unusual enough that this probably doesn't matter in
practice, but very wrong is still very wrong. Also, let's fix it before
the nice optimized scoped user accessor helpers that Thomas Gleixner is
working on cause this pseudo-constant to then be more widely used.
This all came up due to an unrelated discussion with Mateusz Guzik about
using the runtime const infrastructure for names_cachep accesses too.
There the modular case was much more obviously broken, and Mateusz noted
it in his 'v2' of the patch series.
That then made me notice how broken 'access_ok()' had been in modules
all along. Mea culpa, mea maxima culpa.
Fix it by simply not using the runtime-const code in modules, and just
using the USER_PTR_MAX variable value instead. This is not
performance-critical like the core user accessor functions (get_user()
and friends) are.
Also make sure this doesn't get forgotten the next time somebody wants
to do runtime constant optimizations by having the x86 runtime-const.h
header file error out if included by modules.
Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Triggered-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/all/20251030105242.801528-1-mjguzik@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91309a7082 ]
This was a suggestion by David Laight, and while I was slightly worried
that some micro-architecture would predict cmov like a conditional
branch, there is little reason to actually believe any core would be
that broken.
Intel documents that their existing cores treat CMOVcc as a data
dependency that will constrain speculation in their "Speculative
Execution Side Channel Mitigations" whitepaper:
"Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also
be used to prevent bounds check bypass by constraining speculative
execution on current family 6 processors (Intel® Core™, Intel® Atom™,
Intel® Xeon® and Intel® Xeon Phi™ processors)"
and while that leaves the future uarch issues open, that's certainly
true of our traditional SBB usage too.
Any core that predicts CMOV will be unusable for various crypto
algorithms that need data-independent timing stability, so let's just
treat CMOV as the safe choice that simplifies the address masking by
avoiding an extra instruction and doesn't need a temporary register.
Suggested-by: David Laight <David.Laight@aculab.com>
Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: 284922f4c5 ("x86: uaccess: don't use runtime-const rewriting in modules")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22c73d52a6 ]
The mds auth caps check should also validate the
fsname along with the associated caps. Not doing
so would result in applying the mds auth caps of
one fs onto the other fs in a multifs ceph cluster.
The bug causes multiple issues w.r.t. user
authentication; following is one such example.
Steps to Reproduce (on vstart cluster):
1. Create two file systems in a cluster, say 'fsname1' and 'fsname2'
2. Authorize read only permission to the user 'client.usr' on fs 'fsname1'
$ceph fs authorize fsname1 client.usr / r
3. Authorize read and write permission to the same user 'client.usr' on fs 'fsname2'
$ceph fs authorize fsname2 client.usr / rw
4. Update the keyring
$ceph auth get client.usr >> ./keyring
With above permssions for the user 'client.usr', following is the
expectation.
a. The 'client.usr' should be able to only read the contents
and not allowed to create or delete files on file system 'fsname1'.
b. The 'client.usr' should be able to read/write on file system 'fsname2'.
But, with this bug, the 'client.usr' is allowed to read/write on file
system 'fsname1'. See below.
5. Mount the file system 'fsname1' with the user 'client.usr'
$sudo bin/mount.ceph usr@.fsname1=/ /kmnt_fsname1_usr/
6. Try creating a file on file system 'fsname1' with user 'client.usr'. This
should fail but passes with this bug.
$touch /kmnt_fsname1_usr/file1
7. Mount the file system 'fsname1' with the user 'client.admin' and create a
file.
$sudo bin/mount.ceph admin@.fsname1=/ /kmnt_fsname1_admin
$echo "data" > /kmnt_fsname1_admin/admin_file1
8. Try removing an existing file on file system 'fsname1' with the user
'client.usr'. This shouldn't succeed but succeeds with the bug.
$rm -f /kmnt_fsname1_usr/admin_file1
For more information, please take a look at the corresponding mds/fuse patch
and tests added by looking into the tracker mentioned below.
v2: Fix a possible null dereference in doutc
v3: Don't store fsname from mdsmap, validate against
ceph_mount_options's fsname and use it
v4: Code refactor, better warning message and
fix possible compiler warning
[ Slava.Dubeyko: "fsname check failed" -> "fsname mismatch" ]
Link: https://tracker.ceph.com/issues/72167
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53db6f25ee ]
The wake_up_bit() is called in ceph_async_unlink_cb(),
wake_async_create_waiters(), and ceph_finish_async_create().
It makes sense to switch to the clear_and_wake_up_bit() function, because
it makes the code much cleaner and easier to understand.
The more important rework is the addition of an smp_mb__after_atomic()
memory barrier after the bit modification and before the
wake_up_bit() call. It can prevent a potential race condition
when other threads access the modified bit. Luckily,
clear_and_wake_up_bit() already implements the required
functionality pattern:
static inline void clear_and_wake_up_bit(int bit, unsigned long *word)
{
	clear_bit_unlock(bit, word);
	/* See wake_up_bit() for which memory barrier you need to use. */
	smp_mb__after_atomic();
	wake_up_bit(word, bit);
}
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5824ccba9a ]
The Coverity Scan service has detected potential
race condition in ceph_ioctl_lazyio() [1].
The CID 1591046 contains explanation: "Check of thread-shared
field evades lock acquisition (LOCK_EVASION). Thread1 sets
fmode to a new value. Now the two threads have an inconsistent
view of fmode and updates to fields correlated with fmode
may be lost. The data guarded by this critical section may
be read while in an inconsistent state or modified by multiple
racing threads. In ceph_ioctl_lazyio: Checking the value of
a thread-shared field outside of a locked region to determine
if a locked operation involving that thread shared field
has completed. (CWE-543)".
The patch places fi->fmode field access under ci->i_ceph_lock
protection. Also, it introduces the is_file_already_lazy
variable that is set under the lock and checked later,
outside the scope of the critical section.
[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1591046
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7ed1e29cf ]
The Coverity Scan service has detected the calling of
wait_for_completion_killable() without checking the return
value in ceph_lock_wait_for_completion() [1]. The CID 1636232
defect contains explanation: "If the function returns an error
value, the error value may be mistaken for a normal value.
In ceph_lock_wait_for_completion(): Value returned from
a function is not checked for errors before being used. (CWE-252)".
The patch adds a check of the wait_for_completion_killable()
return value and returns the error code from
ceph_lock_wait_for_completion().
[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1636232
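A minimal sketch of the added check (the completion field name is an assumption):
	int err;

	err = wait_for_completion_killable(&req->r_completion);
	if (err)
		return err;	/* interrupted by a fatal signal, e.g. -ERESTARTSYS */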
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7574f30337 ]
If the mmap write lock is taken while draining retry faults, the mmap write
lock is not released because svm_range_restore_pages calls mmap_read_unlock
and then returns. This causes a deadlock and the system hangs later because
the mmap read or write lock cannot be taken.
Downgrading the mmap write lock to a read lock when draining retry faults
fixes this bug.
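A sketch of the locking shape described above (the condition and variables are hypothetical):
	/* Entered with the mmap write lock held while retry faults are
	 * being drained: downgrade it instead of calling mmap_read_unlock()
	 * on a lock that was taken for write, so the write lock is still
	 * properly released and readers can make progress afterwards. */
	if (drain_retry_fault && write_locked)
		mmap_write_downgrade(mm);

	/* ... restore the faulting pages ... */

	mmap_read_unlock(mm);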
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66128f4287 ]
On m68k, check_sizetypes in headers_check reports:
./usr/include/asm/bootinfo-amiga.h:17: found __[us]{8,16,32,64} type without #include <linux/types.h>
This header file does not use any of the Linux-specific integer types,
but merely refers to them from comments, so this is a false positive.
As of commit c3a9d74ee4 ("kbuild: uapi: upgrade check_sizetypes()
warning to error"), this check was promoted to an error, breaking m68k
all{mod,yes}config builds.
Fix this by stripping simple comments before looking for Linux-specific
integer types.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/949f096337e28d50510e970ae3ba3ec9c1342ec0.1759753998.git.geert@linux-m68k.org
[nathan: Adjust comment and remove unnecessary escaping from slashes in
regex]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3637d34b35 ]
Add bounds checking to prevent writes past framebuffer boundaries when
rendering text near screen edges. Return early if the Y position is off-screen
and clip image height to screen boundary. Break from the rendering loop if the
X position is off-screen. When clipping image width to fit the screen, update
the character count to match the clipped width to prevent buffer size
mismatches.
Without the character count update, bit_putcs_aligned and bit_putcs_unaligned
receive mismatched parameters where the buffer is allocated for the clipped
width but cnt reflects the original larger count, causing out-of-bounds writes.
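A condensed sketch of the clipping rules described above, using the standard fbdev fields (the loop structure and locals are simplified; 'break' refers to the per-character rendering loop):
	/* Y start beyond the visible area: nothing to draw at all. */
	if (image.dy >= info->var.yres)
		return;
	/* Clip the image height to the screen boundary. */
	image.height = min(image.height, info->var.yres - image.dy);

	/* Inside the rendering loop: X start beyond the visible area. */
	if (image.dx >= info->var.xres)
		break;
	/* Clip the width and keep the character count consistent with it,
	 * so bit_putcs_aligned()/bit_putcs_unaligned() never write past a
	 * buffer that was sized for the clipped width. */
	if (image.dx + cnt * vc->vc_font.width > info->var.xres) {
		image.width = info->var.xres - image.dx;
		cnt = image.width / vc->vc_font.width;
	}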
Reported-by: syzbot+48b0652a95834717f190@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=48b0652a95834717f190
Suggested-by: Helge Deller <deller@gmx.de>
Tested-by: syzbot+48b0652a95834717f190@syzkaller.appspotmail.com
Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1375152bb0 ]
Instead of preserving mode, timestamp, and owner for the object files
during installation, just preserve the mode and timestamp.
When installing as root, the installed files should be owned by root.
When installing as user, --preserve=ownership doesn't work anyway. This
makes --preserve=ownership rather pointless.
Signed-off-by: Emil Dahl Juhl <juhl.emildahl@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db740f5689 ]
The atomic instructions sc.q, llacq.{w/d}, screl.{w/d} were newly added
in the LoongArch Reference Manual v1.10, it is necessary to handle them
in insns_not_supported() to avoid putting a breakpoint in the middle of
a ll/sc atomic sequence, otherwise it will loop forever for kprobes and
uprobes.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 558ae45798 ]
When a UTP error occurs in isolation, UFS is not currently recoverable.
This is because the UTP error is not considered fatal in the error
handling code, leading to either an I/O timeout or an OCS error.
Add the UTP error flag to INT_FATAL_ERRORS so the controller will be
reset in this situation.
sd 0:0:0:0: [sda] tag#38 UNKNOWN(0x2003) Result: hostbyte=0x07
driverbyte=DRIVER_OK cmd_age=0s
sd 0:0:0:0: [sda] tag#38 CDB: opcode=0x28 28 00 00 51 24 e2 00 00 08 00
I/O error, dev sda, sector 42542864 op 0x0:(READ) flags 0x80700 phys_seg
8 prio class 2
OCS error from controller = 9 for tag 39
pa_err[1] = 0x80000010 at 2667224756 us
pa_err: total cnt=2
dl_err[0] = 0x80000002 at 2667148060 us
dl_err[1] = 0x80002000 at 2667282844 us
No record of nl_err
No record of tl_err
No record of dme_err
No record of auto_hibern8_err
fatal_err[0] = 0x804 at 2667282836 us
---------------------------------------------------
REGISTER
---------------------------------------------------
NAME OFFSET VALUE
STD HCI SFR 0xfffffff0 0x0
AHIT 0x18 0x814
INTERRUPT STATUS 0x20 0x1000
INTERRUPT ENABLE 0x24 0x70ef5
[mkp: commit desc]
Signed-off-by: Hoyoung Seo <hy50.seo@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Message-Id: <20250930061428.617955-1-hy50.seo@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba60189291 ]
During initialization, the EDVD_COREx_VOLT_FREQ registers for some cores
are still at reset values and not reflecting the actual frequency. This
causes get calls to fail. Set all cores to their respective max
frequency during probe to initialize the registers to working values.
Suggested-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ad865862a ]
The NTB epf host driver assumes the BAR number associated with a memory
window is just incremented from the BAR number associated with MW1. This
seems to have been enough so far but this is not really how the endpoint
side work and the two could easily become mis-aligned.
ntb_epf_mw_to_bar() even assumes that the BAR number is the memory window
index + 2, which means the function only returns a proper result if BAR_2
is associated with MW1.
Instead, fully describe and allow arbitrary NTB BAR mapping.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c2e86f7b5 ]
The output clock register offset used in clk_wzrd_register_output_clocks
was incorrectly referencing 0x3C instead of 0x38, which caused
misconfiguration of output dividers on Versal platforms.
Correcting the off-by-one error ensures proper configuration of output
clocks.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18db1ff2de ]
For some of the SCMI based platforms, the oem extended config may be
supported, but not for duty cycle purposes. Skip the duty cycle ops if an
error is returned when trying to get the duty cycle info.
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e0d75258b ]
As described in AM335x Errata Advisory 1.0.42, WKUP_DEBUGSS_CLKCTRL
can't be disabled - the clock module will just be stuck in transitioning
state forever, resulting in the following warning message after the wait
loop times out:
l3-aon-clkctrl:0000:0: failed to disable
Just add the clock to enable_init_clks, so no attempt is made to disable
it.
Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af98caeaa7 ]
This register is important for sequencing the commands to PLLs, so
actually write the update bits with regmap_write_bits() instead of
relying on a read/modify/write regmap command that could skip the actual
hardware write if the value is identical to the one read.
The register is only changed when a modification to the PLL is needed;
when a read-only operation is done, the call to regmap_update_bits()
can be kept.
Add a comment to the sam9x60_div_pll_set_div() function that uses this
PLL_UPDT register so that it's used consistently, according to the
product's datasheet.
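A minimal sketch of the difference (register and mask names are
illustrative, not the actual sam9x60 PMC definitions):
/* read/modify/write: the HW write may be skipped if the value is unchanged */
regmap_update_bits(regmap, PLL_UPDT_REG, PLL_UPDT_MASK, val);
/* forced write: always hits the hardware, as needed for command sequencing */
regmap_write_bits(regmap, PLL_UPDT_REG, PLL_UPDT_MASK, val);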
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Tested-by: Ryan Wanner <ryan.wanner@microchip.com> # on sama7d65 and sam9x75
Link: https://lore.kernel.org/r/20250827150811.82496-1-nicolas.ferre@microchip.com
[claudiu.beznea: fix "Alignment should match open parenthesis"
checkpatch.pl check]
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e0237f5635 ]
A potential divider for the master clock is div/3. The register
configuration for div/3 is MASTER_PRES_MAX. The current bit shifting
method does not work for this case. Checking for MASTER_PRES_MAX will
ensure the correct decimal value is stored in the system.
Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a6f1a4f059 ]
The PCF2127 can generate an interrupt every full second or minute,
configured via bits MI (1) and SI (0) of control and status register 1.
On interrupt, bit MSF (7) of control register 2 is set and must be
cleared to continue normal operation.
While the driver never enables this interrupt on its own, users or
firmware may do so - e.g. as an easy way to test the interrupt.
Add preprocessor definition for MSF bit and include it in the irq
bitmask to ensure minute and second interrupts are cleared when fired.
This fixes an issue where the rtc enters a test mode and becomes
unresponsive after a second interrupt has fired and is not cleared in
time. In this state register writes to control registers have no
effect and the interrupt line is kept asserted [1]:
[1] userspace commands to put rtc into unresponsive state:
$ i2cget -f -y 2 0x51 0x00
0x04
$ i2cset -f -y 2 0x51 0x00 0x05 # set bit 0 SI
$ i2cget -f -y 2 0x51 0x00
0x84 # bit 8 EXT_TEST set
$ i2cset -f -y 2 0x51 0x00 0x05 # try overwrite control register
$ i2cget -f -y 2 0x51 0x00
0x84 # no change
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Link: https://lore.kernel.org/r/20250825-rtc-irq-v1-1-0133319406a7@solid-run.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7aa8781f37 ]
The A523's RTC block is backward compatible with the R329's, but it also
has a calibration function for its internal oscillator, which would
allow it to provide a clock rate closer to the desired 32.768 KHz. This
is useful on the Radxa Cubie A5E, which does not have an external 32.768
KHz crystal.
Add new compatible-specific data for it.
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://patch.msgid.link/20250909170947.2221611-1-wens@kernel.org
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 725e9d8186 ]
Add the missing option name in the help message. Additionally,
switch to __uml_help(), because this is a global option rather
than a per-channel option.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32058c38d3 ]
The function call new_inode() is a primitive for allocating an inode in memory,
rather than planning disk space for it. Therefore, -ENOMEM should be returned
as the error code rather than -ENOSPC.
To be specific, new_inode()'s call path looks like this:
new_inode
new_inode_pseudo
alloc_inode
ops->alloc_inode (hpfs_alloc_inode)
alloc_inode_sb
kmem_cache_alloc_lru
Therefore, the failure of new_inode() indicates a memory pressure issue
(-ENOMEM), not a lack of disk space. However, the current implementation of
hpfs_mkdir/create/mknod/symlink incorrectly returns -ENOSPC when new_inode()
fails. This patch fixes this by setting err to -ENOMEM before the goto
statement.
BTW, we also noticed that other nested calls within these four functions,
like hpfs_alloc_f/dnode and hpfs_add_dirent, might also fail due to memory
pressure.
But similarly, only -ENOSPC is returned. Addressing these will involve code
modifications in other functions, and we plan to submit dedicated patches for these
issues in the future. For this patch, we focus on new_inode().
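A minimal sketch of the pattern described above (the label name is
illustrative, not necessarily the exact hpfs code):
result = new_inode(dir->i_sb);
if (!result) {
        err = -ENOMEM;  /* was -ENOSPC; new_inode() failure means memory pressure */
        goto bail;
}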
Signed-off-by: Yikang Yue <yikangy2@illinois.edu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b1a4a59a2 ]
In btrfs_fallocate(), when the allocated range overlaps with a prealloc
extent and the extent starts after i_size, the range doesn't get marked
dirty in file_extent_tree. This results in persisting an incorrect
disk_i_size for the inode when not using the no-holes feature.
This is reproducible since commit 41a2ee75aa ("btrfs: introduce
per-inode file extent tree"), then became hidden since commit 3d7db6e8bd
("btrfs: don't allocate file extent tree for non regular files") and then
visible again after commit 8679d2687c ("btrfs: initialize
inode::file_extent_tree after i_mode has been set"), which fixes the
previous commit.
The following reproducer triggers the problem:
$ cat test.sh
MNT=/mnt/test
DEV=/dev/vdb
mkdir -p $MNT
mkfs.btrfs -f -O ^no-holes $DEV
mount $DEV $MNT
touch $MNT/file1
fallocate -n -o 1M -l 2M $MNT/file1
umount $MNT
mount $DEV $MNT
len=$((1 * 1024 * 1024))
fallocate -o 1M -l $len $MNT/file1
du --bytes $MNT/file1
umount $MNT
mount $DEV $MNT
du --bytes $MNT/file1
umount $MNT
Running the reproducer gives the following result:
$ ./test.sh
(...)
2097152 /mnt/test/file1
1048576 /mnt/test/file1
The difference is exactly the 1048576 bytes we allocated.
Fix by adding a call to btrfs_inode_set_file_extent_range() in
btrfs_fallocate_update_isize().
Fixes: 41a2ee75aa ("btrfs: introduce per-inode file extent tree")
Signed-off-by: austinchang <austinchang@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f260c6aff0 ]
When btrfs_add_qgroup_relation() is called with invalid qgroup levels
(src >= dst), the function returns -EINVAL directly without freeing the
preallocated qgroup_list structure passed by the caller. This causes a
memory leak because the caller unconditionally sets the pointer to NULL
after the call, preventing any cleanup.
The issue occurs because the level validation check happens before the
mutex is acquired and before any error handling path that would free
the prealloc pointer. On this early return, the cleanup code at the
'out' label (which includes kfree(prealloc)) is never reached.
In btrfs_ioctl_qgroup_assign(), the code pattern is:
prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
ret = btrfs_add_qgroup_relation(trans, sa->src, sa->dst, prealloc);
prealloc = NULL; // Always set to NULL regardless of return value
...
kfree(prealloc); // This becomes kfree(NULL), does nothing
When the level check fails, 'prealloc' is never freed by either the
callee or the caller, resulting in a 64-byte memory leak per failed
operation. This can be triggered repeatedly by an unprivileged user
with access to a writable btrfs mount, potentially exhausting kernel
memory.
Fix this by freeing prealloc before the early return, ensuring prealloc
is always freed on all error paths.
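A minimal sketch of the fix, based on the pattern quoted above (the level
check helper is assumed; the exact condition lives in
btrfs_add_qgroup_relation()):
if (btrfs_qgroup_level(src) >= btrfs_qgroup_level(dst)) {
        /* free the caller's preallocation before the early return */
        kfree(prealloc);
        return -EINVAL;
}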
Fixes: 4addc1ffd6 ("btrfs: qgroup: preallocate memory before adding a relation")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fe9622011f ]
When a QP wraps around, WQE data from the previous use at the same
position still remains, as the driver does not clear it. The WQE field
layout differs across opcodes, so fields that are not explicitly
assigned for the current opcode retain stale values and are issued to
HW by mistake. Such fields are as follows:
* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used
For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.
For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.
Fixes: 68a997c5d2 ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4b67b514a ]
Currently the driver enforces affinity between the QP cache and send CQ
cache, which helps improve send performance, but it doesn't set affinity
with the recv CQ cache, resulting in suboptimal receive performance.
Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.
Fixes: 9e03dbea2b ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5575b7646b ]
The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.
Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d158f47f1 ]
In some cases, it is possible for pble_rsrc->next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.
This fixes the following error that can be observed when registering
massive memory regions:
[ 447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[ 447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c
Fixes: e8c4dbc2fc ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db291ed173 ]
[Why]
DP validation may fail with multiple displays and higher color depths.
The sink may support others though.
[How]
When DP bandwidth validation fails, progressively fallback through:
- YUV422 8bpc (bandwidth efficient)
- YUV422 6bpc (reduced color depth)
- YUV420 (last resort)
This resolves cases where displays would show no image due to insufficient
DP link bandwidth for the requested RGB mode.
Suggested-by: Mauri Carvalho <mcarvalho3@lenovo.com>
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31b636d2c4 ]
When this was first reported [1], the possibility of having a sufficient
number of dynamic misc devices was theoretical, in the case of the dlm
driver. In practice, its userspace never created more than one device.
What we know from commit ab760791c0 ("char: misc: Increase the maximum
number of dynamic misc devices to 1048448"), is that the miscdevice
interface has been used for allocating more than the single-shot devices it
was designed for. And it is not only coresight_tmc, but many other drivers
are able to create multiple devices.
On systems like the ones described in the above commit, it is certain that
the dynamic allocation will allocate certain reserved minor numbers,
leading to failures when a later driver tries to claim its reserved number.
Instead of excluding the historically statically allocated range from
dynamic allocation, restrict the latter to minors above 255. That also
removes the need for DYNAMIC_MINORS and the convolution in allocating minor
numbers, simplifying the code.
Since commit ab760791c0 ("char: misc: Increase the maximum number of
dynamic misc devices to 1048448") has been applied, such range is already
possible. And given such devices already need to be dynamically created,
there should be no systems where this might become a problem.
[1] https://lore.kernel.org/all/1257813017-28598-3-git-send-email-cascardo@holoscopio.com/
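A minimal sketch of the allocation policy described above, using the
generic IDA API (identifier names are illustrative; the actual misc.c
code may differ):
/* dynamic minors come only from the range above the historic static ones */
minor = ida_alloc_range(&misc_minors_ida, 256, MINORMASK, GFP_KERNEL);
if (minor < 0)
        return minor;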
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://lore.kernel.org/r/20250423-misc-dynrange-v4-1-133b5ae4ca18@igalia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88b4cbcf6b ]
Currently when both IMA and EVM are in fix mode, the IMA signature will
be reset to IMA hash if a program first stores IMA signature in
security.ima and then writes/removes some other security xattr for the
file.
For example, on Fedora, after booting the kernel with "ima_appraise=fix
evm=fix ima_policy=appraise_tcb" and installing rpm-plugin-ima,
installing/reinstalling a package does not result in a good reference IMA
signature being generated. Instead, an IMA hash is generated:
# getfattr -m - -d -e hex /usr/bin/bash
# file: usr/bin/bash
security.ima=0x0404...
This happens because when setting security.selinux, the IMA_DIGSIG flag
that had been set early was cleared. As a result, IMA hash is generated
when the file is closed.
Similarly, IMA signature can be cleared on file close after removing
security xattr like security.evm or setting/removing ACL.
Prevent replacing the IMA file signature with a file hash, by preventing
the IMA_DIGSIG flag from being reset.
Here's a minimal C reproducer which sets security.selinux as the last
step; this step can also be replaced by removing security.evm or setting
an ACL:
#include <stdio.h>
#include <sys/xattr.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
int main() {
    const char* file_path = "/usr/sbin/test_binary";
    const char* hex_string = "030204d33204490066306402304";
    int length = strlen(hex_string);
    char* ima_attr_value;
    int fd;
    fd = open(file_path, O_WRONLY|O_CREAT|O_EXCL, 0644);
    if (fd == -1) {
        perror("Error opening file");
        return 1;
    }
    ima_attr_value = (char*)malloc(length / 2);
    for (int i = 0, j = 0; i < length; i += 2, j++) {
        sscanf(hex_string + i, "%2hhx", &ima_attr_value[j]);
    }
    if (fsetxattr(fd, "security.ima", ima_attr_value, length/2, 0) == -1) {
        perror("Error setting extended attribute");
        close(fd);
        return 1;
    }
    const char* selinux_value = "system_u:object_r:bin_t:s0";
    if (fsetxattr(fd, "security.selinux", selinux_value, strlen(selinux_value), 0) == -1) {
        perror("Error setting extended attribute");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00be6f26a2 ]
When io_uring is used in the same task as CIFS, there might be
unnecessary reconnects, causing issues in user-space applications
like QEMU with a log like:
> CIFS: VFS: \\10.10.100.81 Error -512 sending data on socket to server
Certain io_uring completions might be added to task_work with
notify_method being TWA_SIGNAL and thus TIF_NOTIFY_SIGNAL is set for
the task.
In __smb_send_rqst(), signals are masked before calling
smb_send_kvec(), but the masking does not apply to TIF_NOTIFY_SIGNAL.
If sk_stream_wait_memory() is reached via sock_sendmsg() while
TIF_NOTIFY_SIGNAL is set, signal_pending(current) will evaluate to
true there, and -EINTR will be propagated all the way from
sk_stream_wait_memory() to sock_sendmsg() in smb_send_kvec().
Afterwards, __smb_send_rqst() will see that not everything was written
and reconnect.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4099b98203 ]
A soft lockup was observed when loading amdgpu module.
If a module has a lot of tracable functions, multiple calls
to kallsyms_lookup can spend too much time in RCU critical
section and with disabled preemption, causing kernel panic.
This is the same issue that was fixed in
commit d0b24b4e91 ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
kernels") and commit 42ea22e754 ("ftrace: Add cond_resched() to
ftrace_graph_set_hash()").
Fix it the same way by adding cond_resched() in ftrace_module_enable.
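A minimal sketch of the idea (the loop and helper names are illustrative;
the real change adds cond_resched() inside ftrace_module_enable()'s record
loop):
list_for_each_entry(rec, &mod_records, list) {
        enable_record(rec);     /* illustrative per-record work */
        cond_resched();         /* yield between records to avoid soft lockups */
}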
Link: https://lore.kernel.org/aMQD9_lxYmphT-up@vova-pc
Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 025e880759 ]
Willy Tarreau <w@1wt.eu> forwarded me a message from
Disclosure <disclosure@aisle.com> with the following
warning:
> The helper `xattr_key()` uses the pointer variable in the loop condition
> rather than dereferencing it. As `key` is incremented, it remains non-NULL
> (until it runs into unmapped memory), so the loop does not terminate on
> valid C strings and will walk memory indefinitely, consuming CPU or hanging
> the thread.
I easily reproduced this with setfattr and getfattr, causing a kernel
oops, hung user processes and corrupted orangefs files. Disclosure
sent along a diff (not a patch) with a suggested fix, which I based
this patch on.
After xattr_key started working right, xfstest generic/069 exposed an
xattr-related memory leak that led to OOM. xattr_key returns
a hashed key. When adding xattrs to the orangefs xattr cache, orangefs
used hash_add, a kernel hashing macro. hash_add also hashes the key using
hash_long, which resulted in additions to the xattr cache going to the wrong
hash bucket. generic/069 tortures a single file and orangefs does a
getattr for the xattr "security.capability" every time. Orangefs
negative caches on xattrs which includes a kmalloc. Since adds to the
xattr cache were going to the wrong bucket, every getattr for
"security.capability" resulted in another kmalloc, none of which were
ever freed.
I changed the two uses of hash_add to hlist_add_head instead
and the memory leak ceased and generic/069 quit throwing furniture.
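A minimal sketch of the loop-condition bug described in the report above
(simplified; not the exact orangefs xattr_key() code):
/* buggy: 'key' is a pointer, so the condition stays true as it is incremented */
while (key)
        hash += *key++;
/* fixed: terminate at the string's NUL byte */
while (*key)
        hash += *key++;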
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Reported-by: Stanislav Fort of Aisle Research <stanislav.fort@aisle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79c1587b6c ]
syzbot created an exfat image with cluster bits not set for the allocation
bitmap. exfat-fs reads and uses the allocation bitmap without checking
this. The problem is that if the start cluster of the allocation bitmap
is 6, cluster 6 can be allocated when creating a directory with mkdir.
exfat zeros out this cluster in exfat_mkdir, which can delete existing
entries and allow already-allocated entries to be reallocated. In addition,
the allocation bitmap itself is zeroed out, so cluster 6 can be reallocated.
This patch adds exfat_test_bitmap_range to validate that clusters used for
the allocation bitmap are correctly marked as in-use.
Reported-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com
Tested-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6dfba10838 ]
For exFAT filesystems with a 4MB read_ahead_size, removing the storage
device while a read operation is in progress can cause the last read
syscall to take 150s [1]. The main reason is that exFAT generates
excessive log messages [2].
After applying this patch, approximately 300,000 lines of log messages
were suppressed, and the delay of the last read() syscall was reduced
to about 4 seconds.
[1]:
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000120>
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000032>
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000119>
read(4, 0x7fccf28ae000, 131072) = -1 EIO (Input/output error) <150.186215>
[2]:
[ 333.696603] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[ 333.697378] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
[ 333.698156] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5)
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 659169c4eb ]
The 1824c does not have the A/B switch that the 1810c has,
but instead it has a mono main switch that sums the two
main output channels to mono.
Signed-off-by: Roy Vegard Ovesen <roy.vegard.ovesen@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca94b2b036 ]
Currently, bcsp_recv() can be called even when the BCSP protocol has not
been registered. This leads to a NULL pointer dereference, as shown in
the following stack trace:
KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
RIP: 0010:bcsp_recv+0x13d/0x1740 drivers/bluetooth/hci_bcsp.c:590
Call Trace:
<TASK>
hci_uart_tty_receive+0x194/0x220 drivers/bluetooth/hci_ldisc.c:627
tiocsti+0x23c/0x2c0 drivers/tty/tty_io.c:2290
tty_ioctl+0x626/0xde0 drivers/tty/tty_io.c:2706
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
To prevent this, ensure that the HCI_UART_REGISTERED flag is set before
processing received data. If the protocol is not registered, return
-EUNATCH.
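A minimal sketch of the check described above, assuming it sits at the top
of bcsp_recv() (the flags field follows the usual hci_uart convention):
if (!test_bit(HCI_UART_REGISTERED, &hu->flags))
        return -EUNATCH;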
Reported-by: syzbot+4ed6852d4da4606c93da@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4ed6852d4da4606c93da
Tested-by: syzbot+4ed6852d4da4606c93da@syzkaller.appspotmail.com
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70a5ce8bc9 ]
bp->dev->dev_addr is of type `unsigned char *`. Casting it to a u32
pointer and dereferencing implies dealing manually with endianness,
which is error-prone.
Replace by calls to get_unaligned_le32|le16() helpers.
This was found using sparse:
⟩ make C=2 drivers/net/ethernet/cadence/macb_main.o
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype] bottom
got restricted __le32 [usertype]
warning: incorrect type in assignment (different base types)
expected unsigned short [usertype] top
got restricted __le16 [usertype]
...
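A minimal sketch of the replacement described above (the register writes
are omitted; the variable names follow the sparse warning):
u32 bottom = get_unaligned_le32(bp->dev->dev_addr);
u16 top = get_unaligned_le16(bp->dev->dev_addr + 4);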
Reviewed-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250923-macb-fixes-v6-5-772d655cdeb6@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b9f128947 ]
PCI devices prior to PCI 2.3 both use level interrupts and do not support
interrupt masking, leading to a failure when passed through to a KVM guest on
at least the ppc64 platform. This failure manifests as receiving and
acknowledging a single interrupt in the guest, while the device continues to
assert the level interrupt indicating a need for further servicing.
When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following
sequence occurs:
* Level IRQ assertion on device
* IRQ marked disabled in kernel
* Host interrupt handler exits without clearing the interrupt on the device
* Eventfd is delivered to userspace
* Guest processes IRQ and clears device interrupt
* Device de-asserts INTx, then re-asserts INTx while the interrupt is masked
* Newly asserted interrupt acknowledged by kernel VMM without being handled
* Software mask removed by VFIO driver
* Device INTx still asserted, host controller does not see new edge after EOI
The behavior is now platform-dependent. Some platforms (amd64) will continue
to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ
will be handled by the host as soon as the mask is dropped. Others (ppc64) will
only send the one request, and if it is not handled no further interrupts will
be sent. The former behavior theoretically leaves the system vulnerable to
interrupt storm, and the latter will result in the device stalling after
receiving exactly one interrupt in the guest.
Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Link: https://lore.kernel.org/r/333803015.1744464.1758647073336.JavaMail.zimbra@raptorengineeringinc.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 733a763dd8 ]
The problem of having class-D initialization sequence in probe using
regmap_register_patch() is that it will do hardware register writes
immediately after being called as it bypasses regcache. Afterwards, in
aic3x_init() we also perform codec soft reset, rendering class-D init
sequence pointless. This issue is even more apparent when using reset
GPIO line, since in that case class-D amplifier initialization fails
with "Failed to init class D: -5" message as codec is already held in
reset state after requesting the reset GPIO and hence hardware I/O
fails with -EIO errno.
Thus move the class-D amplifier initialization sequence from the probe
function to aic3x_set_power(), just before the usual regcache sync. Use
the regmap_multi_reg_write_bypassed() function to make sure the class-D
init sequence is performed in the proper order, as described in the
datasheet.
Signed-off-by: Primoz Fiser <primoz.fiser@norik.com>
Link: https://patch.msgid.link/20250925085929.2581749-1-primoz.fiser@norik.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27fa1a8b28 ]
The mclk direction now needs to be specified in endpoint node with
"system-clock-direction-out" property. However some calls to the
set_sysclk callback, related to CPU DAI clock, result in unbalanced
calls to clock API.
The set_sysclk callback in STM32 SAI driver is intended only for mclk
management. So it is relevant to ensure that calls to set_sysclk are
related to mclk only.
Since the master clock is handled only at runtime, skip the calls to
set_sysclk in the initialization phase.
Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com>
Link: https://patch.msgid.link/20250916123118.84175-1-olivier.moysan@foss.st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46e75c56df ]
The following code paths may result in high latency or even task hangs:
1. fastcommit io is throttled by wbt.
2. jbd2_fc_wait_bufs() might wait for a long time while
JBD2_FAST_COMMIT_ONGOING is set in journal->flags, and then
jbd2_journal_commit_transaction() waits for the
JBD2_FAST_COMMIT_ONGOING bit for a long time while holding the write
lock of j_state_lock.
3. start_this_handle() waits for read lock of j_state_lock which
results in high latency or task hang.
Given the fact that ext4_fc_commit() already modifies the current
process' IO priority to match that of the jbd2 thread, it should be
reasonable to match jbd2's IO submission flags as well.
Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20250827121812.1477634-1-sunjunchao@bytedance.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1534f72dc2 ]
The parent function ext4_xattr_inode_lookup_create() already uses GFP_NOFS
for memory allocation, so the function ext4_xattr_inode_cache_find() should
use the same gfp flag.
Signed-off-by: chuguangqing <chuguangqing@inspur.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 513024d5a0 ]
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return
addresses from the vmalloc range. If such an address is mapped without
VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the
VM_PFNMAP restriction.
Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP
in the VMA before mapping. This ensures safe mapping and avoids kernel
crashes. The memory is still driver-allocated and cannot be accessed
directly by userspace.
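A minimal sketch of the check described above (assuming it sits in the
driver's mmap path; cpu_addr stands for the dma_alloc_coherent() buffer):
if (is_vmalloc_addr(cpu_addr))
        vm_flags_set(vma, VM_MIXEDMAP);  /* allow vm_insert_page() on vmalloc memory */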
Signed-off-by: Moti Haimovski <moti.haimovski@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0d866bab1 ]
Dirty state can occur when the host VM undergoes a reset while the
device does not. In such a case, the driver must reset the device before
it can be used again. As part of this reset, the device capabilities
are zeroed. Therefore, the driver must read the Preboot status again to
learn the Preboot state, capabilities, and security configuration.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f5067531c ]
EFAULT is currently returned if fewer user pages than requested are
pinned. This value means a "bad address", which might be confusing to
the user, as the address of the given user memory is not necessarily
"bad".
Modify the return value to ENOMEM, as "out of memory" is more suitable
in this case.
Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b4fd8e56c9 ]
Change the BMON_CR register value back to its original state before
enabling, so that BMON does not continue to collect information
after being disabled.
Signed-off-by: Vered Yavniely <vered.yavniely@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 072fdd4b0b ]
The fc_ct_ms_fill() helper currently formats the OS name and version
into entry->value using "%s v%s". Since init_utsname()->sysname and
->release are unbounded strings, snprintf() may attempt to write more
than FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN bytes, triggering a
-Wformat-truncation warning with W=1.
In file included from drivers/scsi/libfc/fc_elsct.c:18:
drivers/scsi/libfc/fc_encode.h: In function ‘fc_ct_ms_fill.constprop’:
drivers/scsi/libfc/fc_encode.h:359:30: error: ‘%s’ directive output may
be truncated writing up to 64 bytes into a region of size between 62
and 126 [-Werror=format-truncation=]
359 | "%s v%s",
| ^~
360 | init_utsname()->sysname,
361 | init_utsname()->release);
| ~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/libfc/fc_encode.h:357:17: note: ‘snprintf’ output between
3 and 131 bytes into a destination of size 128
357 | snprintf((char *)&entry->value,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
358 | FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
359 | "%s v%s",
| ~~~~~~~~~
360 | init_utsname()->sysname,
| ~~~~~~~~~~~~~~~~~~~~~~~~
361 | init_utsname()->release);
| ~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using "%.62s v%.62s", which ensures sysname and release are
truncated to fit within the 128-byte field defined by
FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN.
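A minimal sketch of the resulting call, following the code quoted in the
warning above:
snprintf((char *)&entry->value,
         FC_FDMI_HBA_ATTR_OSNAMEVERSION_LEN,
         "%.62s v%.62s",
         init_utsname()->sysname,
         init_utsname()->release);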
[mkp: clarified commit description]
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cd9a9562b2 ]
Currently, after the bridge is created, the FDB does not hold an FDB entry
for the bridge MAC on VLAN 0:
# ip link add name br up type bridge
# ip -br link show dev br
br UNKNOWN 92:19:8c:4e:01:ed <BROADCAST,MULTICAST,UP,LOWER_UP>
# bridge fdb show | grep 92:19:8c:4e:01:ed
92:19:8c:4e:01:ed dev br vlan 1 master br permanent
Later when the bridge MAC is changed, or in fact when the address is given
during netdevice creation, the entry appears:
# ip link add name br up address 00:11:22:33:44:55 type bridge
# bridge fdb show | grep 00:11:22:33:44:55
00:11:22:33:44:55 dev br vlan 1 master br permanent
00:11:22:33:44:55 dev br master br permanent
However when the bridge address is set by the user to the current bridge
address before the first port is enslaved, none of the address handlers
gets invoked, because the address is not actually changed. The address is
however marked as NET_ADDR_SET. Then when a port is enslaved, the address
is not changed, because it is NET_ADDR_SET. Thus the VLAN 0 entry is not
added, and it has not been added previously either:
# ip link add name br up type bridge
# ip -br link show dev br
br UNKNOWN 7e:f0:a8:1a:be:c2 <BROADCAST,MULTICAST,UP,LOWER_UP>
# ip link set dev br addr 7e:f0:a8:1a:be:c2
# ip link add name v up type veth
# ip link set dev v master br
# ip -br link show dev br
br UNKNOWN 7e:f0:a8:1a:be:c2 <BROADCAST,MULTICAST,UP,LOWER_UP>
# bridge fdb | grep 7e:f0:a8:1a:be:c2
7e:f0:a8:1a:be:c2 dev br vlan 1 master br permanent
Then when the bridge MAC is used as DMAC, and br_handle_frame_finish()
looks up an FDB entry with VLAN=0, it doesn't find any, and floods the
traffic instead of passing it up.
Fix this by simply adding the VLAN 0 FDB entry for the bridge itself always
on netdevice creation. This also makes the behavior consistent with how
ports are treated: ports always have an FDB entry for each member VLAN as
well as VLAN 0.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/415202b2d1b9b0899479a502bbe2ba188678f192.1758550408.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a890a2e339 ]
Theoretically it's an oopsable race, but I don't believe one can manage
to hit it on real hardware; might become doable on a KVM, but it still
won't be easy to attack.
Anyway, it's easy to deal with - since xdr_encode_hyper() is just a call of
put_unaligned_be64(), we can put that under ->d_lock and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf75ad0968 ]
When client initialization goes through server trunking discovery, it
schedules the state manager and then sleeps waiting for nfs_client
initialization completion.
The state manager can fail during state recovery, and specifically in
lease establishment as nfs41_init_clientid() will bail out in case of
errors returned from nfs4_proc_create_session(), without ever marking
the client ready. The session creation can fail for a variety of reasons
e.g. during backchannel parameter negotiation, with status -EINVAL.
The error status will propagate all the way to the nfs4_state_manager
but the client status will not be marked, and thus the mount process
will remain blocked waiting.
Fix it by adding -EINVAL error handling to nfs4_state_manager().
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be390f9524 ]
RFC7530 states that clients should be prepared for the return of
NFS4ERR_GRACE errors for non-reclaim lock and I/O requests.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51cb93aa0c ]
Don't update DC stream color components during atomic check. The driver
will continue validating the new CRTC color state but will not change DC
stream color components. The DC stream color state will only be
programmed at commit time in the `atomic_setup_commit` stage.
It fixes gamma LUT loss reported by KDE users when changing brightness
quickly or changing Display settings (such as overscan) with nightlight
on and HDR. As KWin can do a test commit with color settings different
from those that should be applied in a non-test-only commit, if the
driver changes DC stream color state in atomic check, this state can be
eventually HW programmed in commit tail, instead of the respective state
set by the non-blocking commit.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4444
Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f082daf08f ]
[Why]
The driver does not pick up and save the vbios's clocks during init
clocks, so the dispclk in clk_mgr stays 0 until the first update clocks.
In some cases, the OS changes the timing in the second set mode
(lowering the pixel clock), causing the driver to lower the dispclk
in prepare bandwidth, which is illegal and causes a grey screen.
[How]
1. Dump and save the vbios's clocks, and init the dispclk in
dcn314_init_clocks.
2. Fix the condition in dcn314_update_clocks, regarding a 0kHz value.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Lo-an Chen <lo-an.chen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8bedab2d9 ]
[WHY]
Ensure AVI infoframe updates from stream updates are applied to the active
stream so OS overrides are not lost.
[HOW]
Copy avi_infopacket to stream when valid flag is set.
Follow existing infopacket copy pattern and perform a basic validity check before assignment.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0b977a3d1 ]
At reset, the KSZ8463 uses a strap-based configuration to set SPI as
bus interface. SPI is the only bus supported by the driver. If the
required pull-ups/pull-downs are missing (by mistake or by design to
save power) the pins may float and the configuration can go wrong
preventing any communication with the switch.
Introduce a ksz8463_configure_straps_spi() function called during the
device reset. It relies on the 'straps-rxd-gpios' OF property and the
'reset' pinmux configuration to enforce SPI as bus interface.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com>
Link: https://patch.msgid.link/20250918-ksz-strap-pins-v3-3-16662e881728@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50d51cef55 ]
Quoted from musl wiki:
GNU getopt permutes argv to pull options to the front, ahead of
non-option arguments. musl and the POSIX standard getopt stop
processing options at the first non-option argument with no
permutation.
Thus these scripts stop working on musl since non-option arguments for
tools using getopt() (in this case, (ar)ping) do not always come last.
Fix it by reordering arguments.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250919053538.1106753-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 299fad4133 ]
When a device is surprise-removed (e.g., due to a dock unplug), the PCI
core unconfigures all downstream devices and sets their error state to
pci_channel_io_perm_failure. This marks them as disconnected via
pci_dev_is_disconnected().
During device removal, the runtime PM framework may attempt to resume the
device to D0 via pm_runtime_get_sync(), which calls into pci_power_up().
Since the device is already disconnected, this resume attempt is
unnecessary and results in a predictable errors like this, typically when
undocking from a TBT3 or USB4 dock with PCIe tunneling:
pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
Avoid powering up disconnected devices by checking their status early in
pci_power_up() and returning -EIO.
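A minimal sketch of the early check described above, assuming it sits near
the top of pci_power_up():
if (pci_dev_is_disconnected(dev))
        return -EIO;    /* don't touch a surprise-removed device */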
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
[bhelgaas: add typical message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250909031916.4143121-1-superm1@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64b9642fc2 ]
When disabling SR-IOV, clear the configuration of each VF
in the hardware. Do not exit the configuration clearing process
due to the failure of a single VF. Additionally, clear the VF
configurations before decrementing the PM counter.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85acd1b26b ]
Before the device reset, although the driver has set the queue
status to intercept doorbells sent by the task process, the reset
thread is isolated from the user-mode task process, so the task process
may still send doorbells. Therefore, before the reset, the queue is
directly invalidated so that the device directly discards the doorbells
sent by the process.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 16df67f218 ]
The two implementers of vfio_device_ops.device_feature,
vfio_cdx_ioctl_feature and vfio_pci_core_ioctl_feature, return
-ENOTTY in the fallthrough case when the feature is unsupported. For
consistency, the base case, vfio_ioctl_device_feature, should do the
same when device_feature == NULL, indicating an implementation has no
feature extensions.
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20250908-vfio-enotty-v1-1-4428e1539e2e@fb.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7205ef77df ]
Conventions for readsl() are the same as for readl() - any __iomem
pointer is acceptable, both const and volatile ones being OK. Same
for readsb() and readsw().
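A minimal sketch of the resulting convention (generic form; the sparc
declarations may differ in detail):
void readsb(const volatile void __iomem *addr, void *buffer, unsigned int count);
void readsw(const volatile void __iomem *addr, void *buffer, unsigned int count);
void readsl(const volatile void __iomem *addr, void *buffer, unsigned int count);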
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com> # Making sparc64 subject prefix
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf7154ffb1 ]
EEE speed down means slowing down the MAC MCU clock. It is not from the
spec; it is a kind of Realtek-specific power saving feature. But enabling
it may cause some issues, like packet drops or interrupt loss. Different
hardware may have different issues.
The EEE speed down ratio (mac ocp 0xe056[7:4]) is used to set the EEE
speed down rate. The larger this value is, the more power can be saved.
But it actually saves less power than we expected and, as mentioned
above, will impact compatibility. So set it to 1 (mac ocp 0xe056[7:4] = 0),
which means not to speed down, to improve compatibility.
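A minimal sketch of the register change described above (the helper name is
assumed from the r8169 driver; the exact call site may differ):
/* clear mac ocp 0xe056[7:4]: EEE speed down ratio = 1, i.e. no speed down */
r8168_mac_ocp_modify(tp, 0xe056, 0x00f0, 0x0000);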
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20250918023425.3463-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99e9c5ffbb ]
Variable idx is set in the loop but never used, resulting in dead
code. Building with GCC 16, which enables
-Werror=unused-but-set-parameter= by default, results in a build error.
This patch removes the idx parameter, since all callers of
fm10k_unbind_hw_stats_q() pass 0 as idx anyway.
Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75c02a0376 ]
snprintf() returns the number of bytes that would have been written, not
the number actually written. Using this for offset tracking can cause
buffer overruns if truncation occurs.
Replace snprintf() with scnprintf() to ensure the offset stays within
bounds.
Since scnprintf() never returns a negative value, and zero is not possible
in this context because 'bytes' starts at 0 and 'size - bytes' is
DEBUG_BUFFER_SIZE in the first call, which is large enough to hold the
string literals used, the return value is always positive. An integer
overflow is also completely out of reach here due to the small and fixed
buffer size. The error check in latency_show_one() is therefore
unnecessary. Remove it and make dmar_latency_snapshot() return void.
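A minimal sketch of the offset-tracking difference (illustrative; not the
exact dmar_latency_snapshot() code):
/* unsafe: on truncation, snprintf() returns what would have been written,
 * so 'bytes' can run past 'size' */
bytes += snprintf(str + bytes, size - bytes, "%s", line);
/* safe: scnprintf() returns the number of bytes actually written */
bytes += scnprintf(str + bytes, size - bytes, "%s", line);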
Signed-off-by: Seyediman Seyedarab <ImanDevel@gmail.com>
Link: https://lore.kernel.org/r/20250731225048.131364-1-ImanDevel@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60f887b129 ]
When a PHY is halted (e.g. `ip link set dev lan2 down`), several
fields in struct phy_device may still reflect the last active
connection. This leads to ethtool showing stale values even though
the link is down.
Reset selected fields in _phy_state_machine() when transitioning
to PHY_HALTED and the link was previously up:
- speed/duplex -> UNKNOWN, but only in autoneg mode (in forced mode
these fields carry configuration, not status)
- master_slave_state -> UNKNOWN if previously supported
- mdix -> INVALID (state only, same meaning as "unknown")
- lp_advertising -> always cleared
The cleanup is skipped if the PHY is in PHY_ERROR state, so the
last values remain available for diagnostics.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250917094751.2101285-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc9a8e238e ]
kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws
remains NULL while ectx.ws_size is set, leading to a potential NULL
pointer dereference in atom_get_src_int() when accessing WS entries.
Return -ENOMEM on allocation failure to avoid the NULL dereference.
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 300b072df7 ]
The transaction manager initialization in txInit() was not properly
initializing TxBlock[0].waitor waitqueue, causing a crash when
txEnd(0) is called on read-only filesystems.
When a filesystem is mounted read-only, txBegin() returns tid=0 to
indicate no transaction. However, txEnd(0) still gets called and
tries to access TxBlock[0].waitor via tid_to_tblock(0), but this
waitqueue was never initialized because the initialization loop
started at index 1 instead of 0.
This causes a 'non-static key' lockdep warning and system crash:
INFO: trying to register non-static key in txEnd
Fix by ensuring all transaction blocks including TxBlock[0] have
their waitqueues properly initialized during txInit().
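A minimal sketch of the initialization change described above (simplified;
the real txInit() sets up more per-block state):
/* start at 0 so that TxBlock[0].waitor is initialized as well */
for (k = 0; k < nTxBlock; k++)
        init_waitqueue_head(&TxBlock[k].waitor);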
Reported-by: syzbot+c4f3462d8b2ad7977bea@syzkaller.appspotmail.com
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42f993d343 ]
Currently, all master upper netdevices (e.g., bond, VRF) are treated
equally.
When a VRF netdevice is used over an IPoIB netdevice, the expected
netdev resolution is on the lower IPoIB device which has the IP address
assigned to it and not the VRF device.
The rdma_cm module (CMA) tries to match incoming requests to a
particular netdevice. When successful, it also validates that the return
path points to the same device by performing a routing table lookup.
Currently, the former would resolve to the VRF netdevice, while the
latter to the correct lower IPoIB netdevice, leading to failure in
rdma_cm.
Improve this by ignoring the VRF master netdevice, if it exists, and
instead return the lower IPoIB device.
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-5-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc2a5a12fa ]
Logically, before a waiting side which has already timed out turns the
atomic status back to idle, a completing side could still pass the atomic
condition and call complete. This makes the following H2C commands, which
wait for C2H events, get a completion unexpectedly early. Hence, renew
a completion for each H2C command waiting for a C2H event.
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250915065343.39023-1-pkshih@realtek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23361bd549 ]
When we get wrong extent info data and look up an extent_node in the rb
tree, it can cause an infinite loop (CONFIG_F2FS_CHECK_FS=n). Avoid this
by returning NULL and printing a kernel message in that case.
Signed-off-by: wangzijie <wangzijie1@honor.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41cf11946b ]
Allow autosuspend to be used by xhci plat device. For Qualcomm SoCs,
when in host mode, it is intended that the controller goes to suspend
state to save power and wait for interrupts from connected peripheral
to wake it up. This is particularly used in cases where a HID or Audio
device is connected. In such scenarios, the usb controller can enter
auto suspend and resume action after getting interrupts from the
connected device.
Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250916120436.3617598-1-krishna.kurapati@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 368ed48a5e ]
The usbmon binary interface currently truncates captures of large
transfers from higher-speed USB devices. Because a single event capture
is limited to one-fifth of the total buffer size, the current maximum
size of a captured URB is around 240 KiB. This is insufficient when
capturing traffic from modern devices that use transfers of several
hundred kilobytes or more, as truncated URBs can make it impossible for
user-space USB analysis tools like Wireshark to properly defragment and
reassemble higher-level protocol packets in the captured data.
The root cause of this issue is the 1200 KiB BUFF_MAX limit, which has
not been changed since the binary interface was introduced in 2006.
To resolve this issue, this patch increases BUFF_MAX to 64 MiB. The
original comment for BUFF_MAX based the limit's calculation on a
saturated 480 Mbit/s bus. Applying the same logic to a modern USB 3.2
Gen 2×2 20 Gbit/s bus (~2500 MB/s over a 20ms window) indicates the
buffer should be at least 50 MB. The new limit of 64 MiB covers that,
plus a little extra for any overhead.
With this change, both users and developers should now be able to debug
and reverse engineer modern USB devices even when running unmodified
distro kernels.
Please note that this change does not affect the default buffer size. A
larger buffer is only allocated when a user explicitly requests it via
the MON_IOCT_RING_SIZE ioctl, so the change to the maximum buffer size
should not unduly increase memory usage for users that don't
deliberately request a larger buffer.
Link: https://lore.kernel.org/CAO3ALPzdUkmMr0YMrODLeDSLZqNCkWcAP8NumuPHLjNJ8wC1kQ@mail.gmail.com
Signed-off-by: Forest Crossman <cyrozap@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/CAO3ALPxU5RzcoueC454L=WZ1qGMfAcnxm+T+p+9D8O9mcrUbCQ@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2bf81856a4 ]
There is a timing race condition when a PRLI may be sent on the wire
before PLOGI_ACC in Point to Point topology. Fix by deferring REG_RPI
mbox completion handling to after PLOGI_ACC's CQE completion. Because
the discovery state machine only sends PRLI after REG_RPI mbox
completion, PRLI is now guaranteed to be sent after PLOGI_ACC.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-8-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5de09770b1 ]
To assist in debugging the lpfc_xri_rebalancing driver parameter, a
debugfs entry is used. The debugfs file operations for xri rebalancing
were previously implemented, but lacked a definition for their
information buffer size. Similar to other pre-existing debugfs entry
buffers, define LPFC_HDWQINFO_SIZE as 8192 bytes.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-9-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4809b98eb ]
In lpfc_cleanup, there is an extraneous nlp_put for NPIV ports on the
F_Port_Ctrl ndlp object. In cases when an ABTS is issued, the
outstanding kref is needed for when a second XRI_ABORTED CQE is
received. The final kref for the ndlp is designed to be decremented in
lpfc_sli4_els_xri_aborted instead. Also, add a new log message to allow
for future diagnostics when debugging related issues.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-5-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f408dde246 ]
If lpfc_reset_flush_io_context fails to execute, then the wrong return
status code may be passed back to upper layers when issuing a target
reset TMF command. Fix by checking the return status from
lpfc_reset_flush_io_context() first in order to properly return FAILED
or FAST_IO_FAIL.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-7-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5bf6d681f ]
The kref for Fabric_DID ndlps is not decremented after repeated FDISC
failures and exhausting maximum allowed retries. This can leave the
ndlp lingering unnecessarily. Add a test and set bit operation for the
NLP_DROPPED flag. If not previously set, then a kref is decremented. The
ndlp is freed when the remaining reference for the completing ELS is
put.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-6-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 803dfd83df ]
lpfc_sli4_queue_setup() does not allocate memory and is used for
submitting CREATE_QUEUE mailbox commands. Thus, if such mailbox
commands fail we should clean up by also freeing the memory allocated
for the queues with lpfc_sli4_queue_destroy(). Change the intended
clean up label for the lpfc_sli4_queue_setup() error case to
out_destroy_queue.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-4-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7ddcf921e ]
Gang submission means that the kernel driver guarantees that multiple
submissions are executed on the HW at the same time on different engines.
The background is that those submissions then depend on each other and
none of them can finish on its own.
SRIOV now uses the world switch to preempt submissions on the engines to
allow sharing the HW resources between multiple VFs.
The problem is that the SRIOV world switch can't know about such
interdependencies and will cause a timeout if it waits for a partially
running gang submission.
To conclude, SRIOV and gang submissions are fundamentally incompatible at
the moment. For now, just disable them.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3b09b11805 ]
Due to multiple explosion issues in the early days of the Xe driver,
the GuC load was hacked to never return a failure. That prevented
kernel panics and such initially, but now all it achieves is creating
more confusing errors when the driver tries to submit commands to a
GuC it already knows is not there. So fix that up.
As a stop-gap and to help with debug of load failures due to invalid
GuC init params, a wedge call had been added to the inner GuC load
function. The reason being that it leaves the GuC log accessible via
debugfs. However, for an end user, simply aborting the module load is
much cleaner than wedging and trying to continue. The wedge blocks
user submissions but it seems that various bits of the driver itself
still try to submit to a dead GuC and lots of subsequent errors occur.
And with regards to developers debugging why their particular code
change is being rejected by the GuC, it is trivial to either add the
wedge back in and hack the return code to zero again or to just do a
GuC log dump to dmesg.
v2: Add support for error injection testing and drop the now redundant
wedge call.
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d62beb102 ]
Dell systems utilize an EC-based touchpad emulation when the ACPI
touchpad _DSM is not invoked. This emulation acts as a secondary
master on the I2C bus, designed for scenarios where the I2C touchpad
driver is absent, such as in BIOS menus. Typically, loading the
i2c-hid module triggers the _DSM at initialization, disabling the
EC-based emulation.
However, if the i2c-hid module is missing from the boot kernel
used for hibernation snapshot restoration, the _DSM remains
uncalled, resulting in dual masters on the I2C bus and
subsequent arbitration errors. This issue arises when i2c-hid
resides in the rootfs instead of the kernel or initramfs.
To address this, switch from using the SYSTEM_SLEEP_PM_OPS()
macro to dedicated callbacks, introducing a specific
callback for restoring the S4 image. This callback ensures
the _DSM is invoked.
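A minimal sketch of the resulting dev_pm_ops shape (the handler names are
illustrative, not necessarily the driver's; the point is the dedicated
.restore slot):

  static const struct dev_pm_ops i2c_hid_core_pm = {
          .suspend  = i2c_hid_core_suspend,
          .resume   = i2c_hid_core_resume,
          .freeze   = i2c_hid_core_suspend,
          .thaw     = i2c_hid_core_resume,
          .poweroff = i2c_hid_core_suspend,
          /* restoring the S4 image gets its own handler, which also
           * re-invokes the ACPI _DSM to disable the EC touchpad emulation */
          .restore  = i2c_hid_core_restore,
  };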
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d2d3f529e7 ]
A lot of modern SoCs have the ability to store MAC addresses in their
NVMEM. So extend the generic function device_get_mac_address() to
obtain the MAC address from an nvmem cell named 'mac-address' in
case there is no firmware node which contains the MAC address directly.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250912140332.35395-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3b52167a0 ]
Driver authors often forget to add GFP_NOWARN for page allocation
from the datapath. This is annoying to users as OOMs are a fact
of life, and we pretty much expect network Rx to hit page allocation
failures during OOM. Make page pool add GFP_NOWARN for ATOMIC allocations
by default.
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250912161703.361272-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c97a7dccb3 ]
dml21_map_dc_state_into_dml_display_cfg calls (the call is usually
inlined by the compiler) populate_dml21_surface_config_from_plane_state
and populate_dml21_plane_config_from_plane_state which may use FPU. In
a x86-64 build:
$ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \
> drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o |
> grep %xmm -c
63
Thus it needs to be guarded with DC_FP_START. But we must note that the
current code quality of the in-kernel FPU use in AMD dml2 is very much
problematic: we are actually calling DC_FP_START in dml21_wrapper.c
here, and this translation unit is built with CC_FLAGS_FPU. Strictly
speaking this does not make any sense: with CC_FLAGS_FPU the compiler is
allowed to generate FPU uses anywhere in the translated code, perhaps
out of the DC_FP_START guard. This problematic pattern also occurs in
at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really
need a careful audit and refactor for the in-kernel FPU uses, and this
patch is simply whacking a mole. However per the reporter, whacking
this mole is enough to make a 9060XT "just work."
Reported-by: Asiacn <710187964@qq.com>
Closes: https://github.com/loongson-community/discussions/issues/102
Tested-by: Asiacn <710187964@qq.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 489f0f600c ]
When the EDID has the HDMI bit, we should simply select
the HDMI signal type even on DVI ports.
For reference see, the legacy amdgpu display code:
amdgpu_atombios_encoder_get_encoder_mode
which selects ATOM_ENCODER_MODE_HDMI for the same case.
This commit fixes DVI connectors to work with DVI-D/HDMI
adapters so that they can now produce output over these
connectors for HDMI monitors with higher bandwidth modes.
With this change, even HDMI audio works through DVI.
For testing, I used a CAA-DMDHFD3 DVI-D/HDMI adapter
with the following GPUs:
Tahiti (DCE 6) - DC can now output 4K 30 Hz over DVI
Polaris 10 (DCE 11.2) - DC can now output 4K 60 Hz over DVI
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 02a6c2e4b2 ]
[Why & How]
A small error in the order of operations in the ImmediateFlipBytes
calculation on the DML MS side can result in DML MS and MP disagreeing
on immediate flip support for a given pipe, and thus an invalid HW
state. Correct the order to align with MP.
Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e76bc677c ]
[Why]
fill_stream_properties_from_drm_display_mode() will not configure the pixel
encoding to YCBCR422 when the DRM color format supports YCBCR422 but not
YCBCR420 or YCBCR444. Instead it falls back to RGB.
[How]
Add support for YCBCR422 in pixel encoding mapping.
Suggested-by: Mauri Carvalho <mcarvalho3@lenovo.com>
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dea75df7af ]
Replace kmalloc_array() + copy_from_user() with memdup_array_user().
This shrinks the source code and improves separation between the kernel
and userspace slabs.
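Roughly, the conversion replaces the open-coded allocate-and-copy pattern
with the helper (the names below are placeholders, not the exact amdgpu
call sites):

  /* before */
  chunk_array = kmalloc_array(num_chunks, sizeof(u64), GFP_KERNEL);
  if (!chunk_array)
          return -ENOMEM;
  if (copy_from_user(chunk_array, user_ptr, num_chunks * sizeof(u64))) {
          kfree(chunk_array);
          return -EFAULT;
  }

  /* after: one helper allocates, copies and reports errors */
  chunk_array = memdup_array_user(user_ptr, num_chunks, sizeof(u64));
  if (IS_ERR(chunk_array))
          return PTR_ERR(chunk_array);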
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8497324901 ]
If multiple instances of this driver are instantiated and try to send
concurrently, then the single static buffer in snd_serial_generic_tx_work()
will cause corruption in the data output.
Move the buffer into the per-instance driver data to avoid this.
Signed-off-by: John Keeping <jkeeping@inmusicbrands.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25ef5b5d02 ]
Enable 160MHz beamformee support on mt7922 by updating HE capability
element configuration. Previously, only 160MHz channel width was set,
but beamformee for 160MHz was not properly advertised. This patch
adds BEAMFORMEE_MAX_STS_ABOVE_80MHZ_4 capability to allow devices
to utilize 160MHz BW for beamforming.
Tested by connecting to a 160MHz-bandwidth beamforming AP and verifying
the HE capability.
Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
Link: https://patch.msgid.link/ae637afaffed387018fdc43709470ef65898ff0b.1756383627.git.quan.zhou@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66048f8b3c ]
During recent testing with the netem qdisc to inject delays into TCP
traffic, we observed that our CLS BPF program failed to function correctly
due to incorrect classid retrieval from task_get_classid(). The issue
manifests in the following call stack:
bpf_get_cgroup_classid+5
cls_bpf_classify+507
__tcf_classify+90
tcf_classify+217
__dev_queue_xmit+798
bond_dev_queue_xmit+43
__bond_start_xmit+211
bond_start_xmit+70
dev_hard_start_xmit+142
sch_direct_xmit+161
__qdisc_run+102 <<<<< Issue location
__dev_xmit_skb+1015
__dev_queue_xmit+637
neigh_hh_output+159
ip_finish_output2+461
__ip_finish_output+183
ip_finish_output+41
ip_output+120
ip_local_out+94
__ip_queue_xmit+394
ip_queue_xmit+21
__tcp_transmit_skb+2169
tcp_write_xmit+959
__tcp_push_pending_frames+55
tcp_push+264
tcp_sendmsg_locked+661
tcp_sendmsg+45
inet_sendmsg+67
sock_sendmsg+98
sock_write_iter+147
vfs_write+786
ksys_write+181
__x64_sys_write+25
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
The problem occurs when multiple tasks share a single qdisc. In such cases,
__qdisc_run() may transmit skbs created by different tasks. Consequently,
task_get_classid() retrieves an incorrect classid since it references the
current task's context rather than the skb's originating task.
Given that dev_queue_xmit() always executes with bh disabled, we can use
softirq_count() instead to obtain the correct classid.
The simple steps to reproduce this issue:
1. Add network delay to the network interface:
such as: tc qdisc add dev bond0 root netem delay 1.5ms
2. Build two distinct net_cls cgroups, each with a network-intensive task
3. Initiate parallel TCP streams from both tasks to external servers.
Under this specific condition, the issue reliably occurs. The kernel
eventually dequeues an SKB that originated from Task-A while executing in
the context of Task-B.
It is worth noting that it will change the established behavior for a
slightly different scenario:
<sock S is created by task A>
<class ID for task A is changed>
<skb is created by sock S xmit and classified>
Prior to this patch the skb would be classified with the 'new' task A
classid; now it uses the old/original one. The bpf_get_cgroup_classid_curr()
function is a more appropriate choice for this case.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250902062933.30087-1-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9048beca9c ]
During entropy evaluation, if the generated samples fail any statistical
test, all of the bits will be discarded and a second set of samples will
be generated and tested.
The entropy delay interval should be doubled before performing the
retry.
Also, ctrlpriv->rng4_sh_init and inst_handles both read the RNG DRNG
status register, but only inst_handles is updated before every retry,
so only check inst_handles and remove ctrlpriv->rng4_sh_init.
Signed-off-by: Gaurav Jain <gaurav.jain@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b0dc40ac6 ]
The payload_size field of the request header is incorrectly calculated using
sizeof(req). Since 'req' is a pointer (struct hsti_request *), sizeof(req)
returns the size of the pointer itself (e.g., 8 bytes on a 64-bit system),
rather than the size of the structure it points to. This leads to an
incorrect payload size being sent to the Platform Security Processor (PSP),
potentially causing the HSTI query command to fail.
Fix this by using sizeof(*req) to correctly calculate the size of the
struct hsti_request.
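A small standalone demonstration of the pitfall (the struct layout here is
made up; only the sizeof difference matters):

  #include <stdio.h>

  struct hsti_request {
          unsigned int header;
          unsigned int payload[15];
  };

  int main(void)
  {
          struct hsti_request r, *req = &r;

          printf("sizeof(req)  = %zu\n", sizeof(req));  /* pointer size, 8 on 64-bit */
          printf("sizeof(*req) = %zu\n", sizeof(*req)); /* whole struct, 64 here */
          return 0;
  }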
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09fefb24ed ]
dw_pcie_edma_irq_verify() is supposed to verify the eDMA IRQs in devicetree
by fetching them using either the 'dma' or 'dmaX' IRQ names. The former is
used when the platform uses a single IRQ for all eDMA channels and the
latter is used when the platform uses a separate IRQ per channel. But
currently, dw_pcie_edma_irq_verify() bails out early if edma::nr_irqs is 1,
i.e., when a single IRQ is used. This gives the impression that the driver
could work with any single IRQ in devicetree, not necessarily with the name
'dma'.
But dw_pcie_edma_irq_vector(), which actually requests the IRQ, does
require the single IRQ to be named as 'dma'. So this creates inconsistency
between dw_pcie_edma_irq_verify() and dw_pcie_edma_irq_vector().
Thus, to fix this inconsistency, make sure dw_pcie_edma_irq_verify() also
verifies the single IRQ name by removing the bail out code.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[mani: reworded subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: fix typos]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250908165914.547002-3-cassel@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a23ea1f75 ]
Using the number of bytes in the request as DMA timeout is really
inconsistent, as large requests could possibly set a timeout of
hundreds of seconds.
Remove the per-channel timeout field and use a single, static DMA
timeout of 3 seconds for all requests.
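In sketch form the wait then looks like this (constant and variable names
are illustrative, not the driver's exact identifiers):

  #define CE_DMA_TIMEOUT_MS  3000

  /* wait_for_completion_timeout() returns 0 when the 3 s window expires */
  if (!wait_for_completion_timeout(&chan->complete,
                                   msecs_to_jiffies(CE_DMA_TIMEOUT_MS)))
          dev_err(ce->dev, "DMA timeout\n");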
Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com>
Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Reviewed-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df3c6e0b6d ]
Fix the issue of max_timeout being calculated as larger than the actual
value. The result of freq / (S3C2410_WTCON_PRESCALE_MAX + 1) /
S3C2410_WTCON_MAXDIV is smaller than the actual value because the remainder
is discarded at each division. This leads to a larger calculated value for
max_timeout than the actually settable value.
To resolve this issue, the order of calculations in the computation process
has been adjusted.
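A small userspace illustration of why the division order matters with
integer arithmetic (the clock value is arbitrary and this is not the
driver's exact formula):

  #include <stdio.h>

  int main(void)
  {
          unsigned int freq = 1000000;   /* example watchdog input clock, Hz */
          unsigned int prescale = 256;   /* S3C2410_WTCON_PRESCALE_MAX + 1 */
          unsigned int maxdiv = 128;     /* S3C2410_WTCON_MAXDIV */
          unsigned int maxcnt = 0xffff;  /* 16-bit count register */

          /* old order: both divisions discard remainders, the slowest tick
           * rate is underestimated and max_timeout comes out too large */
          unsigned int old = maxcnt / (freq / prescale / maxdiv);

          /* dividing by freq last keeps the precision */
          unsigned int adj = (unsigned long long)maxcnt * prescale * maxdiv / freq;

          printf("old = %u s, adjusted = %u s\n", old, adj); /* 2184 vs 2147 */
          return 0;
  }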
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Signed-off-by: Sangwook Shin <sw617.shin@samsung.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b595974b4a ]
The Asus Z13 folio has a multitouch touchpad that needs to bind
to the hid-multitouch driver in order to work properly. So bind
it to the HID_GROUP_GENERIC group to release the touchpad and
move it to the bottom so that the comment applies to it.
While at it, change the generic KEYBOARD3 name to Z13_FOLIO.
Reviewed-by: Luke D. Jones <luke@ljones.dev>
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc2f650f7e ]
netdev_WARN() uses WARN/WARN_ON to print a backtrace along with
file and line information. In this case, udp_tunnel_nic_register()
returning an error is just a failed operation, not a kernel bug.
udp_tunnel_nic_register() can fail due to a memory allocation
failure (kzalloc() or udp_tunnel_nic_alloc()).
This is a normal runtime error and not a kernel bug.
Replace netdev_WARN() with netdev_warn() accordingly.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250910195031.3784748-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18282100d7 ]
tcp_recvmsg_dmabuf can export the following errors:
- EFAULT when linear copy fails
- ETOOSMALL when cmsg put fails
- ENODEV if one of the frags is readable
- ENOMEM on xarray failures
But they are all ignored and replaced by EFAULT in the caller
(tcp_recvmsg_locked). Expose real error to the userspace to
add more transparency on what specifically fails.
In the non-devmem case (skb_copy_datagram_msg) doing `if (!copied)
copied = -EFAULT` is ok because skb_copy_datagram_msg can only return EFAULT.
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250910162429.4127997-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43adad382e ]
When 8139too is probing and 8139TOO_PIO=y it will call pci_iomap_range()
and from there __pci_ioport_map() for the PCI IO space.
If HAS_IOPORT_MAP=n and NO_GENERIC_PCI_IOPORT_MAP=n, like it is on my
m68k config, __pci_ioport_map() becomes NULL, pci_iomap_range() will
always fail and the driver will complain it couldn't map the PIO space
and return an error.
NO_IOPORT_MAP seems to cover the case where what 8139too is trying
to do cannot ever work, so make 8139TOO_PIO depend on it being false
and avoid creating an unusable driver.
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://patch.msgid.link/20250907064349.3427600-1-daniel@thingy.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9605505039 ]
The commit b2798ba0b8 ("KVM: X86: Choose qspinlock when dedicated
physical CPUs are available") states that when PV_DEDICATED=1
(vCPU has dedicated pCPU), qspinlock should be preferred regardless of
PV_UNHALT. However, the current implementation doesn't reflect this: when
PV_UNHALT=0, we still use virt_spin_lock() even with dedicated pCPUs.
This is suboptimal because:
1. Native qspinlocks should outperform virt_spin_lock() for dedicated
vCPUs irrespective of HALT exiting
2. virt_spin_lock() should only be preferred when vCPUs may be preempted
(non-dedicated case)
So reorder the PV spinlock checks to:
1. First handle dedicated pCPU case (disable virt_spin_lock_key)
2. Second check single CPU, and nopvspin configuration
3. Only then check PV_UNHALT support
This ensures we always use the native qspinlock for dedicated vCPUs,
delivering solid performance gains at high contention levels.
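The reordered checks roughly take this shape (a sketch, not the exact
kvm_spinlock_init() body):

  /* 1) dedicated pCPUs: keep the native qspinlock, PV_UNHALT is irrelevant */
  if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
          static_branch_disable(&virt_spin_lock_key);
          return;
  }

  /* 2) a single possible CPU or nopvspin: nothing to do either */
  if (num_possible_cpus() == 1 || nopvspin)
          return;

  /* 3) only now does missing PV_UNHALT support matter */
  if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
          return;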
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Wangyang Guo <wangyang.guo@intel.com>
Link: https://lore.kernel.org/r/20250722110005.4988-1-lirongqing@baidu.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db99b2f2b3 ]
The tcp reject code won't reply to a tcp reset.
But the icmp reject 'netdev' family versions will reply to icmp
dst-unreach errors, unlike icmp_send() and icmp6_send() which are used
by the inet family implementation (and internally by the REJECT target).
Check the icmp(6) type and do not respond if it is an unreachable error.
Without this, something like 'ip protocol icmp reject', when used
in a netdev chain attached to 'lo', causes a packet loop.
The same goes for two hosts that both use such a rule: each error packet
will be replied to.
Such a situation persists until the (bogus) rule is amended to ratelimit
or to check the icmp type before the reject statement.
As the inet versions don't do this, make the netdev ones follow along.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d0cb6d00b ]
To ensure the proper functioning of the jump_label test module, this patch
adds support for the R_OR1K_32_PCREL relocation type for any modules. The
implementation calculates the PC-relative offset by subtracting the
instruction location from the target value and stores the result at the
specified location.
Signed-off-by: chenmiao <chenmiao.ku@gmail.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c068ba9d3d ]
The test always returns success even if some tests were modified to
fail. Fix by converting the test to use the appropriate library
functions instead of using its own functions.
Before:
# ./traceroute.sh
TEST: IPV6 traceroute [FAIL]
TEST: IPV4 traceroute [ OK ]
Tests passed: 1
Tests failed: 1
$ echo $?
0
After:
# ./traceroute.sh
TEST: IPv6 traceroute [FAIL]
traceroute6 did not return 2000:102::2
TEST: IPv4 traceroute [ OK ]
$ echo $?
1
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-5-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47efbac9b7 ]
Use require_command() so that the test will return SKIP (4) when a
required command is not present.
Before:
# ./traceroute.sh
SKIP: Could not run IPV6 test without traceroute6
SKIP: Could not run IPV4 test without traceroute
$ echo $?
0
After:
# ./traceroute.sh
TEST: traceroute6 not installed [SKIP]
$ echo $?
4
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250908073238.119240-6-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ecba852dc9 ]
Change "ret" from u8 to int type in redrat3_enable_detector() to store
negative error codes or zero returned by redrat3_send_cmd() and
usb_submit_urb() - this better aligns with the coding standards and
maintains code consistency.
No effect on runtime.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60e9f776b7 ]
The upstream mesa copy of the GPU regs has shifted more things to reg64
instead of separate 32b HI/LO reg32's. This works better with the "new-
style" c++ builders that mesa has been migrating to for a6xx+ (to better
handle register shuffling between gens), but it leaves the C builders
with missing _HI/LO builders.
So handle the special case of reg64, automatically generating the
missing _HI/LO builders.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/673559/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5ca8d0c7a ]
Disable auto-hibern8 during power mode transitions to prevent unintended
entry into auto-hibern8. Restore the original auto-hibern8 timer value
after completing the power mode change to maintain system stability and
prevent potential issues during power state transitions.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77b96ef70b ]
Refine the system power management (PM) flow by skipping low power mode
(LPM) and MTCMOS settings if runtime PM is already applied. Prevent
redundant operations to ensure a more efficient PM process.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit deb105f498 ]
The 88e1510 PHY has an erratum where the PHY downshift counter is not
cleared after the PHY is suspended (BMCR_PDOWN set) and then later
resumed (BMCR_PDOWN cleared). This can cause the gigabit link to
intermittently downshift to a lower speed.
Disabling and re-enabling the downshift feature clears the counter,
allowing the PHY to retry gigabit link negotiation up to the programmed
retry count times before downshifting. This behavior has been observed
on copper links.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250906-marvell_fix-v2-1-f6efb286937f@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit faac32d4ec ]
Improve the recovery process for hibernation exit failures. Trigger the
error handler and break the suspend operation to ensure effective
recovery from hibernation errors. Activate the error handling mechanism
by calling ufshcd_force_error_recovery() and scheduling the error handler
work.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91cad911ed ]
Resolve the issue of unbalanced IRQ enablement by setting the
'is_mcq_intr_enabled' flag after the first successful IRQ enablement.
Ensure proper tracking of the IRQ state and prevent potential mismatches
in IRQ handling.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 15ef3f5aa8 ]
Improve the recovery process for failed resume operations. Log the
device's power status and return 0 if both resume and recovery fail to
prevent I/O hang.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bfbd5aa534 ]
The OmniVision OG01A1B image sensor is a monochrome sensor; it supports
8-bit and 10-bit RAW output formats only.
Therefore the planar greyscale Y8/Y10 media formats are more appropriate
for the sensor than the originally and arbitrarily selected SGRBG one,
since there are no red, green or blue color components.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d5f6bd3ee3 ]
Currently, the test allocates BAR sizes according to the fixed table bar_size.
This does not work with controllers which have fixed-size BARs that are
smaller than the requested BAR size. One such controller is the Renesas R-Car
V4H PCIe controller, whose BAR4 size is limited to 256 bytes, which is
much smaller than the BAR size of 131072 bytes currently requested by this
test. A lot of controller drivers in-tree have fixed-size BARs, and they
do work perfectly fine, but only because their fixed size is larger
than the size requested by pci-epf-test.c.
Adjust the test such that in case a fixed size BAR is detected, the fixed
BAR size is used, as that is the only possible option.
This helps with test failures reported as follows:
pci_epf_test pci_epf_test.0: requested BAR size is larger than fixed size
pci_epf_test pci_epf_test.0: Failed to allocate space for BAR4
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
[mani: reworded description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250905184240.144431-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e28022873c ]
Currently, misc_deregister() uses list_del() to remove the device
from the list. After list_del(), the list pointers are set to
LIST_POISON1 and LIST_POISON2, which may help catch use-after-free bugs,
but does not reset the list head.
If misc_deregister() is called more than once on the same device,
list_empty() will not return true, and list_del() may be called again,
leading to undefined behavior.
Replace list_del() with list_del_init() to reinitialize the list head
after deletion. This makes the code more robust against double
deregistration and allows safe usage of list_empty() on the miscdevice
after deregistration.
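Simplified, the deregistration path then behaves like this (a sketch of the
pattern, not the verbatim function):

  void misc_deregister(struct miscdevice *misc)
  {
          /* with list_del_init() below, a repeated call is caught here
           * instead of following poisoned list pointers */
          if (WARN_ON(list_empty(&misc->list)))
                  return;

          mutex_lock(&misc_mtx);
          list_del_init(&misc->list);   /* was list_del() */
          /* device node and minor cleanup elided */
          mutex_unlock(&misc_mtx);
  }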
[ Note, this seems to keep broken out-of-tree drivers from doing foolish
things. While this does not matter for any in-kernel drivers,
external drivers could use a bit of help to show them they shouldn't
be doing stuff like re-registering misc devices - gregkh ]
Signed-off-by: Xion Wang <xion.wang@mediatek.com>
Link: https://lore.kernel.org/r/20250904063714.28925-2-xion.wang@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87c5ff5615 ]
In the __cdnsp_gadget_init() and cdnsp_gadget_exit() functions, the gadget
structure (pdev->gadget) was freed before its endpoints.
The endpoints are linked via the ep_list in the gadget structure.
Freeing the gadget first leaves dangling pointers in the endpoint list.
When the endpoints are subsequently freed, this results in a use-after-free.
Fix:
By separating the usb_del_gadget_udc() operation into distinct "del" and
"put" steps, cdnsp_gadget_free_endpoints() can be executed prior to the
final release of the gadget structure with usb_put_gadget().
This is similar to commit bb9c74a5bd14 ("usb: dwc3: gadget: Free gadget
structure only after freeing endpoints").
Signed-off-by: Chen Yufeng <chenyufeng@iie.ac.cn>
Link: https://lore.kernel.org/r/20250905094842.1232-1-chenyufeng@iie.ac.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f616757dd ]
The "usxgmii" phy-mode that the Felix switch ports support on LS1028A is
not quite USXGMII; it is defined by the USXGMII multiport specification
document as 10G-QXGMII. It uses the same signaling as USXGMII, but it
multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G
per port.
This change is needed in preparation for the lynx-10g SerDes driver on
LS1028A, which will make a more clear distinction between usxgmii
(supported on lane 0) and 10g-qxgmii (supported on lane 1). These
protocols have their configuration in different PCCR registers (PCCRB vs
PCCR9).
Continue parsing and supporting single-port-per-lane USXGMII when found
in the device tree as usual (because it works), but add support for
10G-QXGMII too. Using phy-mode = "10g-qxgmii" will be required when
modifying the device trees to specify a "phys" phandle to the SerDes
lane. The result when the "phys" phandle is present but the phy-mode is
wrong is undefined.
The only PHY driver in known use with this phy-mode, AQR412C, will gain
logic to transition from "usxgmii" to "10g-qxgmii" in a future change.
Prepare the driver by also setting PHY_INTERFACE_MODE_10G_QXGMII in
supported_interfaces when PHY_INTERFACE_MODE_USXGMII is there, to
prevent breakage with existing device trees.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250903130730.2836022-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12cdfb61b3 ]
[Why]
dm_mst_get_pbn_divider() returns an integer value produced by a cast
from fixed point, and that casted integer is then used in dfixed_const
to be multiplied by 4096. The cast from fixed point to integer makes
the calculation error bigger once multiplied by 4096.
That makes the calculated pbn_div value smaller than it should be,
which leads to a bigger req_slot number.
The error shows up with an 8k30 timing, where the correctly vs.
incorrectly calculated req_slot is 62.9 vs. 63.1.
That wrong calculation makes 8k30 fail to light up after a dock
under HBR3 x 4.
[How]
Restore the accuracy by keeping the fraction part
calculated for the left shift operation.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa819e3a7c ]
Some SOCs which are part of the cyan skillfish family
rely on an explicit firmware for IP discovery. Add support
for the gpu_info firmware.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9be15fbfc6 ]
After a panic if SNP is enabled in the previous kernel then the kdump
kernel boots with IOMMU SNP enforcement still enabled.
IOMMU command buffers and event buffer registers remain locked and
exclusive to the previous kernel. Attempts to enable command and event
buffers in the kdump kernel will fail, as hardware ignores writes to
the locked MMIO registers as per AMD IOMMU spec Section 2.12.2.1.
Skip enabling command buffers and event buffers for kdump boot as they
are already enabled in the previous kernel.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Tested-by: Sairaj Kodilkar <sarunkod@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/576445eb4f168b467b0fc789079b650ca7c5b037.1756157913.git.ashish.kalra@amd.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2506af5f81 ]
The GuC communication protocol allows GuC to send NO_RESPONSE_RETRY
reply message to indicate that due to some interim condition it can
not handle incoming H2G request and the host shall resend it.
But in some cases, due to errors, this unsatisfied condition might
be final and this could lead to endless retries as it was recently
seen on the CI:
[drm] GT0: PF: VF1 FLR didn't finish in 5000 ms (-ETIMEDOUT)
[drm] GT0: PF: VF1 resource sanitizing failed (-ETIMEDOUT)
[drm] GT0: PF: VF1 FLR failed!
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
[drm:guc_ct_send_recv [xe]] GT0: H2G action 0x5503 retrying: reason 0x0
To avoid such dangerous loops allow only limited number of retries
(for now 50) and add some delays (n * 5ms) to slow down the rate of
resending this repeated request.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://lore.kernel.org/r/20250903223330.6408-1-michal.wajdeczko@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c221cbf8dc ]
When the 3.3Vaux supply is present, fetch it at the probe time and keep it
enabled for the entire PCIe controller lifecycle so that the link can enter
L2 state and the devices can signal wakeup using either Beacon or WAKE#
mechanisms.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: reworded the subject, description and error message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250820022328.2143374-1-hongxing.zhu@nxp.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e53f8b12a2 ]
Currently, when adding the 6 GHz Band Capabilities element, the channel
list of the wiphy is checked to determine if 6 GHz is supported for a given
virtual interface. However, in a multi-radio wiphy (e.g., one that has
both lower bands and 6 GHz combined), the wiphy advertises support for
all bands. As a result, the 6 GHz Band Capabilities element is incorrectly
included in mesh beacon and station's association request frames of
interfaces operating in lower bands, without verifying whether the
interface is actually operating in a 6 GHz channel.
Fix this by verifying if the interface operates on 6 GHz channel
before adding the element. Note that this check cannot be placed
directly in ieee80211_put_he_6ghz_cap() as the same function is used to
add probe request elements while initiating scan in which case the
interface may not be operating in any band's channel.
Signed-off-by: Ramya Gnanasekar <ramya.gnanasekar@oss.qualcomm.com>
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20250606104436.326654-1-rameshkumar.sundaram@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 91c5d7c849 ]
The .querystd callback should not program the device with the detected
standard, it should only report the standard to user-space. User-space
may then use .s_std to set the standard, if it wants to use it.
All that is required of .querystd is to set up the auto detection of
standards and report its findings.
While at it add some documentation on why this can't happen while
streaming and improve the error handling using a scoped guard.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46c1e7814d ]
The .set_fmt callback should not write the new format directly to the
device, it should only store it and have it applied by .s_stream.
The .s_stream callback already calls adv7180_set_field_mode() so it's
safe to remove programming of the device and just store the format and
have .s_stream apply it.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 878c496ac5 ]
The adv7180_set_power() utilizes adv7180_write() which in turn requires
the state mutex to be held; take it before calling adv7180_set_power()
to avoid tripping a lockdep_assert_held().
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21f82062d0 ]
An exchange with a NFC target must complete within NCI_DATA_TIMEOUT.
A delay of 700 ms is not sufficient for cryptographic operations on smart
cards. CardOS 6.0 may need up to 1.3 seconds to perform 256-bit ECDH
or 3072-bit RSA. To prevent brute-force attacks, passports and similar
documents introduce even longer delays into access control protocols
(BAC/PACE).
The timeout should be higher, but not too much. The expiration allows
us to detect that a NFC target has disappeared.
Signed-off-by: Juraj Šarinay <juraj@sarinay.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250902113630.62393-1-juraj@sarinay.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85705b18ae ]
The kfd CRIU checkpoint ioctl would return an error if trying
to checkpoint a process with no kfd buffer objects.
This is a normal case and should not be an error.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ddcb0cb9d ]
The driver unconditionally saves the current state on first init in
dsi_pll_7nm_init(), but does not save the VCO rate, only some of the
divider registers. The state is then restored during probe/enable via
msm_dsi_phy_enable() -> msm_dsi_phy_pll_restore_state() ->
dsi_7nm_pll_restore_state().
Restoring calls dsi_pll_7nm_vco_set_rate() with
pll_7nm->vco_current_rate=0, which basically overwrites the existing VCO
rate and messes with the clock hierarchy by propagating a frequency of 0
into the clock tree. This makes little sense anyway - the VCO rate was
not saved, so it should not be restored.
If the PLL was not configured, configure it to the minimum rate to avoid
glitches and to avoid configuring the entire clock hierarchy to 0 Hz.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/657827/
Link: https://lore.kernel.org/r/20250610-b4-sm8750-display-v6-9-ee633e3ddbff@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e742de97c ]
DMA Engine has support for the callback_result which provides
the status of the request and the residue. This helps in
determining the correct status of the request and in
efficient resource management of the request.
The 'callback_result' method is preferred over the deprecated
'callback' method.
Signed-off-by: Devendra K Verma <devverma@amd.com>
Link: https://lore.kernel.org/r/20250821121505.318179-1-devverma@amd.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a8c02a6bf ]
Networking drivers implementing PTP clocks and kernel socket code
handling hardware timestamps use the 64-bit signed ktime_t type counting
nanoseconds. When a PTP clock reaches the maximum value in year 2262,
the timestamps returned to applications will overflow into year 1667.
The same thing happens when injecting a large offset with
clock_adjtime(ADJ_SETOFFSET).
The commit 7a8e61f847 ("timekeeping: Force upper bound for setting
CLOCK_REALTIME") limited the maximum accepted value setting the system
clock to 30 years before the maximum representable value (i.e. year
2232) to avoid the overflow, assuming the system will not run for more
than 30 years.
Enforce the same limit for PTP clocks. Don't allow negative values and
values closer than 30 years to the maximum value. Drivers may implement
an even lower limit if the hardware registers cannot represent the whole
interval between years 1970 and 2262 in the required resolution.
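Illustratively, the accepted range ends about 30 years short of the last
representable ktime_t second (the constant name and the exact placement of
the check in the PTP core are assumptions here):

  /* ktime_t runs out in 2262; back off ~30 years, mirroring CLOCK_REALTIME */
  #define PTP_SEC_MAX  (KTIME_SEC_MAX - 30LL * 365 * 24 * 3600)

  if (ts->tv_sec < 0 || ts->tv_sec > PTP_SEC_MAX)
          return -EINVAL;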
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <jstultz@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250828103300.1387025-1-mlichvar@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 34c21e9119 ]
Introduce support for standard MII ioctl operations in the LAN865x
Ethernet driver by implementing the .ndo_eth_ioctl callback. This allows
PHY-related ioctl commands to be handled via phy_do_ioctl_running() and
enables support for ethtool and other user-space tools that rely on ioctl
interface to perform PHY register access using commands like SIOCGMIIREG
and SIOCSMIIREG.
This feature enables improved diagnostics and PHY configuration
capabilities from userspace.
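In sketch form the wiring looks like this (the ops entries other than
.ndo_eth_ioctl are placeholders):

  static const struct net_device_ops lan865x_netdev_ops = {
          .ndo_open      = lan865x_net_open,     /* placeholder */
          .ndo_stop      = lan865x_net_close,    /* placeholder */
          /* routes SIOCGMIIREG/SIOCSMIIREG to the attached PHY while running */
          .ndo_eth_ioctl = phy_do_ioctl_running,
  };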
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250828114549.46116-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0d04fe6a6 ]
Binding requires a node name matching '^gpio@[0-9a-f]+$'. This patch
changes the clock name from "stp" to "gpio".
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 585b2f685c ]
Update the legacy (non-DC) display code to respect the maximum
pixel clock for HDMI and DVI-D. Reject modes that would require
a higher pixel clock than can be supported.
Also update the maximum supported HDMI clock value depending on
the ASIC type.
For reference, see the DC code:
check max_hdmi_pixel_clock in dce*_resource.c
v2:
Fix maximum clocks for DVI-D and DVI/HDMI adapters.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 306cbcc6f6 ]
[Why & How]
Previously, when calculating dto phase, we would incorrectly fail when phase
<=0 without additionally checking for the integer value. This meant that
calculations would incorrectly fail when the desired pixel clock was an exact
multiple of the reference clock.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 002a612023 ]
[Why]
-Pipe splitting allows for clocks to be reduced, but when using TMDS 420,
reduced clocks lead to missed clock cycles on clock resyncing.
[How]
-Impose a minimum clock when using TMDS 420
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0750649b52 ]
Compare the whole v4l2_bt_timings struct, not just the width/height when
setting new timings. Timings with the same resolution and different
pixelclock can now be properly set.
Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c158b5a570 ]
Commit 0d6ccfe6b3 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if the NIC has fewer than 3 queues enabled,
but it just constructs the exception object, it doesn't actually raise
it.
Before:
# Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
not ok 1 rss_ctx.test_rss_key_indir
After:
ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827173558.3259072-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d47b4f084 ]
On a partitioned system configured with only one package and one compute
die, a warning will be generated for a duplicate sysfs entry. This
typically occurs during the platform bring-up phase.
Partitioned systems expose dies, equivalent to TPMI compute domains,
through the CPUID. Each partitioned system must contain at least one
compute die per partition, resulting in a minimum of two dies per
package. Hence the function topology_max_dies_per_package() normally
returns at least two, and the condition "topology_max_dies_per_package()
> 1" prevents the creation of a root domain.
In this case, however, topology_max_dies_per_package() will return 1, a
root domain will be created for partition 0, and a duplicate sysfs
warning will be generated for partition 1, as both partitions have the
same package ID.
To address this, also check for a non-zero partition in addition to
topology_max_dies_per_package() > 1.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250819211034.3776284-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3820e9d35 ]
When KFD asks CP to preempt queues, other than preempt CP queues, CP
also requests SDMA to preempt SDMA queues with UNMAP_LATENCY timeout.
Currently queue_preemption_timeout_ms is 9000 ms by default but can be
configured via module parameter. KFD_UNMAP_LATENCY_MS is hard coded as
4000 ms though. This patch ties KFD_UNMAP_LATENCY_MS to
queue_preemption_timeout_ms so in a slow system such as emulator, both
CP and SDMA slowness are taken into account.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a359f0f13 ]
[Why]
For the HW cursor, its current position in the pipe_ctx->stream struct is
not affected by the 180 rotation, i.e. the top left corner is still at
0,0. However, the DPP & HUBP set_cursor_position functions require the
rotated position.
The current approach is hard-coded for ODM 2:1, thus it's failing for
ODM 4:1, resulting in a double cursor.
[How]
Instead of calculating the new cursor position relative to the
viewports, we calculate it using a viewable clip_rect of each plane.
The clip_rects are first offset and scaled to the same space as the
src_rect, i.e. Stream space -> Plane space.
In case of a pipe split, which divides the plane into 2 or more viewports,
the clip_rect is the union of all the viewports of the given plane.
With the assumption that the viewports in HUBP's set_cursor_position are
in the Plane space as well, it should produce a correct cursor position
for any number of pipe splits.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93aa919ca0 ]
When only VRAM is allocated without a VA (which is then 0), and an
allocated SVM range covers this range, the VRAM allocation returns
failure. The SVM usage check should be skipped for this case.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0f849c1cc ]
fixed_phy_register() creates and registers the phy_device. To be
symmetric, we should not only unregister, but also free the phy_device
in fixed_phy_unregister(). This allows simplifying code in users.
Note wrt of_phy_deregister_fixed_link():
put_device(&phydev->mdio.dev) and phy_device_free(phydev) are identical.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/ad8dda9a-10ed-4060-916b-3f13bdbb899d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b8c5fa0cb ]
Currently, ip_extract_route_hint uses RTN_BROADCAST to decide
whether to use the route dst hint mechanism.
This check is too strict, as it prevents directed broadcast
routes from using the hint, resulting in poor performance
during bursts of directed broadcast traffic.
Fix this in ip_extract_route_hint and modify ip_route_use_hint
to preserve the intended behaviour.
Signed-off-by: Oscar Maes <oscmaes92@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250819174642.5148-2-oscmaes92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e29c30d8d ]
Module aliases are used by userspace to identify the correct module to
load for a detected hardware. The currently supported RPMSG device IDs for
this module include "rpmsg-raw", but the module alias is "rpmsg_chrdev".
Use the helper macro MODULE_DEVICE_TABLE(rpmsg) to export the correct
supported IDs. And while here, to keep backwards compatibility we also add
the other ID "rpmsg_chrdev" so that it is also still exported as an alias.
This has the side benefit of adding support for some legacy firmware
which still uses the original "rpmsg_chrdev" ID. This was the ID used for
this driver before it was upstreamed (as reflected by the module alias).
Signed-off-by: Andrew Davis <afd@ti.com>
Acked-by: Hari Nagalla <hnagalla@ti.com>
Tested-by: Hari Nagalla <hnagalla@ti.com>
Link: https://lore.kernel.org/r/20250619205722.133827-1-afd@ti.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f5a2826cd5 ]
The IPU6 ISYS driver supported metadata formats but was missing correct
embedded data type in the receiver configuration. Add it now.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2aec0b6a6b ]
Just add fixed struct size validations for UAC2 and UAC3 effect
units. The descriptor has a variable-length array, so it should be
validated with a proper function later, once the unit is really
parsed and used by the driver (currently it is only partially
referenced for the input terminal parsing).
Link: https://patch.msgid.link/20250821151751.12100-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6238784e50 ]
The error handling path in pci_p2pdma_add_resource() contains a bug in its
`pgmap_free` label.
Memory is allocated for the `p2p_pgmap` struct, and the pointer is stored
in `p2p_pgmap`. However, the error path calls devm_kfree() with `pgmap`,
which is a pointer to a member field within the `p2p_pgmap` struct, not the
base pointer of the allocation.
Correct the bug by passing the correct base pointer, `p2p_pgmap`, to
devm_kfree().
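In other words (a simplified sketch of the allocation and of the error
label, not the full function):

        struct pci_p2pdma_pagemap *p2p_pgmap;
        struct dev_pagemap *pgmap;

        p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL);
        pgmap = &p2p_pgmap->pgmap;              /* points inside the allocation */
        ...
pgmap_free:
        devm_kfree(&pdev->dev, p2p_pgmap);      /* base pointer, not pgmap */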
Signed-off-by: Sungho Kim <sungho.kim@furiosa.ai>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Link: https://patch.msgid.link/20250820105714.2939896-1-sungho.kim@furiosa.ai
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8fc6056dcf ]
As reported, the on-disk footer.ino and footer.nid are the same and
out of range; let's add a sanity check in f2fs_alloc_nid() to detect
any potential corruption of the free_nid_list.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d85c565a7 ]
Initially, trace_sock_exceed_buf_limit() was invoked when
__sk_mem_raise_allocated() failed due to the memcg limit or the
global limit.
However, commit d6f19938eb ("net: expose sk wmem in
sock_exceed_buf_limit tracepoint") somehow suppressed the event
only when memcg failed to charge for SK_MEM_RECV, although the
memcg failure for SK_MEM_SEND still triggers the event.
Let's restore the event for SK_MEM_RECV.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://patch.msgid.link/20250815201712.1745332-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee0aace5f8 ]
The stmmac_rx function would previously set skb->ip_summed to
CHECKSUM_UNNECESSARY if hardware checksum offload (CoE) was enabled
and the packet was of a known IP ethertype.
However, this logic failed to check if the hardware had actually
reported a checksum error. The hardware status, indicating a header or
payload checksum failure, was being ignored at this stage. This could
cause corrupt packets to be passed up the network stack as valid.
This patch corrects the logic by checking the `csum_none` status flag,
which is set when the hardware reports a checksum error. If this flag
is set, skb->ip_summed is now correctly set to CHECKSUM_NONE,
ensuring the kernel's network stack will perform its own validation and
properly handle the corrupt packet.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250818090217.2789521-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0ac6d3b56 ]
When removing a nexthop, commit
90f33bffa3 ("nexthops: don't modify published nexthop groups") added a
call to synchronize_rcu() (later changed to _net()) to make sure
everyone sees the new nexthop-group before the rtnl-lock is released.
When one wants to delete a large number of groups and nexthops, it is
fastest to first flush the groups (ip nexthop flush groups) and then
flush the nexthops themselves (ip -6 nexthop flush). As that way the
groups don't need to be rebalanced.
However, `ip -6 nexthop flush` will still take a long time if there is
a very large number of nexthops because of the call to
synchronize_net(). Now, if there are no more groups, there is no point
in calling synchronize_net(). So, let's skip that entirely by checking
if nh->grp_list is empty.
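Conceptually the change boils down to the following (sketch; the
surrounding removal logic is unchanged):

        /* nothing ever grouped this nexthop, so no reader can still hold
         * a stale group reference; skip the expensive grace period
         */
        if (!list_empty(&nh->grp_list))
                synchronize_net();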
This gives us a nice speedup:
BEFORE:
=======
$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 2097152 nexthops
real 1m45.345s
user 0m0.001s
sys 0m0.005s
$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 4194304 nexthops
real 3m10.430s
user 0m0.002s
sys 0m0.004s
AFTER:
======
$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 2097152 nexthops
real 0m17.545s
user 0m0.003s
sys 0m0.003s
$ time sudo ip -6 nexthop flush
Dump was interrupted and may be inconsistent.
Flushed 4194304 nexthops
real 0m35.823s
user 0m0.002s
sys 0m0.004s
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250816-nexthop_dump-v2-2-491da3462118@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 52e2bb5ff0 ]
A miscdevice that wants a dynamic minor may fail to be registered again
after being de-registered unless it is reinitialized, as illustrated by
the newly added kunit test case miscdev_test_dynamic_reentry().
There is a real case, found by cascardo, where part of the minor range
was contained in the range [0, 255):
1) wmi/dell-smbios registered minor 122, and acpi_thermal_rel registered
minor 123
2) unbind "int3400 thermal" driver from its device, this will de-register
acpi_thermal_rel
3) rmmod then insmod dell_smbios again, now wmi/dell-smbios is using minor
123
4) bind the device to "int3400 thermal" driver again, acpi_thermal_rel
fails to register.
Some drivers may reuse the miscdevice structure after it is deregistered.
If the intention is to allocate a dynamic minor but the minor number is
not reset to MISC_DYNAMIC_MINOR before calling misc_register(), it will
try to register a previously dynamically allocated minor number, which
may have been registered by a different driver in the meantime.
One such case is the acpi_thermal_rel misc device, registered by the
int3400 thermal driver. If the device is unbound from the driver and later
bound, if there was another dynamic misc device registered in between, it
would fail to register the acpi_thermal_rel misc device. Other drivers
behave similarly.
Actually, this kind of issue is prone to happen if APIs
misc_register()/misc_deregister() are invoked by driver's
probe()/remove() separately.
Instead of fixing all the drivers, just reset the minor member to
MISC_DYNAMIC_MINOR in misc_deregister() in case it was a dynamically
allocated minor number, as error handling of misc_register() does.
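Roughly (a sketch; minor_is_dynamic() is a placeholder for the real
range check, not an existing helper):

void misc_deregister(struct miscdevice *misc)
{
        /* ... existing unregistration ... */
        if (minor_is_dynamic(misc->minor))
                misc->minor = MISC_DYNAMIC_MINOR;       /* ready for re-registration */
}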
Cc: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Zijun Hu <zijun.hu@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250714-rfc_miscdev-v6-5-2ed949665bde@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 499cbe0f2f ]
Mark dm error as DM_TARGET_PASSES_INTEGRITY so that it can be stacked on
top of PI capable devices. The claim is, strictly speaking, a lie, as dm
error fails all I/O and doesn't pass anything on, but doing the same for
integrity I/O works just fine :)
This helps to make about two dozen xfstests test cases pass on PI capable
devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 956606bafb ]
This fix is already present in f_ecm.c and was never
propagated to f_ncm.c
When creating multiple NCM ethernet devices on a composite USB gadget
device, each MAC address on the HOST side will be identical. Having the
same MAC on different network interfaces is bad.
This fix updates the MAC address inside the ncm_string_defs global
during the ncm_bind call, ensuring each device has a unique MAC.
In f_ecm.c ecm_string_defs is updated in the same way.
The defunct MAC assignment in ncm_alloc has been removed.
Signed-off-by: raub camaioni <raubcameo@gmail.com>
Link: https://lore.kernel.org/r/20250815131358.1047525-1-raubcameo@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12c9b09e98 ]
ADC calibration might fail because of noise on the reference voltage.
To avoid calibration failures, the following requirement must be met:
ADC reference voltage noise < 1.8V * 1/2^ENOB
For boards where the ADC reference voltage does not meet this
requirement, still load the calibrated values, so the ADC can work,
although perhaps not as accurately.
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Primoz Fiser <primoz.fiser@norik.com>
Link: https://patch.msgid.link/20250812-adc-v2-2-0260833f13b8@nxp.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ed704d058 ]
HMM assumes that pages have READ permissions by default. Inside
svm_range_validate_and_map, we add READ permissions then add WRITE
permissions if the VMA isn't read-only. This will conflict with regions
that only have PROT_WRITE or have PROT_NONE. When that happens,
svm_range_restore_work will continue to retry, silently, giving the
impression of a hang if pr_debug isn't enabled to show the retries.
If pages don't have READ permissions, simply unmap them and continue. If
they weren't mapped in the first place, this would be a no-op. Since x86
doesn't support write-only, and PROT_NONE doesn't allow reads or writes
anyway, this will allow the svm range validation to continue without
getting stuck in a loop forever on mappings we can't use with HMM.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 859958a7fa ]
If an amdgpu_bo_va is fpriv->prt_va, its bo is always NULL. Such an
amdgpu_bo_va should therefore be updated separately, before
amdgpu_vm_handle_moved.
Signed-off-by: Heng Zhou <Heng.Zhou@amd.com>
Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb640b2ca5 ]
Detecting the monitor for DisplayPort targets is more complicated than
just reading the HPD pin level: it requires reading the DPCD in order to
check what kind of device is attached to the port and whether there is
an actual display attached.
In order to let DRM framework handle such configurations, disable
DRM_BRIDGE_OP_DETECT for dp-connector devices, letting the actual DP
driver perform detection. This still keeps DRM_BRIDGE_OP_HPD enabled, so
it is valid for the bridge to report HPD events.
Currently inside the kernel there are only two targets which list
hpd-gpios for dp-connector devices: arm64/qcom/qcs6490-rb3gen2 and
arm64/qcom/sa8295p-adp. Both should be fine with this change.
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Konrad Dybcio <konradybcio@kernel.org>
Cc: linux-arm-msm@vger.kernel.org
Acked-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20250802-dp-conn-no-detect-v1-1-2748c2b946da@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2dc9f0b36 ]
Fixes force feedback for devices built with MMOS firmware and many more
not yet detected devices.
Update quirks mask debug message to always contain all 32 bits of data.
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f345a4798d ]
The already fixed bug in SDL only affected conditional effects. This
should fix FFB in Forza Horizon 4/5 on Moza devices, as Forza Horizon
flips the constant force direction instead of using negative magnitude
values.
Changing the direction in the effect directly in pidff_upload_effect()
would affect its value in further operations like comparing to the old
effect and/or just reading the effect values in the user application.
This, in turn, would lead to constant PID_SET_EFFECT spam as the effect
direction would constantly not match the value that's set by the
application.
This way, it's still transparent to any software/API.
Only affects conditional effects now so it's better for it to explicitly
state that in the name. If any HW ever needs fixed direction for other
effects, we'll add more quirks.
Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com>
Reviewed-by: Oleg Makarenko <oleg@makarenk.ooo>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eecd203ada ]
syzbot is reporting that imon has three problems which result in
hung tasks due to forever holding device lock [1].
The first problem is that once usb_rx_callback_intf0() gets an -EPROTO
error after ictx->dev_present_intf0 became true, it resubmits the urb
after printk(), and the resubmitted urb causes usb_rx_callback_intf0()
to get -EPROTO again. This results in printk() flooding (RCU stalls).
Alan Stern commented [2] that
In theory it's okay to resubmit _if_ the driver has a robust
error-recovery scheme (such as giving up after some fixed limit on the
number of errors or after some fixed time has elapsed, perhaps with a
time delay to prevent a flood of errors). Most drivers don't bother to
do this; they simply give up right away. This makes them more
vulnerable to short-term noise interference during USB transfers, but in
reality such interference is quite rare. There's nothing really wrong
with giving up right away.
but imon has a poor error-recovery scheme which just retries forever;
this behavior should be fixed.
Since I'm not sure whether it is safe for imon users to give up upon any
error code, this patch takes care of only union of error codes chosen from
modules in drivers/media/rc/ directory which handle -EPROTO error (i.e.
ir_toy, mceusb and igorplugusb).
The second problem is that when usb_rx_callback_intf0() gets an -EPROTO
error before ictx->dev_present_intf0 becomes true, it always resubmits
the urb due to commit 8791d63af0 ("[media] imon: don't wedge hardware
after early callbacks"). Move the ictx->dev_present_intf0 test
introduced by commit 6f6b90c923 ("[media] imon: don't parse scancodes
until intf configured") to immediately before imon_incoming_packet();
otherwise the first problem explained above happens without printk()
flooding (i.e. hung task).
The third problem is that when usb_rx_callback_intf0() is not called for some
reason (e.g. flaky hardware; the reproducer for this problem sometimes
prevents usb_rx_callback_intf0() from being called),
wait_for_completion_interruptible() in send_packet() never returns (i.e.
hung task). As a workaround for such situation, change send_packet() to
wait for completion with timeout of 10 seconds.
Link: https://syzkaller.appspot.com/bug?extid=592e2ab8775dbe0bf09a [1]
Link: https://lkml.kernel.org/r/d6da6709-d799-4be3-a695-850bddd6eb24@rowland.harvard.edu [2]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2327a3d6f6 ]
Fix field-spanning memcpy warnings in ah6_output() and
ah6_output_done() where extension headers are copied to/from IPv6
address fields, triggering fortify-string warnings about writes beyond
the 16-byte address fields.
memcpy: detected field-spanning write (size 40) of single field "&top_iph->saddr" at net/ipv6/ah6.c:439 (size 16)
WARNING: CPU: 0 PID: 8838 at net/ipv6/ah6.c:439 ah6_output+0xe7e/0x14e0 net/ipv6/ah6.c:439
The warnings are false positives as the extension headers are
intentionally placed after the IPv6 header in memory. Fix by properly
copying addresses and extension headers separately, and introduce
helper functions to avoid code duplication.
Reported-by: syzbot+01b0667934cdceb4451c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=01b0667934cdceb4451c
Signed-off-by: Charalampos Mitrodimas <charmitro@posteo.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa86602a48 ]
Move the configuration of the Auto-Hibern8 (AHIT) timer from the
post-link stage to the 'fixup_dev_quirks' function. This change allows
setting the AHIT based on the vendor requirements:
(a) Samsung: 3.5 ms
(b) Micron: 2 ms
(c) Others: 1 ms
Additionally, the clock gating timer is adjusted based on the AHIT
scale, with a maximum setting of 10 ms. This ensures that the clock
gating delay is appropriately configured to match the AHIT settings.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Link: https://lore.kernel.org/r/20250811131423.3444014-3-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d05b24429 ]
If a backup port is configured for a bridge port, the bridge will
redirect known unicast traffic towards the backup port when the primary
port is administratively up but without a carrier. This is useful, for
example, in MLAG configurations where a system is connected to two
switches and there is a peer link between both switches. The peer link
serves as the backup port in case one of the switches loses its
connection to the multi-homed system.
In order to avoid flooding when the primary port loses its carrier, the
bridge does not flush dynamic FDB entries pointing to the port upon STP
disablement, if the port has a backup port.
The above means that known unicast traffic destined to the primary port
will be blackholed when the port is put administratively down, until the
FDB entries pointing to it are aged-out.
Given that the current behavior is quite weird and unlikely to be
depended on by anyone, amend the bridge to redirect to the backup port
also when the primary port is administratively down and not only when it
does not have a carrier.
The change is motivated by a report from a user who expected traffic to
be redirected to the backup port when the primary port was put
administratively down while debugging a network issue.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d03847175 ]
The thunderbolt driver sets up device link dependencies from hotplug ports
to the Host Router (aka Native Host Interface, NHI). When resuming from
system sleep, this allows the Host Router to re-establish tunnels to
attached Thunderbolt devices before the hotplug ports resume.
To identify the hotplug ports, the driver utilizes the is_hotplug_bridge
flag which also encompasses ACPI slots handled by the ACPI hotplug driver.
Thunderbolt hotplug ports are always Hot-Plug Capable PCIe ports, so it is
more apt to identify them with the is_pciehp flag.
Similarly, hotplug ports on older Thunderbolt controllers have broken MSI
support and are quirked to use legacy INTx interrupts instead. The quirk
identifies them with is_hotplug_bridge, even though all affected ports are
also matched by is_pciehp. So use is_pciehp here as well.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45bc82563d ]
After a Fatal Error has been reported by a device and has been recovered
through a Secondary Bus Reset, AER updates the device's error_state to
pci_channel_io_normal before invoking its driver's ->resume() callback.
By contrast, EEH updates the error_state earlier, namely after resetting
the device and before invoking its driver's ->slot_reset() callback.
Commit c58dc575f3 ("powerpc/pseries: Set error_state to
pci_channel_io_normal in eeh_report_reset()") explains in great detail
that the earlier invocation is necessitated by various drivers checking
accessibility of the device with pci_channel_offline() and avoiding
accesses if it returns true. It returns true for any other error_state
than pci_channel_io_normal.
The device should be accessible already after reset, hence the reasoning
is that it's safe to update the error_state immediately afterwards.
This deviation between AER and EEH seems problematic because drivers
behave differently depending on which error recovery mechanism the
platform uses. Three drivers have gone so far as to update the
error_state themselves, presumably to work around AER's behavior.
For consistency, amend AER to update the error_state at the same recovery
steps as EEH. Drop the now unnecessary workaround from the three drivers.
Keep updating the error_state before ->resume() in case ->error_detected()
or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
->slot_reset() to be skipped. There are drivers doing this even for Fatal
Errors, e.g. mhi_pci_error_detected().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/4517af6359ffb9d66152b827a5d2833459144e3f.1755008151.git.lukas@wunner.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d240b124c ]
Both ACPI and DT-based systems are required to obtain the external
camera sensor clock using the new devm_v4l2_sensor_clk_get() helper
function.
Ensure a dependency on HAVE_CLK when config VIDEO_CAMERA_SENSOR is
enabled.
Signed-off-by: Mehdi Djait <mehdi.djait@linux.intel.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eea4f89b64 ]
The driver tries to calculate the value for REG_WAKEUP_TIME. However,
the calculation itself is not correct, and on top of that, the resulting
value is almost always larger than the field's size, so the actual
result is more or less random.
According to the docs, figuring out the value for REG_WAKEUP_TIME
requires HW characterization and there's no way to have a generic
algorithm to come up with the value. That doesn't help at all...
However, we know that the value must be smaller than the line time, and,
at least in my understanding, the proper value for it is quite small.
Testing shows that setting it to 1/10 of the line time seems to work
well. All video modes from my HDMI monitor work with this algorithm.
Hopefully we'll get more information on how to calculate the value, and
we can then update this.
Tested-by: Parth Pancholi <parth.pancholi@toradex.com>
Tested-by: Jayesh Choudhary <j-choudhary@ti.com>
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Link: https://lore.kernel.org/r/20250723-cdns-dsi-impro-v5-11-e61cc06074c2@ideasonboard.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 19fb9c5b81 ]
The v4l2_fh initialized and added in vpu_v4l2_open() is deleted and
cleaned up when the last reference to the vpu_inst is released. This may
happen later than at vpu_v4l2_close() time.
Not deleting and cleaning up the v4l2_fh when closing the file handle to
the video device is not ideal, as the v4l2_fh will still be present in
the video device's fh_list, and will store a copy of events queued to
the video device. There may also be other side effects of keeping alive
an object that represents an open file handle after the file handle is
closed.
The v4l2_fh instance is embedded in the vpu_inst structure, and is
accessed in two different ways:
- in vpu_notify_eos() and vpu_notify_source_change(), to queue V4L2
events to the file handle; and
- through the driver to access the v4l2_fh.m2m_ctx pointer.
The v4l2_fh.m2m_ctx pointer is not touched by v4l2_fh_del() and
v4l2_fh_exit(). It is set to NULL by the driver when closing the file
handle, in vpu_v4l2_close().
The vpu_notify_eos() and vpu_notify_source_change() functions are called
in vpu_set_last_buffer_dequeued() and vdec_handle_resolution_change()
respectively, only if the v4l2_fh.m2m_ctx pointer is not NULL. There is
therefore a guarantee that no new event will be queued to the v4l2_fh
after vpu_v4l2_close() destroys the m2m_ctx.
The vpu_notify_eos() function is also called from vpu_vb2_buf_finish(),
which is guaranteed to be called for all queued buffers when
vpu_v4l2_close() calls v4l2_m2m_ctx_release(), and will not be called
later.
It is therefore safe to assume that the driver will not touch the
v4l2_fh, except to check the m2m_ctx pointer, after vpu_v4l2_close()
destroys the m2m_ctx. We can safely delete and cleanup the v4l2_fh
synchronously in vpu_v4l2_close().
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Ming Qian <ming.qian@oss.nxp.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc6e8d1cce ]
The ivtv driver has a structure named ivtv_open_id that models an open
file handle for the device. It embeds a v4l2_fh instance for file
handles that correspond to a V4L2 video device, and stores a pointer to
that v4l2_fh in struct ivtv_stream to identify which open file handle
owns a particular stream.
In addition to video devices, streams can be owned by ALSA PCM devices.
Those devices do not make use of the v4l2_fh instance for obvious
reasons, but the snd_ivtv_pcm_capture_open() function still initializes
a "fake" v4l2_fh for the sole purpose of using it as an open file handle
identifier. The v4l2_fh is not properly destroyed when the ALSA PCM
device is closed, leading to possible resource leaks.
Fortunately, the v4l2_fh instance pointed to by ivtv_stream is not
accessed, only the pointer value is used for comparison. Replace it with
a pointer to the ivtv_open_id structure that embeds the v4l2_fh, and
don't initialize the v4l2_fh for ALSA PCM devices.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc4c0a48bd ]
The get_next_frame() function in psock_tpacket.c was missing a return
statement in its default switch case, leading to a compiler warning.
This was caused by a `bug_on(1)` call, which is defined as an
`assert()`, being compiled out because NDEBUG is defined during the
build.
Instead of adding a `return NULL;` which would silently hide the error
and could lead to crashes later, this change restores the original
author's intent. By adding `#undef NDEBUG` before including <assert.h>,
we ensure the assertion is active and will cause the test to abort if
this unreachable code is ever executed.
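The resulting pattern at the top of the test is simply (sketch):

/* keep assert() active even when the build defines NDEBUG */
#undef NDEBUG
#include <assert.h>

#define bug_on(cond)    assert(!(cond))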
Signed-off-by: Wake Liu <wakel@google.com>
Link: https://patch.msgid.link/20250809062013.2407822-1-wakel@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c36748e873 ]
The `__WORDSIZE` macro, defined in the non-standard `<bits/wordsize.h>`
header, is a GNU extension and not universally available with all
toolchains, such as Clang when used with musl libc.
This can lead to build failures in environments where this header is
missing.
The intention of the code is to determine the bit width of a C `long`.
Replace the non-portable `__WORDSIZE` with the standard and portable
`sizeof(long) * 8` expression to achieve the same result.
This change also removes the inclusion of the now-unused
`<bits/wordsize.h>` header.
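The replacement is a plain expression that works in runtime code
(sketch; LONG_BITS is an illustrative name, not from the original):

/* bit width of a C long, without relying on <bits/wordsize.h> */
#define LONG_BITS       (sizeof(long) * 8)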
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebc7086b39 ]
RDC PCI to PCIe bridges, present on Vortex86DX3 and Vortex86EX2 SoCs, do
not support MSIs. If enabled, interrupts generated by PCIe devices never
reach the processor.
I have contacted the manufacturer (DM&P) and they confirmed that PCI MSIs
need to be disabled for them.
Signed-off-by: Marcos Del Sol Vives <marcos@orca.pet>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250705233209.721507-1-marcos@orca.pet
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7496c15d8 ]
[Why]
A customer reported an issue where the OS starts and stops the device
multiple times during driver installation. Frequently disabling and
enabling OTG may
prevent OTG from being safely disabled and cause incorrect configuration
upon the next enablement.
[How]
Add a wait until OTG_CURRENT_MASTER_EN_STATE is cleared as a short term
solution.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: TungYu Lu <tungyu.lu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad335b5fc9 ]
[WHY&HOW]
The user closed the lid while the system was powering on and opened it
again before the “apply_seamless_boot_optimization” was set to false,
resulting in the eDP remaining blank.
Reset the “apply_seamless_boot_optimization” to false when dpms off.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Danny Wang <Danny.Wang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6510b62fe9 ]
snprintf() returns the number of characters that *would* have been
written, which can overestimate how much you actually wrote to the
buffer in case of truncation. That leads to 'data += this' advancing
the pointer past the end of the buffer and size going negative.
Switching to scnprintf() prevents potential buffer overflows and ensures
consistent behavior when building the output string.
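The pattern in question looks roughly like this (variable names are
illustrative):

        /* scnprintf() returns what was actually stored, so the running
         * pointer and the remaining size never step past the buffer
         */
        this = scnprintf(data, size, "%s: %u\n", name, value);
        data += this;
        size -= this;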
Signed-off-by: Seyediman Seyedarab <ImanDevel@gmail.com>
Link: https://lore.kernel.org/r/20250724195913.60742-1-ImanDevel@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f3b1ccf83 ]
Cached metrics data validity is 1ms on arcturus. It's not reasonable for
any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e87577ef6d ]
Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e72fdba8a ]
[Why]
The reason some high-resolution monitors fail to display properly
is that this platform does not support sufficiently high DPP and
DISP clock frequencies.
[How]
Update the DISP and DPP clocks from the SMU clock table so that DML can
filter out these modes if they are not supported.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1cda3c755b ]
I saw an oops in xe_gem_fault when running the xe-fast-feedback
testlist against the realtime kernel without debug options enabled.
The panic happens after core_hotunplug unbind-rebind finishes.
Presumably what happens is that a process mmaps, unlocks because
of the FAULT_FLAG_RETRY_NOWAIT logic, has no process memory left,
causing ttm_bo_vm_dummy_page() to return VM_FAULT_NOPAGE, since
there was nothing left to populate, and then oopses in
"mem_type_is_vram(tbo->resource->mem_type)" because tbo->resource
is NULL.
It's convoluted, but fits the data and explains the oops after
the test exits.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250715152057.23254-2-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45fbb51050 ]
The GuC load process will abort if certain status codes (which are
indicative of a fatal error) are reported. Otherwise, it keeps waiting
until the 'success' code is returned. New error codes have been added
in recent GuC releases, so add support for aborting on those as well.
v2: Shuffle HWCONFIG_START to the front of the switch to keep the
ordering as per the enum define for clarity (review feedback by
Jonathan). Also add a description for the basic 'invalid init data'
code which was missing.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250726024337.4056272-1-John.C.Harrison@Intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3419e1e44 ]
[WHY]
In the worst case, AUX intra-hop done can take hundreds of milliseconds as
each retimer in a link might have to wait a full AUX_RD_INTERVAL to send
LT abort downstream.
[HOW]
Wait 300ms for each retimer in a link to allow time to propagate a LT abort
without infinitely waiting on intra-hop done.
For no-retimer case, keep the max duration at 10ms.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2681bf4ae8 ]
[WHY]
If symclk RCO is enabled, stream encoder may not be receiving an ungated
clock by the time we attempt to set stream attributes when setting dpms
on. Since the clock is gated, register writes to the stream encoder fail.
[HOW]
Move set_stream_attribute call into enable_stream, just after the point
where symclk32_se is ungated.
Logically there is no need to set stream attributes as early as is
currently done in link_set_dpms_on, so this should have no impact beyond
the RCO fix.
Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca74cc428f ]
[Why]
When transitioning between topologies such as multi-display to single
display ODM 2:1, pipes might not be freed before use.
[How]
In dc_commit_streams, commit an additional, minimal transition if
original transition is not seamless to ensure pipes are freed.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f4bbee069 ]
When an MFD device is added, a platform_device is allocated. If this
device is linked to a DT description, the corresponding OF node is linked
to the new platform device but the OF node's refcount isn't incremented.
As of_node_put() is called during the platform device release, it leads
to a refcount underflow.
Call of_node_get() to increment the OF node's refcount when the node is
linked to the newly created platform device.
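The change amounts to taking the reference when the node is linked
(sketch; 'np' and 'pdev' stand for the DT node and the new platform
device):

        /* balance the of_node_put() done when the platform device is released */
        pdev->dev.of_node = of_node_get(np);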
Signed-off-by: Bastien Curutchet <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20250820-mfd-refcount-v1-1-6dcb5eb41756@bootlin.com
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ac4890ac3 ]
We observed the initial probe of the da9063 failing in
da9063_get_device_type in about 30% of boots on a Xilinx ZynqMP based
board. The problem originates in da9063_i2c_blockreg_read, which uses
a single bus transaction to turn the register page and then read a
register. On the bus, this should translate to a write to register 0,
followed by a read to the target register, separated by a repeated
start. However, we found that after the write to register 0, the
controller sometimes continues directly with the register address of
the read request, without sending the chip address or a repeated start
in between, which makes the read request invalid.
To fix this, separate turning the page and reading the register into
two separate transactions. This brings the initialization code in line
with the rest of the driver, which uses register maps (which to my
knowledge do not use repeated starts after turning the page). This has
been included in our kernel for several months and was recently
included in a shipped product. For us, it reliably fixes the issue,
and we have not observed any new issues.
While the underlying problem is probably with the i2c controller or
its driver, I still propose a change here in the interest of
robustness: First, I'm not sure this issue can be fixed on the
controller side, since there are other issues related to repeated
start which can't (AR# 60695, AR# 61664). Second, similar problems
might exist with other controllers.
Signed-off-by: Jens Kehne <jens.kehne@agilent.com>
Link: https://lore.kernel.org/r/20250804133754.3496718-1-jens.kehne@agilent.com
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 364752aa0c ]
clang-21 warns about one uninitialized variable getting dereferenced
in madera_dev_init:
drivers/mfd/madera-core.c:739:10: error: variable 'mfd_devs' is uninitialized when used here [-Werror,-Wuninitialized]
739 | mfd_devs, n_devs,
| ^~~~~~~~
drivers/mfd/madera-core.c:459:33: note: initialize the variable 'mfd_devs' to silence this warning
459 | const struct mfd_cell *mfd_devs;
| ^
| = NULL
The code is actually correct here because n_devs is only nonzero
when mfd_devs is a valid pointer, but this is impossible for the
compiler to see reliably.
Change the logic to check for the pointer as well, to make this easier
for the compiler to follow.
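A sketch of the kind of check described, assuming the registration call
is mfd_add_devices() and with illustrative error handling:

        if (!mfd_devs || !n_devs) {
                dev_err(madera->dev, "Unknown device type\n");
                return -EINVAL;
        }

        ret = mfd_add_devices(madera->dev, PLATFORM_DEVID_NONE,
                              mfd_devs, n_devs, NULL, 0, NULL);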
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20250807071932.4085458-1-arnd@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2734fdbc9b ]
When we are successful in using cpufreq min/max limits,
skip setting the raw MSR limits entirely.
This is necessary to avoid undoing any modification that
the cpufreq driver makes to our sysfs request.
e.g. intel_pstate may take our request for a limit
that is valid according to HWP.CAP.MIN/MAX and clip
it to be within the range available in PLATFORM_INFO.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c97c057d35 ]
On enabling HWP, preserve the reserved bits in MSR_PM_ENABLE.
Also, skip writing the MSR_PM_ENABLE if HWP is already enabled.
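Conceptually (sketch; the MSR accessor names follow the tool's existing
helpers and are assumptions here):

        unsigned long long msr;

        get_msr(cpu, MSR_PM_ENABLE, &msr);
        if (!(msr & 1))                         /* HWP not enabled yet */
                put_msr(cpu, MSR_PM_ENABLE, msr | 1);   /* reserved bits preserved */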
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62127655b7 ]
The fopen_or_die() function was previously hardcoded
to open files in read-only mode ("r"), ignoring the
mode parameter passed to it. This patch corrects
fopen_or_die() to use the provided mode argument,
allowing for flexible file access as intended.
Additionally, the call to fopen_or_die() in
err_on_hypervisor() incorrectly used the mode
"ro", which is not a valid fopen mode. This is
fixed to use the correct "r" mode.
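The fixed helper then looks roughly like this (sketch; error reporting
abridged):

FILE *fopen_or_die(const char *path, const char *mode)
{
        FILE *filep = fopen(path, mode);        /* honour the caller's mode */

        if (!filep) {
                perror(path);
                exit(1);
        }
        return filep;
}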
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 105eb5dc74 ]
bpf_cookie can fail on perf_event_open(), when it runs after the task_work
selftest. The task_work test causes perf to lower
sysctl_perf_event_sample_rate, and bpf_cookie uses sample_freq,
which is validated against that sysctl. As a result,
perf_event_open() rejects the attr if the (now tighter) limit is
exceeded.
From perf_event_open():
        if (attr.freq) {
                if (attr.sample_freq > sysctl_perf_event_sample_rate)
                        return -EINVAL;
        } else {
                if (attr.sample_period & (1ULL << 63))
                        return -EINVAL;
        }
Switch bpf_cookie to use sample_period, which is not checked against
sysctl_perf_event_sample_rate.
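In the selftest's perf_event_attr this is a one-line change (sketch; the
type/config fields shown are assumptions):

        struct perf_event_attr attr = {
                .type = PERF_TYPE_SOFTWARE,
                .config = PERF_COUNT_SW_CPU_CLOCK,
                /* sample_period is not validated against
                 * sysctl_perf_event_sample_rate, unlike freq/sample_freq
                 */
                .sample_period = 1,
        };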
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925215230.265501-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23199d2aa6 ]
Fix incorrect size parameter passed to cpuidle_state_write_file() in
cpuidle_state_disable().
The function was incorrectly using sizeof(disable) which returns the
size of the unsigned int variable (4 bytes) instead of the actual
length of the string stored in the 'value' buffer.
Since 'value' is populated with snprintf() to contain the string
representation of the disable value, we should use the length
returned by snprintf() to get the correct string length for
writing to the sysfs file.
This ensures the correct number of bytes is written to the cpuidle
state disable file in sysfs.
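The fix is to feed the snprintf() return value through (sketch; buffer
name and size are assumptions):

        char value[SYSFS_PATH_MAX];
        int len;

        len = snprintf(value, sizeof(value), "%u", disable);
        /* pass the string length, not sizeof(disable) */
        return cpuidle_state_write_file(cpu, idlestate, "disable", value, len);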
Link: https://lore.kernel.org/r/20250917050820.1785377-1-kaushlendra.kumar@intel.com
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ddb61e737f ]
It turns out the second fan on the Dell Precision 490 does not
really support I8K_FAN_TURBO. Setting the fan state to 3 enables
automatic fan control, just like on the other two fans.
The reason why this was misinterpreted as turbo mode was that
the second fan normally spins faster in automatic mode than
in the previous fan states. Yet when in state 3, the fan speed
reacts to heat exposure, exposing the automatic mode setting.
Link: https://github.com/lm-sensors/lm-sensors/pull/383
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250917181036.10972-2-W_Armin@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4363264111 ]
If the uprobe handler changes the instruction pointer, we still
single-step or emulate the original instruction and increment the (new)
ip with its length.
This makes the new instruction pointer bogus and the application will
likely crash on illegal instruction execution.
If user decided to take execution elsewhere, it makes little sense
to execute the original instruction, so let's skip it.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e48265501 ]
The NVMe Base Specification 2.1 states that:
"""
A host requests an explicit persistent connection ... by specifying a
non-zero Keep Alive Timer value in the Connect command.
"""
As such, if we are starting a persistent connection to a discovery
controller and the KATO is currently 0, we need to update KATO to a
non-zero value to avoid continuous timeouts on the target.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8f12d1137c ]
It is possible for bpf_xdp_adjust_tail() to free all fragments. The
kfunc currently clears the XDP_FLAGS_HAS_FRAGS bit, but not
XDP_FLAGS_FRAGS_PF_MEMALLOC. So far, this has not caused an issue when
building sk_buff from xdp_buff since all readers of xdp_buff->flags
use the flag only when there are fragments. Clear the
XDP_FLAGS_FRAGS_PF_MEMALLOC bit as well to make the flags correct.
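In the shrink path of bpf_xdp_adjust_tail() this looks roughly like
(sketch):

        if (unlikely(!sinfo->nr_frags)) {
                xdp_buff_clear_frags_flag(xdp);
                /* no frags left, so the pfmemalloc marker must go too */
                xdp->flags &= ~XDP_FLAGS_FRAGS_PF_MEMALLOC;
        }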
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250922233356.3356453-2-ameryhung@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d0bf7cd5df ]
In the __arch_prepare_bpf_trampoline() function, retval_off is only
meaningful when save_ret is true, so the current logic is correct.
However, in the original logic, retval_off is only initialized under
certain conditions; for example, in the fmod_ret logic, the compiler is
not aware that the flags of the fmod_ret program (prog) have set
BPF_TRAMP_F_CALL_ORIG, which results in an uninitialized symbol
compilation warning.
So initialize retval_off unconditionally to fix it.
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20250922062244.822937-2-duanchenghao@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d726c4dbe ]
Following deadlock can be triggered easily by lockdep:
WARNING: possible circular locking dependency detected
6.17.0-rc3-00124-ga12c2658ced0 #1665 Not tainted
------------------------------------------------------
check/1334 is trying to acquire lock:
ff1100011d9d0678 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_unregister_queue+0x53/0x180
but task is already holding lock:
ff1100011d9d00e0 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: del_gendisk+0xba/0x110
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
blk_queue_enter+0x40b/0x470
blkg_conf_prep+0x7b/0x3c0
tg_set_limit+0x10a/0x3e0
cgroup_file_write+0xc6/0x420
kernfs_fop_write_iter+0x189/0x280
vfs_write+0x256/0x490
ksys_write+0x83/0x190
__x64_sys_write+0x21/0x30
x64_sys_call+0x4608/0x4630
do_syscall_64+0xdb/0x6b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&q->rq_qos_mutex){+.+.}-{4:4}:
__mutex_lock+0xd8/0xf50
mutex_lock_nested+0x2b/0x40
wbt_init+0x17e/0x280
wbt_enable_default+0xe9/0x140
blk_register_queue+0x1da/0x2e0
__add_disk+0x38c/0x5d0
add_disk_fwnode+0x89/0x250
device_add_disk+0x18/0x30
virtblk_probe+0x13a3/0x1800
virtio_dev_probe+0x389/0x610
really_probe+0x136/0x620
__driver_probe_device+0xb3/0x230
driver_probe_device+0x2f/0xe0
__driver_attach+0x158/0x250
bus_for_each_dev+0xa9/0x130
driver_attach+0x26/0x40
bus_add_driver+0x178/0x3d0
driver_register+0x7d/0x1c0
__register_virtio_driver+0x2c/0x60
virtio_blk_init+0x6f/0xe0
do_one_initcall+0x94/0x540
kernel_init_freeable+0x56a/0x7b0
kernel_init+0x2b/0x270
ret_from_fork+0x268/0x4c0
ret_from_fork_asm+0x1a/0x30
-> #0 (&q->sysfs_lock){+.+.}-{4:4}:
__lock_acquire+0x1835/0x2940
lock_acquire+0xf9/0x450
__mutex_lock+0xd8/0xf50
mutex_lock_nested+0x2b/0x40
blk_unregister_queue+0x53/0x180
__del_gendisk+0x226/0x690
del_gendisk+0xba/0x110
sd_remove+0x49/0xb0 [sd_mod]
device_remove+0x87/0xb0
device_release_driver_internal+0x11e/0x230
device_release_driver+0x1a/0x30
bus_remove_device+0x14d/0x220
device_del+0x1e1/0x5a0
__scsi_remove_device+0x1ff/0x2f0
scsi_remove_device+0x37/0x60
sdev_store_delete+0x77/0x100
dev_attr_store+0x1f/0x40
sysfs_kf_write+0x65/0x90
kernfs_fop_write_iter+0x189/0x280
vfs_write+0x256/0x490
ksys_write+0x83/0x190
__x64_sys_write+0x21/0x30
x64_sys_call+0x4608/0x4630
do_syscall_64+0xdb/0x6b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of:
&q->sysfs_lock --> &q->rq_qos_mutex --> &q->q_usage_counter(queue)#3
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&q->q_usage_counter(queue)#3);
                               lock(&q->rq_qos_mutex);
                               lock(&q->q_usage_counter(queue)#3);
  lock(&q->sysfs_lock);
The root cause is that queue_usage_counter is grabbed with rq_qos_mutex
held in blkg_conf_prep(), while the queue should be frozen before
rq_qos_mutex in other contexts.
The blk_queue_enter() from blkg_conf_prep() is used to protect against
policy deactivation, which is already protected with blkcg_mutex, hence
convert blk_queue_enter() to blkcg_mutex to fix this problem. Meanwhile,
consider that blkcg_mutex is held after queue is freezed from policy
deactivation, also convert blkg_alloc() to use GFP_NOIO.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7a2510633 ]
The OpenWrt distribution has switched from kernel longterm 6.6 to
6.12. Reports show that devices with the Realtek Otto switch platform
die during operation and are rebooted by the watchdog. Sorting out
other possible reasons the Otto timer is to blame. The platform
currently consists of 4 targets with different hardware revisions.
It is not 100% clear which devices and revisions are affected.
Analysis shows:
A more aggressive sched/deadline handling leads to more timer starts
with small intervals. This increases the chance of hitting the bug. See
https://marc.info/?l=linux-kernel&m=175276556023276&w=2
Focusing on the real issue a hardware limitation on some devices was
found. There is a minimal chance that a timer ends without firing an
interrupt if it is reprogrammed within the 5us before its expiration
time. Work around this issue by introducing a bounce() function. It
restarts the timer directly before the normal restart functions as
follows:
- Stop timer
- Restart timer with a slow frequency.
- Target time will be >5us
- The subsequent normal restart is outside the critical window
Downstream has already tested and confirmed a patch. See
https://github.com/openwrt/openwrt/pull/19468
https://forum.openwrt.org/t/support-for-rtl838x-based-managed-switches/57875/3788
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Stephen Howell <howels@allthatwemight.be>
Tested-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20250804080328.2609287-2-markus.stockhausen@gmx.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b54082c3e ]
sys_get_robust_list() and compat_get_robust_list() use ptrace_may_access()
to check if the calling task is allowed to access another task's
robust_list pointer. This check is racy against a concurrent exec() in the
target process.
During exec(), a task may transition from a non-privileged binary to a
privileged one (e.g., setuid binary) and its credentials/memory mappings
may change. If get_robust_list() performs ptrace_may_access() before
this transition, it may erroneously allow access to sensitive information
after the target becomes privileged.
A racy access allows an attacker to exploit a window during which
ptrace_may_access() passes before a target process transitions to a
privileged state via exec().
For example, consider a non-privileged task T that is about to execute a
setuid-root binary. An attacker task A calls get_robust_list(T) while T
is still unprivileged. Since ptrace_may_access() checks permissions
based on current credentials, it succeeds. However, if T begins exec
immediately afterwards, it becomes privileged and may change its memory
mappings. Because get_robust_list() proceeds to access T->robust_list
without synchronizing with exec() it may read user-space pointers from a
now-privileged process.
This violates the intended post-exec access restrictions and could
expose sensitive memory addresses or be used as a primitive in a larger
exploit chain. Consequently, the race can lead to unauthorized
disclosure of information across privilege boundaries and poses a
potential security risk.
Take a read lock on signal->exec_update_lock prior to invoking
ptrace_may_access() and accessing the robust_list/compat_robust_list.
This ensures that the target task's exec state remains stable during the
check, allowing for consistent and synchronized validation of
credentials.
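A sketch of the resulting ordering inside get_robust_list() (error
handling condensed, label names illustrative):

        ret = down_read_killable(&p->signal->exec_update_lock);
        if (ret)
                goto err_put;

        if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS)) {
                ret = -EPERM;
                goto err_unlock;
        }
        head = p->robust_list;          /* read while exec state is stable */

        up_read(&p->signal->exec_update_lock);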
Suggested-by: Jann Horn <jann@thejh.net>
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-fsdevel/1477863998-3298-5-git-send-email-jann@thejh.net/
Link: https://github.com/KSPP/linux/issues/119
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7b1b796117 ]
Refuse to register a cpuidle device if the given CPU has a cpuidle
device already and print a message regarding it.
Without this, an attempt to register a new cpuidle device without
unregistering the existing one leads to the removal of the existing
cpuidle device without removing its sysfs interface.
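Conceptually (sketch; the exact return value and message wording are
assumptions):

        if (per_cpu(cpuidle_devices, dev->cpu)) {
                pr_info("cpuidle device already registered for CPU%u\n",
                        dev->cpu);
                return -EEXIST;
        }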
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5612ea8b55 ]
This fixes the build with -Werror -Wall.
btf_dumper.c:71:31: error: variable 'finfo' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
71 | info.func_info = ptr_to_u64(&finfo);
| ^~~~~
prog.c:2294:31: error: variable 'func_info' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
2294 | info.func_info = ptr_to_u64(&func_info);
|
v2:
- Initialize instead of using memset.
Signed-off-by: Tom Stellard <tstellar@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250917183847.318163-1-tstellar@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 41307ec7df ]
The X1E80100 battery management firmware sends a notification with
code 0x83 when the battery charging state changes, such as switching
between fast charge, taper charge, end of charge, or any other error
charging states.
The same notification code is used with bit[8] set when charging stops
because the charge control end threshold is reached. Additionally,
a 2-bit value is included in bit[10:9] with the same code to indicate
the charging source capability, which is determined by the calculated
power from voltage and current readings from PDOs: 2 means a strong
charger over 60W, 1 indicates a weak charger, and 0 means there is no
charging source.
These 3 MSBs [10:8] of the notification code are not very useful for
now, so just ignore them and trigger a power supply change event
whenever the 0x83 notification code is received. This helps eliminate
the unknown-notification error messages.
Reported-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Closes: https://lore.kernel.org/all/r65idyc4of5obo6untebw4iqfj2zteiggnnzabrqtlcinvtddx@xc4aig5abesu/
Signed-off-by: Fenglin Wu <fenglin.wu@oss.qualcomm.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 57b100d4cf ]
The cpupower_write_sysfs() function currently returns -1 on
write failure, but the function signature indicates it should
return an unsigned int. Returning -1 from an unsigned function
results in a large positive value rather than indicating
an error condition.
Fix this by returning 0 on failure, which is more appropriate
for an unsigned return type and maintains consistency with typical
success/failure semantics where 0 indicates failure and non-zero
indicates success (bytes written).
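The failure path simply becomes (sketch; surrounding code abridged and
variable names illustrative):

        numwritten = write(fd, buf, buflen);
        if (numwritten < 1) {
                close(fd);
                return 0;       /* 0 means failure for an unsigned return type */
        }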
Link: https://lore.kernel.org/r/20250828063000.803229-1-kaushlendra.kumar@intel.com
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c89513395 ]
The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the
cgroup corresponding to a given cgroup ID. This helper can be called in
a lot of contexts where the current thread can be random. A recent
example was its use in sched_ext's ops.tick(), to obtain the root cgroup
pointer. Since the current task can be whatever random user space task
preempted by the timer tick, this makes the behavior of the helper
unreliable.
Refactor out __cgroup_get_from_id as the non-namespace aware version of
cgroup_get_from_id, and change bpf_cgroup_from_id to make use of it.
There is no compatibility breakage here, since changing the namespace
against which the lookup is being done to the root cgroup namespace only
permits a wider set of lookups to succeed now. The cgroup IDs across
namespaces are globally unique, and thus don't need to be retranslated.
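The kfunc then becomes a thin wrapper over the new helper (sketch):

__bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
{
        struct cgroup *cgrp;

        /* root-namespace lookup: independent of whichever task is current */
        cgrp = __cgroup_get_from_id(cgid);
        if (IS_ERR(cgrp))
                return NULL;
        return cgrp;
}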
Reported-by: Dan Schatzberg <dschatzberg@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250915032618.1551762-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2537be4f8 ]
When forcefully shutting down a port via the configfs interface,
nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and
then nvmet_disable_port(). Both functions will eventually schedule all
remaining associations for deletion.
The current implementation checks whether an association is about to be
removed, but only after the work item has already been scheduled. As a
result, it is possible for the first scheduled work item to free all
resources, and then for the same work item to be scheduled again for
deletion.
Because the association list is an RCU list, it is not possible to take
a lock and remove the list entry directly, so it cannot be looked up
again. Instead, a flag (terminating) must be used to determine whether
the association is already in the process of being deleted.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/rsdinhafrtlguauhesmrrzkybpnvwantwmyfq2ih5aregghax5@mhr7v3eryci3/
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6dbcd5a9ab ]
A TEE driver doesn't always need to provide a pool if it doesn't
support memory sharing ioctls and can allocate memory for TEE
messages in another way. Although this is mentioned in the
documentation for tee_device_alloc(), it is not handled correctly.
Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9dff11a7a ]
When deleting the previous walkstate's operand stack,
acpi_ds_call_control_method() was deleting obj_desc->Method.param_count
operands. But Method.param_count does not necessarily match
this_walk_state->num_operands; it may be either less or more.
After correcting the for loop to check `i < this_walk_state->num_operands`
the code is identical to acpi_ds_clear_operands(), so just outright
replace the code with acpi_ds_clear_operands() to fix this.
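A sketch of the corrected loop, equivalent to what
acpi_ds_clear_operands() does (variable names as in the surrounding
ACPICA code):
	u32 i;

	/* Drop a reference on every operand the walk state actually holds */
	for (i = 0; i < this_walk_state->num_operands; i++) {
		acpi_ut_remove_reference(this_walk_state->operands[i]);
	}
	this_walk_state->num_operands = 0;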
Link: https://github.com/acpica/acpica/commit/53fc0220
Signed-off-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08b68ca543 ]
For Qualcomm SoCs which need a level shifter for the SD card, extra delay
is seen on the receiver data path.
To compensate for this delay, enable tuning for SDR50 mode on targets
which have a level shifter. The SDHCI_SDR50_NEEDS_TUNING cap will be set
for targets with a level shifter on Qualcomm SoCs.
Signed-off-by: Sarthak Garg <quic_sartgarg@quicinc.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f3cfb7943 ]
IO time is considered busy by default for modern Intel processors. The
current check covers recent Family 6 models but excludes the brand new
Families 18 and 19.
According to Arjan van de Ven, the model check was mainly due to a lack
of testing on systems before INTEL_CORE2_MEROM. He suggests considering
all Intel processors as having an efficient idle.
Extend the IO busy classification to all Intel processors starting with
Family 6, including Family 15 (Pentium 4s) and upcoming Families 18/19.
Use an x86 VFM check and move the function to the header file to avoid
using arch-specific #ifdefs in the C file.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://patch.msgid.link/20250908230655.2562440-1-sohil.mehta@intel.com
[ rjw: Added empty line after #include ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c33c43f71b ]
On certain Loongson platforms, drivers attempting to request a legacy
ISA IRQ directly via request_irq() (e.g., IRQ 4) may fail. The
virtual IRQ descriptor is not fully initialized and lacks a valid irqchip.
This issue does not affect ACPI-enumerated devices described in DSDT,
as their interrupts are properly mapped via the GSI translation path.
This indicates the LPC irqdomain itself is functional but is not correctly
handling direct VIRQ-to-HWIRQ mappings.
The root cause is the use of irq_domain_create_linear(). This API sets
up a domain for dynamic, on-demand mapping, typically triggered by a GSI
request. It does not pre-populate the mappings for the legacy VIRQ range
(0-15). Consequently, if no ACPI device claims a specific GSI
(e.g., GSI 4), the corresponding VIRQ (e.g., VIRQ 4) is never mapped to
the LPC domain. A direct call to request_irq(4, ...) then fails because
the kernel cannot resolve this VIRQ to a hardware interrupt managed by
the LPC controller.
The PCH-LPC interrupt controller is an i8259-compatible legacy device
that requires a deterministic, static 1-to-1 mapping for IRQs 0-15 to
support legacy drivers.
Fix this by replacing irq_domain_create_linear() with
irq_domain_create_legacy(). This API is specifically designed for such
controllers. It establishes the required static 1-to-1 VIRQ-to-HWIRQ
mapping for the entire legacy range (0-15) immediately upon domain
creation. This ensures that any VIRQ in this range is always resolvable,
making direct calls to request_irq() for legacy IRQs function correctly.
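A hedged sketch of the change (local variable and ops names assumed);
irq_domain_create_legacy() pre-maps all 16 legacy hwirqs 1:1 at creation
time:
	/* Before: mappings created lazily, only when a GSI is requested */
	priv->lpc_domain = irq_domain_create_linear(fwnode, LPC_COUNT,
						    &pch_lpc_domain_ops, priv);

	/* After: VIRQs 0-15 are statically mapped to hwirqs 0-15 up front */
	priv->lpc_domain = irq_domain_create_legacy(fwnode, LPC_COUNT,
						    0, 0,
						    &pch_lpc_domain_ops, priv);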
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aae7a2876c ]
Unlike all the other allocations in this driver, the memory for storing
the pin function descriptions allocated with kcalloc() and later resized
with krealloc() is never freed. Use devres like elsewhere to handle
that. While at it, replace krealloc() with the more suitable
devm_krealloc_array().
Note: the logic in this module is pretty convoluted and could probably
use some revisiting, we should probably be able to calculate the exact
amount of memory needed in advance or even skip the allocation
altogether and just add each function to the radix tree separately.
Tested-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9d61f1da3 ]
Many AMD CPUs can support this feature now. We would get a wrong CPU die
temperature if we don't consider this. In low-temperature environments,
the CPU die temperature can drop below zero, so many platforms would like
to make the extended temperature range their default configuration.
Default temperature range (0C to 255.875C).
Extended temperature range (-49C to +206.875C).
Ref Doc: AMD V3000 PPR (Doc ID #56558).
Signed-off-by: Chuande Chen <chuachen@cisco.com>
Link: https://lore.kernel.org/r/20250814053940.96764-1-chenchuande@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e82368359 ]
The current behavior of the Step-wise thermal governor is to increase
the cooling level one step at a time, after the thermal zone temperature
passes a trip point threshold, until the temperature stops rising. Then,
nothing is done until the temperature decreases below the (possibly
updated) trip point threshold, at which point the cooling level is
reduced straight to the applicable minimum.
While this generally works, it is not in agreement with the throttling
logic description comment in step_wise_manage() any more after some
relatively recent changes, and in the case of passive cooling, it may
lead to undesirable performance oscillations between high and low
levels.
For this reason, modify the governor's cooling device state selection
function, get_target_state(), to reduce cooling by one level even if
the temperature is still above the thermal zone threshold, but the
temperature has started to fall. However, ensure that the cooling
level will remain above the applicable minimum in that case to pull
the zone temperature further down, possibly until it falls below the
trip threshold (which may now be equal to the low temperature of the
trip).
Doing so should help higher performance to be restored earlier in some
cases which is desirable especially for passive trip points with
relatively high hysteresis values.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1947735.tdWV9SEqCh@rafael.j.wysocki
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4405a214df ]
Some x86/ACPI laptops with MIPI cameras have a INTC10DE or INTC10E0 ACPI
device in the _DEP dependency list of the ACPI devices for the camera-
sensors (which have flags.honor_deps set).
These devices are for an Intel Vision CVS chip for which an out of tree
driver is available [1].
The camera sensor works fine without a driver being loaded for this
ACPI device on the 2 laptops this was tested on:
ThinkPad X1 Carbon Gen 12 (Meteor Lake)
ThinkPad X1 2-in-1 Gen 10 (Arrow Lake)
For now add these HIDs to acpi_ignore_dep_ids[] so that
acpi_dev_ready_for_enumeration() will return true once the other _DEP
dependencies are met and an i2c_client for the camera sensor will get
instantiated.
Link: https://github.com/intel/vision-drivers/ [1]
Signed-off-by: Hans de Goede <hansg@kernel.org>
Link: https://patch.msgid.link/20250829142748.21089-1-hansg@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a351de0d9 ]
Just like the other Vivobooks here, the N6506CU has its keyboard IRQ
described as ActiveLow in the DSDT, which the kernel overrides to
EdgeHigh, causing the internal keyboard not to work.
Add the N6506CU to the irq1_level_low_skip_override[] quirk table to fix
this.
Signed-off-by: Sam van Kampen <sam@tehsvk.net>
Link: https://patch.msgid.link/20250829145221.2294784-2-sam@tehsvk.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ef3886ce6 ]
The PCI Local Bus Specification 3.0 (section 6.8.1.6) allows modifying the
low-order bits of the MSI Message DATA register to encode nr_irqs interrupt
numbers in the log2(nr_irqs) bits for the domain.
The problem arises if the base vector (GICV2m base spi) is not aligned with
nr_irqs; in this case, the low-order log2(nr_irqs) bits from the base
vector conflict with the nr_irqs masking, causing the wrong MSI interrupt
to be identified.
To fix this, use bitmap_find_next_zero_area_off() instead of
bitmap_find_free_region() to align the initial base vector with nr_irqs.
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250902091045.220847-1-christian.bruel@foss.st.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a3fecb9160 ]
While tracking down a problem where constant expressions used by
BUILD_BUG_ON() suddenly stopped working[1], we found that an added static
initializer was convincing the compiler that it couldn't track the state
of the prior statically initialized value. Tracing this down found that
ffs() was used in the initializer macro, but since it wasn't marked with
__attribute__const__, the compiler had to assume the function might
change variable states as a side-effect (which is not true for ffs(),
which provides deterministic math results).
For arc architecture with CONFIG_ISA_ARCV2=y, the __fls() function
uses __builtin_arc_fls() which lacks GCC's const attribute, preventing
compile-time constant folding, and KUnit testing of ffs/fls fails on
arc[3]. A patch[2] to GCC to solve this has been sent.
Add a fix for this by handling compile-time constants with the standard
__builtin_clzl() builtin (which has const attribute) while preserving
the optimized arc-specific builtin for runtime cases. This has the added
benefit of skipping runtime calculation of compile-time constant values.
Even with the GCC bug fixed (which is about "attribute const") this is a
good change to avoid needless runtime costs, and should be done
regardless of the state of GCC's bug.
Build tested ARCH=arc allyesconfig with GCC arc-linux 15.2.0.
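A hedged sketch of the approach (assuming __builtin_arc_fls() implements
__fls() semantics, as the existing code does): constants are folded via
the const-attributed __builtin_clzl(), while the ARC builtin stays on the
runtime path.
	static inline __attribute__ ((const)) unsigned long __fls(unsigned long x)
	{
		/* known at compile time: let the compiler fold it away */
		if (__builtin_constant_p(x))
			return BITS_PER_LONG - 1 - __builtin_clzl(x);

		/* runtime value: keep the optimized ARC builtin */
		return __builtin_arc_fls(x);
	}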
Link: https://github.com/KSPP/linux/issues/364 [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2025-August/693273.html
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508031025.doWxtzzc-lkp@intel.com/ [3]
Signed-off-by: Kees Cook <kees@kernel.org>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 592532a77b ]
longhaul_exit() was calling cpufreq_cpu_get(0) without checking
for a NULL policy pointer. On some systems, this could lead to a
NULL dereference and a kernel warning or panic.
This patch adds a check using unlikely() and returns early if the
policy is NULL.
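A minimal sketch of the added guard:
	static void __exit longhaul_exit(void)
	{
		struct cpufreq_policy *policy = cpufreq_cpu_get(0);

		if (unlikely(!policy))
			return;

		/* ... existing cleanup using 'policy' ... */
	}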
Bugzilla: #219962
Signed-off-by: Dennis Beier <nanovim@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 758acb9ccf ]
On x86-64, USDT arguments can be specified using Scale-Index-Base (SIB)
addressing, e.g. "1@-96(%rbp,%rax,8)". The current USDT implementation
in libbpf cannot parse this format, causing `bpf_program__attach_usdt()`
to fail with -ENOENT (unrecognized register).
This patch fixes this by implementing the necessary changes:
- add correct handling for SIB-addressed arguments in `bpf_usdt_arg`.
- add adaptive support to `__bpf_usdt_arg_type` and
`__bpf_usdt_arg_spec` to represent SIB addressing parameters.
Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-2-phoenix500526@163.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f41345f47f ]
In the following toy program (reg states minimized for readability), R0
and R1 always have different values at instruction 6. This is obvious
when reading the program but cannot be guessed from ranges alone as
they overlap (R0 in [0; 0xc0000000], R1 in [1024; 0xc0000400]).
0: call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: w0 = w0 ; R0_w=scalar(var_off=(0x0; 0xffffffff))
2: r0 >>= 30 ; R0_w=scalar(var_off=(0x0; 0x3))
3: r0 <<= 30 ; R0_w=scalar(var_off=(0x0; 0xc0000000))
4: r1 = r0 ; R1_w=scalar(var_off=(0x0; 0xc0000000))
5: r1 += 1024 ; R1_w=scalar(var_off=(0x400; 0xc0000000))
6: if r1 != r0 goto pc+1
Looking at tnums however, we can deduce that R1 is always different from
R0 because their tnums don't agree on known bits. This patch uses this
logic to improve is_scalar_branch_taken in case of BPF_JEQ and BPF_JNE.
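A sketch of the underlying check (helper name hypothetical; struct tnum
carries 'value' for the known bits and 'mask' for the unknown ones): two
scalars can only be equal if they agree on every bit that is known in
both.
	static bool example_tnums_disagree(struct tnum a, struct tnum b)
	{
		u64 known_in_both = ~a.mask & ~b.mask;

		/* a known bit differs => the registers can never be equal */
		return ((a.value ^ b.value) & known_in_both) != 0;
	}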
This change has a tiny impact on complexity, which was measured with
the Cilium complexity CI test. That test covers 72 programs with
various build and load time configurations for a total of 970 test
cases. For 80% of test cases, the patch has no impact. On the other
test cases, the patch decreases complexity by only 0.08% on average. In
the best case, the verifier needs to walk 3% less instructions and, in
the worst case, 1.5% more. Overall, the patch has a small positive
impact, especially for our largest programs.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/be3ee70b6e489c49881cb1646114b1d861b5c334.1755694147.git.paul.chaignon@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5af45302e ]
Add support for the TI K3 AM62D2 SoC to read speed and revision values
from hardware and pass them to the OPP layer. AM62D shares the same
configurations as AM62A, so use the existing am62a7_soc_data.
Signed-off-by: Paresh Bhagat <p-bhagat@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 236152dd9b ]
In the pin_config_set function, when handling PIN_CONFIG_BIAS_PULL_DOWN or
PIN_CONFIG_BIAS_PULL_UP, the function calls pcs_pinconf_clear_bias()
which writes the register. However, the subsequent operations continue
using the stale 'data' value from before the register write, effectively
causing the bias clear operation to be overwritten and not take effect.
Fix this by reading the 'data' value from the register after calling
pcs_pinconf_clear_bias().
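A hedged sketch of the fix inside pcs_pinconf_set() (the register
accessor and variable names are assumed, not copied from the driver):
	/* ...handling PIN_CONFIG_BIAS_PULL_UP / PIN_CONFIG_BIAS_PULL_DOWN... */
	pcs_pinconf_clear_bias(pctldev, pin);
	/* Re-read so the cleared bias bits are not written back below */
	data = pcs->read(pcs->regs + offset);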
This bug seems to have existed since this code was first merged in commit
9dddb4df90 ("pinctrl: single: support generic pinconf").
Signed-off-by: Chi Zhang <chizhang@asrmicro.com>
Link: https://lore.kernel.org/20250807062038.13610-1-chizhang@asrmicro.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2caa6b88e0 ]
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250811-restricted-pointers-bpf-v1-1-a1d7cc3cb9e7@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5039648f8 ]
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20250811-restricted-pointers-soc-v2-1-7af7ed993546@linutronix.de
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b832b19318 ]
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and
easier to reason about.
There are still a few users of %pK left, but these use it through seq_file,
for which its usage is safe.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250811-restricted-pointers-spi-v1-1-32c47f954e4d@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64e2f60f35 ]
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible
crashes. The problem is that kernel page tables are modified without
flushing corresponding TLB entries.
Even if it looks like the empty flush_tlb_all() implementation on s390 is
the problem, it is actually a different problem: on s390 it is not allowed
to replace an active/valid page table entry with another valid page table
entry without the detour over an invalid entry. A direct replacement may
lead to random crashes and/or data corruption.
In order to invalidate an entry special instructions have to be used
(e.g. ipte or idte). Alternatively there are also special instructions
available which allow to replace a valid entry with a different valid
entry (e.g. crdte or cspg).
Given that the HVO code currently does not provide the hooks to allow for
an implementation which is compliant with the s390 architecture
requirements, disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP again, which is
basically a revert of the original patch which enabled it.
Reported-by: Luiz Capitulino <luizcap@redhat.com>
Closes: https://lore.kernel.org/all/20251028153930.37107-1-luizcap@redhat.com/
Fixes: 00a34d5a99 ("s390: select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
Cc: stable@vger.kernel.org
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0fd20f65df ]
Do not block PCI config accesses through pci_cfg_access_lock() when
executing the s390 variant of PCI error recovery: Acquire just
device_lock() instead of pci_dev_lock() as powerpc's EEH and
generic PCI AER processing do.
During error recovery testing a pair of tasks was reported to be hung:
mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
INFO: task kmcheck:72 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<000000065256f572>] schedule_preempt_disabled+0x22/0x30
[<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
[<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
[<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
[<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
[<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<0000000652172e28>] pci_wait_cfg+0x80/0xe8
[<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
[<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
[<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
[<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
[<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
[<0000000652513212>] devlink_health_report+0x19a/0x230
[<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]
No kernel log of the exact same error with an upstream kernel is
available - but the very same deadlock situation can be constructed there,
too:
- task: kmcheck
mlx5_unload_one() tries to acquire devlink lock while the PCI error
recovery code has set pdev->block_cfg_access by way of
pci_cfg_access_lock()
- task: kworker
mlx5_crdump_collect() tries to set block_cfg_access through
pci_cfg_access_lock() while devlink_health_report() had acquired
the devlink lock.
A similar deadlock situation can be reproduced by requesting a
crdump with
> devlink health dump show pci/<BDF> reporter fw_fatal
while PCI error recovery is executed on the same <BDF> physical function
by mlx5_core's pci_error_handlers. On s390 this can be injected with
> zpcictl --reset-fw <BDF>
Tests with this patch failed to reproduce that second deadlock situation,
the devlink command is rejected with "kernel answers: Permission denied" -
and we get a kernel log message of:
mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5
because the config read of VSC_SEMAPHORE is rejected by the underlying
hardware.
Two prior attempts to address this issue have been discussed and
ultimately rejected [see link], with the primary argument that s390's
implementation of PCI error recovery is imposing restrictions that
neither powerpc's EEH nor PCI AER handling need. Tests show that PCI
error recovery on s390 is running to completion even without blocking
access to PCI config space.
Link: https://lore.kernel.org/all/20251007144826.2825134-1-gbayer@linux.ibm.com/
Cc: stable@vger.kernel.org
Fixes: 4cdf2f4e24 ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cfd6f1a7b4 upstream.
A race condition occurs when ffs_func_eps_enable() runs concurrently
with ffs_data_reset(). The ffs_data_clear() called in ffs_data_reset()
sets ffs->epfiles to NULL before resetting ffs->eps_count to 0, leading
to a NULL pointer dereference when accessing epfile->ep in
ffs_func_eps_enable() after successful usb_ep_enable().
The ffs->epfiles pointer is set to NULL in both ffs_data_clear() and
ffs_data_close() functions, and its modification is protected by the
spinlock ffs->eps_lock. The whole ffs_func_eps_enable() function
is also protected by ffs->eps_lock.
Thus, add NULL pointer handling for ffs->epfiles in the
ffs_func_eps_enable() function to fix the issue.
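A hedged sketch of the added check (the error code and surrounding names
are assumed), taken inside the eps_lock-protected section:
	spin_lock_irqsave(&func->ffs->eps_lock, flags);

	/* ffs_data_clear() may already have dropped the epfiles */
	if (!func->ffs->epfiles) {
		spin_unlock_irqrestore(&func->ffs->eps_lock, flags);
		return -ENODEV;
	}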
Signed-off-by: Owen Gu <guhuinan@xiaomi.com>
Link: https://lore.kernel.org/r/20250915092907.17802-1-guhuinan@xiaomi.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 607b9fb2ce upstream.
There's an issue with RDSEED's 16-bit and 32-bit register output
variants on Zen5 which return a random value of 0 "at a rate inconsistent
with randomness while incorrectly signaling success (CF=1)". Search the
web for AMD-SB-7055 for more detail.
Add a fix glue which checks microcode revisions.
[ bp: Add microcode revisions checking, rewrite. ]
Cc: stable@vger.kernel.org
Signed-off-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251018024010.4112396-1-gourry@gourry.net
[ bp: 6.12 backport: use the alternative microcode version checking. ]
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 54e96258a6 ]
scx_bpf_dsq_move_set_slice() and scx_bpf_dsq_move_set_vtime() take a DSQ
iterator argument which has to be valid. Mark them with KF_RCU.
Fixes: 4c30f5ce4f ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Cc: stable@vger.kernel.org # v6.12+
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ scx_bpf_dsq_move_set_* => scx_bpf_dispatch_from_dsq_set_* ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d91a1d129b ]
Device-managed resources are cleaned up when the driver unbinds from
the underlying device. In our case this is the platform device as this
driver is a platform driver. Registering device-managed resources on
the associated ACPI device will thus result in a resource leak when
this driver unbinds.
Ensure that any device-managed resources are only registered on the
platform device to ensure that they are cleaned up during removal.
Fixes: 35c50d853a ("ACPI: fan: Add hwmon support")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+
Link: https://patch.msgid.link/20251007234149.2769-4-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba10f8d92a upstream.
[Why]
Newer VPE microcode has functionality that will decrease DPM level
only when a workload has run for 2 or more seconds. If VPE is turned
off before this DPM decrease and the PMFW doesn't reset it when
power gating VPE, the SOC can get stuck with a higher DPM level.
This can happen from amdgpu's ring buffer test because it's a short
quick workload for VPE and VPE is turned off after 1s.
[How]
In the idle handler, besides checking that fences are drained, check the
PMFW version to determine whether it will reset DPM when power gating
VPE. If the PMFW will not do this, then check the VPE DPM level. If it is
not DPM0, reschedule the delayed work again until it is.
v2: squash in return fix (Alex)
Cc: Peyton.Lee@amd.com
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
Tested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3ac635367eb589bee8edcc722f812a89970e14b7)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0023c8a74 upstream.
nouveau_sched_fini() uses a memory barrier before wait_event().
wait_event(), however, is a macro which expands to a loop which might
check the passed condition several times. The barrier would only take
effect for the first check.
Replace the barrier with a function which takes the spinlock.
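A minimal sketch of the pattern (hypothetical structure, not the actual
nouveau types): the condition is evaluated under the same lock the
writers hold, so every re-check by wait_event() sees fresh state instead
of relying on a one-shot barrier.
	struct example_sched {
		spinlock_t lock;
		unsigned int pending;	/* jobs in flight, updated under lock */
		wait_queue_head_t wq;
	};

	static bool example_no_jobs_pending(struct example_sched *sched)
	{
		bool idle;

		spin_lock(&sched->lock);
		idle = sched->pending == 0;
		spin_unlock(&sched->lock);

		return idle;
	}

	/* fini path: wait_event(sched->wq, example_no_jobs_pending(sched)); */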
Cc: stable@vger.kernel.org # v6.8+
Fixes: 5f03a507b2 ("drm/nouveau: implement 1:1 scheduler - entity relationship")
Acked-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251024161221.196155-2-phasta@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07ad45e06b upstream.
The function has a memory leak when kvrealloc() fails.
The function directly assigns NULL to the markers pointer, losing the
reference to the previously allocated memory. This causes kvfree() in
pt_dump_init() to free NULL instead of the leaked memory.
Fix by:
1. Using kvrealloc() uniformly for all allocations
2. Using a temporary variable to preserve the original pointer until
allocation succeeds
3. Removing the error path that sets markers_cnt=0 to keep
consistency between markers and markers_cnt
Found via static analysis and this is similar to commit 42378a9ca5
("bpf, verifier: Fix memory leak in array reallocation for stack state")
Fixes: d0e7915d2a ("s390/mm/ptdump: Generate address marker array dynamically")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 434f7349a1 upstream.
Commit 4e65bda827 ("ASoC: wcd934x: fix error handling in
wcd934x_codec_parse_data()") revealed the problem in the slimbus regmap.
That commit breaks audio playback, for instance, on sdm845 Thundercomm
Dragonboard 845c board:
Unable to handle kernel paging request at virtual address ffff8000847cbad4
...
CPU: 5 UID: 0 PID: 776 Comm: aplay Not tainted 6.18.0-rc1-00028-g7ea30958b305 #11 PREEMPT
Hardware name: Thundercomm Dragonboard 845c (DT)
...
Call trace:
slim_xfer_msg+0x24/0x1ac [slimbus] (P)
slim_read+0x48/0x74 [slimbus]
regmap_slimbus_read+0x18/0x24 [regmap_slimbus]
_regmap_raw_read+0xe8/0x174
_regmap_bus_read+0x44/0x80
_regmap_read+0x60/0xd8
_regmap_update_bits+0xf4/0x140
_regmap_select_page+0xa8/0x124
_regmap_raw_write_impl+0x3b8/0x65c
_regmap_bus_raw_write+0x60/0x80
_regmap_write+0x58/0xc0
regmap_write+0x4c/0x80
wcd934x_hw_params+0x494/0x8b8 [snd_soc_wcd934x]
snd_soc_dai_hw_params+0x3c/0x7c [snd_soc_core]
__soc_pcm_hw_params+0x22c/0x634 [snd_soc_core]
dpcm_be_dai_hw_params+0x1d4/0x38c [snd_soc_core]
dpcm_fe_dai_hw_params+0x9c/0x17c [snd_soc_core]
snd_pcm_hw_params+0x124/0x464 [snd_pcm]
snd_pcm_common_ioctl+0x110c/0x1820 [snd_pcm]
snd_pcm_ioctl+0x34/0x4c [snd_pcm]
__arm64_sys_ioctl+0xac/0x104
invoke_syscall+0x48/0x104
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xec
el0t_64_sync_handler+0xa0/0xf0
el0t_64_sync+0x198/0x19c
__devm_regmap_init_slimbus() started to be used instead of
__regmap_init_slimbus() after the commit mentioned above, and it turns
out the incorrect bus_context pointer (3rd argument) was used in
__devm_regmap_init_slimbus(). It should be just "slimbus" (which is equal
to &slimbus->dev). Correct it. The wcd934x codec seems to be the only or
the first user of devm_regmap_init_slimbus(), but we should fix it back
to the point where __devm_regmap_init_slimbus() was introduced, therefore
the two "Fixes" tags.
While at it, also correct the same argument in __regmap_init_slimbus().
Fixes: 4e65bda827 ("ASoC: wcd934x: fix error handling in wcd934x_codec_parse_data()")
Fixes: 7d6f7fb053 ("regmap: add SLIMbus support")
Cc: stable@vger.kernel.org
Cc: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Cc: Ma Ke <make24@iscas.ac.cn>
Cc: Steev Klimaszewski <steev@kali.org>
Cc: Srinivas Kandagatla <srini@kernel.org>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20251022201013.1740211-1-alexey.klimov@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ba6502ce1 upstream.
When running "perf mem record" command on CWF, the below KASAN
global-out-of-bounds warning is seen.
==================================================================
BUG: KASAN: global-out-of-bounds in cmt_latency_data+0x176/0x1b0
Read of size 4 at addr ffffffffb721d000 by task dtlb/9850
Call Trace:
kasan_report+0xb8/0xf0
cmt_latency_data+0x176/0x1b0
setup_arch_pebs_sample_data+0xf49/0x2560
intel_pmu_drain_arch_pebs+0x577/0xb00
handle_pmi_common+0x6c4/0xc80
The issue is caused by below code in __grt_latency_data(). The code
tries to access x86_hybrid_pmu structure which doesn't exist on
non-hybrid platform like CWF.
WARN_ON_ONCE(hybrid_pmu(event->pmu)->pmu_type == hybrid_big)
So add is_hybrid() check before calling this WARN_ON_ONCE to fix the
global-out-of-bounds access issue.
Fixes: 090262439f ("perf/x86/intel: Rename model-specific pebs_latency_data functions")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251028064214.1451968-1-dapeng1.mi@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19de03b312 upstream.
A REQ_OP_OPEN_ZONE request changes the condition of a sequential zone of
a zoned block device to the explicitly open condition
(BLK_ZONE_COND_EXP_OPEN). As such, it should be considered a write
operation.
Change this operation code to be an odd number to reflect this. The
following operation numbers are changed to keep the numbering compact.
No problems were reported without this change as this operation has no
data. However, this unifies the zone operation to reflect that they
modify the device state and also allows strengthening checks in the
block layer, e.g. checking if this operation is not issued against a
read-only device.
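This matters because the block layer derives the data direction from the
low bit of the opcode; op_is_write() is essentially:
	static inline bool op_is_write(blk_opf_t op)
	{
		/* odd opcodes are writes, even opcodes are reads */
		return !!(op & 1);
	}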
Fixes: 6c1b1da58f ("block: add zone open, close and finish operations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12a1c9353c upstream.
REQ_OP_ZONE_RESET_ALL is a zone management request. Fix
op_is_zone_mgmt() to return true for that operation, like it already
does for REQ_OP_ZONE_RESET.
While no problems were reported without this fix, this change allows
strengthening checks in various block device drivers (scsi sd,
virtioblk, DM) where op_is_zone_mgmt() is used to verify that a zone
management command is not being issued to a regular block device.
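With the fix, op_is_zone_mgmt() covers all five zone management opcodes,
roughly:
	static inline bool op_is_zone_mgmt(enum req_op op)
	{
		switch (op & REQ_OP_MASK) {
		case REQ_OP_ZONE_RESET:
		case REQ_OP_ZONE_RESET_ALL:	/* now recognized as zone mgmt */
		case REQ_OP_ZONE_OPEN:
		case REQ_OP_ZONE_CLOSE:
		case REQ_OP_ZONE_FINISH:
			return true;
		default:
			return false;
		}
	}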
Fixes: 6c1b1da58f ("block: add zone open, close and finish operations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58764259eb upstream.
Usage of the ACPI device should be phased out in the future, as
the driver itself is now using the platform bus.
Replace any usage of struct acpi_device in acpi_fan_get_fst() to
allow users to drop usage of struct acpi_device.
Also extend the integer check to all three package elements.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251007234149.2769-2-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 501672e3c1 ]
Previously this was initialized with zero, which represents PCIe Gen 1.0,
instead of using the maximum value from the speed table, which is the
behaviour of all other smumgr implementations.
Fixes: 18aafc59b1 ("drm/amd/powerplay: implement fw related smu interface for iceland.")
Signed-off-by: John Smith <itistotalbotnet@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92b0a6ae6672857ddeabf892223943d2f0e06c97)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 07a13f913c ]
Previously this was initialized with zero, which represents PCIe Gen 1.0,
instead of using the maximum value from the speed table, which is the
behaviour of all other smumgr implementations.
Fixes: 18edef19ea ("drm/amd/powerplay: implement fw image related smu interface for Fiji.")
Signed-off-by: John Smith <itistotalbotnet@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c52238c9fb414555c68340cd80e487d982c1921c)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 238d468d3e ]
'table_index' is a variable defined by the smu driver (kmd);
'table_id' is a variable defined by the hw smu (pmfw).
This code should use table_index as the bounds check.
Fixes: caad2613dc ("drm/amd/powerplay: move table setting common code to smu_cmn.c")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fca0c66b22303de0d1d6313059baf4dc960a4753)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 745bae76ac ]
Since the allocation of the driver's main structure was changed to
devm_drm_dev_alloc(), calling drm_put_dev() to trigger it to be freed
should be done by devres.
However, drm_put_dev() is still called in the probe error and device
remove paths. When the driver fails to probe, warnings like the following
are shown because devres is trying to drm_put_dev() after the driver
has already done it.
[ 5.642230] radeon 0000:01:05.0: probe with driver radeon failed with error -22
[ 5.649605] ------------[ cut here ]------------
[ 5.649607] refcount_t: underflow; use-after-free.
[ 5.649620] WARNING: CPU: 0 PID: 357 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
Fixes: a9ed2f052c ("drm/radeon: change drm_dev_alloc to devm_drm_dev_alloc")
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3eb8c0b4c091da0a623ade0d3ee7aa4a93df1ea4)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3328443363 ]
Since the allocation of the driver's main structure was changed to
devm_drm_dev_alloc(), rdev is managed by devres and we shouldn't be
calling kfree() on it.
This fixes things exploding if the driver probe fails and devres cleans
up the rdev after we have already freed it.
Fixes: a9ed2f052c ("drm/radeon: change drm_dev_alloc to devm_drm_dev_alloc")
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 16c0681617b8a045773d4d87b6140002fa75b03b)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2dd1d0d32 ]
When configured for default synchronisation (Rx syncs to Tx) and the
SAI operates in consumer mode (clocks provided externally to Tx), a
synchronisation error occurs on Tx on the first attempt after device
initialisation when the playback stream is started while a capture
stream is already active. This results in channel shift/swap on the
playback stream.
Subsequent streams (ie after that first failing one) always work
correctly, no matter the order, with or without the other stream active.
This issue was observed (and fix tested) on an i.MX6UL board connected
to an ADAU1761 codec, where the codec provides both frame and bit clock
(connected to TX pins).
To fix this, always initialize the 'other' xCR4 and xCR5 registers when
starting a stream which is synced to the opposite one, regardless of the
producer/consumer status.
Fixes: 51659ca069 ("ASoC: fsl-sai: set xCR4/xCR5/xMR for SAI master mode")
Signed-off-by: Maarten Zanders <maarten@zanders.be>
Reviewed-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://patch.msgid.link/20251024135716.584265-1-maarten@zanders.be
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 520ad9e969 ]
The dpll.yaml spec incorrectly omitted module-name and clock-id from the
pin-get operation reply specification, even though the kernel DPLL
implementation has always included these attributes in pin-get responses
since the initial implementation.
This spec inconsistency caused issues with the C YNL code generator.
The generated dpll_pin_get_rsp structure was missing these fields.
Fix the spec by adding module-name and clock-id to the pin-attrs reply
specification to match the actual kernel behavior.
Fixes: 3badff3a25 ("dpll: spec: Add Netlink spec in YAML")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20251024185512.363376-1-poros@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65f9c4c588 ]
The ynl_attr_put_str() function was not including the null terminator
in the attribute length calculation. This caused the kernel to reject
CTRL_CMD_GETFAMILY requests with EINVAL:
"Attribute failed policy validation".
For a 4-character family name like "dpll":
- Sent: nla_len=8 (4 byte header + 4 byte string without null)
- Expected: nla_len=9 (4 byte header + 5 byte string with null)
The bug was introduced in commit 15d2540e0d ("tools: ynl: check for
overflow of constructed messages") when refactoring from stpcpy() to
strlen(). The original code correctly included the null terminator:
end = stpcpy(ynl_attr_data(attr), str);
attr->nla_len = NLA_HDRLEN + NLA_ALIGN(end -
(char *)ynl_attr_data(attr));
Since stpcpy() returns a pointer past the null terminator, the length
included it. The refactored version using strlen() omitted the +1.
The fix also removes NLA_ALIGN() from the nla_len calculation, since
nla_len should contain the actual attribute length, not the aligned
length. Alignment is only used for calculating the next attribute
position. This makes the code consistent with ynl_attr_put().
CTRL_ATTR_FAMILY_NAME uses NLA_NUL_STRING policy which requires
null terminator. Kernel validates with memchr() and rejects if not
found.
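A simplified sketch of the corrected length handling (the overflow checks
of the real helper are omitted):
	size_t len = strlen(str) + 1;		/* include the NUL terminator */

	attr->nla_type = attr_type;
	attr->nla_len = NLA_HDRLEN + len;	/* actual, unaligned length */
	memcpy(ynl_attr_data(attr), str, len);

	/* only the position of the next attribute gets aligned */
	nlh->nlmsg_len += NLA_HDRLEN + NLA_ALIGN(len);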
Fixes: 15d2540e0d ("tools: ynl: check for overflow of constructed messages")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Link: https://lore.kernel.org/20251018151737.365485-3-zahari.doychev@linux.com
Link: https://patch.msgid.link/20251024132438.351290-1-poros@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a042beac6e ]
The current logic uses the flush sequence from the current address
space. This is harmless when deducing the flush requirements for the
current submit, as either the incoming address space is the same one
as the currently active one or we switch context, in which case the
flush is unconditional.
However, this sequence is also stored as the current flush sequence
of the GPU. If we switch context the stored flush sequence will no
longer belong to the currently active address space. This incoherency
can then cause missed flushes, resulting in translation errors.
Fixes: 27b67278e0 ("drm/etnaviv: rework MMU handling")
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Link: https://lore.kernel.org/r/20251021093723.3887980-1-l.stach@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 751463ceef ]
The periodic advertising enabled state cannot be tracked by the enabled
flag, since advertising and periodic advertising can each be
enabled/disabled separately from one another, causing the states to
become inconsistent: for example, when an advertising set is disabled its
enabled flag is set to false, which is then also used for periodic
advertising even though it has not been disabled.
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c403da5e98 ]
The socket dst_type cannot be directly assigned to hci_conn->type since
their domains are different, which may lead to the wrong address type
being used.
Fixes: 6a5ad251b7 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d92808024 ]
This fixes the state tracking of advertisement set/instance 0x00, which
is considered a legacy instance and is not tracked individually by the
adv_instances list. Previously it was assumed that hci_dev itself would
track it via HCI_LE_ADV, but that is a global state not specific to
instance 0x00, so to fix it a new flag is introduced that only tracks the
state of instance 0x00.
Fixes: 1488af7b8b ("Bluetooth: hci_sync: Fix hci_resume_advertising_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77343b8b4f ]
This patch adds logic to handle power management control when the
Bluetooth function is closed during the SDIO reset sequence.
Specifically, if BT is closed before reset, the driver enables the
SDIO function and sets driver pmctrl. After reset, if BT remains
closed, the driver sets firmware pmctrl and disables the SDIO function.
These changes ensure proper power management and device state consistency
across the reset flow.
Fixes: 8fafe70225 ("Bluetooth: mt7921s: support bluetooth reset mechanism")
Signed-off-by: Chris Lu <chris.lu@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0c200a4a5 ]
The socket dst_type cannot be directly assigned to hci_conn->type since
their domains are different, which may lead to the wrong address type
being used.
Fixes: 6a5ad251b7 ("Bluetooth: ISO: Fix possible circular locking dependency")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 83d328a72e ]
Currently, hci_conn_hash_lookup_big only checks for BIS master connections,
by filtering out connections with the destination address set. This commit
updates this function to also consider BIS slave connections, since it is
also used for a Broadcast Receiver to set an available BIG handle before
issuing the LE BIG Create Sync command.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: f0c200a4a5 ("Bluetooth: ISO: Fix BIS connection dst_type handling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09b0cd1297 ]
hci_cmd_sync_dequeue_once() does the lookup and then cancels
the entry under two separate lock sections. Meanwhile,
hci_cmd_sync_work() can also delete the same entry,
leading to a double list_del() and a UAF.
Fix this by holding cmd_sync_work_lock across both
the lookup and the cancel, so that the entry cannot be
removed concurrently.
Fixes: 505ea2b295 ("Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue")
Reported-by: Cen Zhang <zzzccc427@163.com>
Signed-off-by: Cen Zhang <zzzccc427@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 420c84c330 ]
The root causes of this issue are:
1. When probing the usbnet device, executing usbnet_link_change(dev, 0, 0)
puts the kevent work on the global workqueue. However, the kevent has not
yet been scheduled when the usbnet device is unregistered. Therefore,
executing free_netdev() results in the "free active object (kevent)" error
reported here.
2. Another factor is that when calling usbnet_disconnect()->unregister_netdev(),
ndo_stop() would be executed to cancel the kevent only if the usbnet device
were up. However, because the device is not up, ndo_stop() is not executed.
The solution to this problem is to cancel the kevent before executing
free_netdev().
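The fix boils down to something like the following in the paths that free
the netdev (struct usbnet keeps the deferred work in dev->kevent; the
'net' variable name is as used in the probe path):
	/* make sure the deferred kevent cannot run against a freed netdev */
	cancel_work_sync(&dev->kevent);
	free_netdev(net);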
Fixes: a69e617e53 ("usbnet: Fix linkwatch use-after-free on disconnect")
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=8bfd7bcc98f7300afb84
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Link: https://patch.msgid.link/20251022024007.1831898-1-lizhi.xu@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfca1637bc ]
The pcm->prepare() function may be called multiple times in a row by
userspace, as mentioned in the documentation. The driver shall take that
into account and prevent redundancy. However, the exact same function is
called during XRUNs, and in such a case the particular stream shall be
reset and set up anew.
Fixes: 9114700b49 ("ASoC: Intel: avs: Generic PCM FE operations")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20251023092348.3119313-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c9bf72cc1 ]
The clock obtained via devm_clk_get_enabled() is automatically managed
by devres and will be disabled and freed on driver detach. Manually
calling clk_disable_unprepare() in the error path and the remove
function causes a double free.
Remove the manual clock cleanup in both aspeed_acry_probe()'s error
path and aspeed_acry_remove().
Fixes: 2f1cf4e50c ("crypto: aspeed - Add ACRY RSA driver")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 881a9c9cb7 ]
The failure of this check only results in a security mitigation being
applied, slightly affecting performance of the compiled BPF program. It
doesn't result in a failed syscall, and thus auditing a failed LSM
permission check for it is unwanted. For example with SELinux, it causes
a denial to be reported for confined processes running as root, which
tends to be flagged as a problem to be fixed in the policy. Yet
dontauditing or allowing CAP_SYS_ADMIN to the domain may not be
desirable, as it would allow/silence also other checks - either going
against the principle of least privilege or making debugging potentially
harder.
Fix it by changing it from capable() to ns_capable_noaudit(), which
instructs the LSMs to not audit the resulting denials.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326
Fixes: d4e89d212d ("x86/bpf: Call branch history clearing sequence on exit")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20251021122758.2659513-1-omosnace@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a76ab5731e ]
The private stack will be allocated with the percpu allocator at jit time.
To avoid complexity at runtime, only one copy of the private stack is
available per cpu per prog, so a runtime recursion check is necessary
to avoid stack corruption.
Current private stack only supports kprobe/perf_event/tp/raw_tp
which has recursion check in the kernel, and prog types that use
bpf trampoline recursion check. For trampoline related prog types,
currently only tracing progs have recursion checking.
To avoid complexity, all async_cb subprogs use normal kernel stack
including those subprogs used by both main prog subtree and async_cb
subtree. Any prog having tail call also uses kernel stack.
To avoid a jit penalty with private stack support, a subprog stack
size threshold is set such that the private stack is only used if the
stack size is no less than the threshold. The current threshold is
64 bytes. This avoids the jit penalty if the stack usage is small.
A useless 'continue' is also removed from a loop in func
check_max_stack_depth().
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163907.2223839-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: 881a9c9cb7 ("bpf: Do not audit capability check in do_jit()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2551a1eedc ]
The previous implementation incorrectly assumed the original type of
'priv' was void**, leading to an unnecessary and misleading cast.
Correct the cast of the 'priv' pointer in test_dev_action() to its
actual type, long*.
As an additional benefit, this fixes an out-of-bounds CHERI fault on
hardware with architectural capabilities. The original implementation
tried to store a capability-sized pointer using the priv
pointer. However, the priv pointer's capability only granted access to
the memory region of its original long type, leading to a bounds
violation since the size of a long is smaller than the size of a
capability. This change ensures that the pointer usage respects the
capabilities' bounds.
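A sketch of the corrected action callback (the stored value is
illustrative):
	static void test_dev_action(void *priv)
	{
		/* 'priv' points at a long, so store through long *, not void ** */
		*(long *)priv = 1;
	}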
Link: https://lore.kernel.org/r/20251017092814.80022-1-florian.schmaus@codasip.com
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Florian Schmaus <florian.schmaus@codasip.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9c78e747dd ]
Bitwise operations with WMI_KEY_PAIRWISE (defined as 0) are ineffective
and misleading. This results in the pairwise key validations added in
commit 97acb0259c ("wifi: ath11k: fix group data packet drops
during rekey") always evaluating to false, and clear key commands for
pairwise keys are not honored.
Since firmware supports overwriting the new key without explicitly
clearing the previous one, there is no visible impact currently.
However, to restore consistency with the previous behavior and improve
clarity, replace bitwise operations with direct assignments and
comparisons for key flags.
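An illustrative sketch of the change (surrounding code simplified; the
placeholder helper is hypothetical): since WMI_KEY_PAIRWISE is 0, tests
like "flags & WMI_KEY_PAIRWISE" can never be true, so the flag is now
assigned and compared directly.
	if (key->flags & IEEE80211_KEY_FLAG_PAIRWISE)
		arg.key_flags = WMI_KEY_PAIRWISE;	/* direct assignment, not |= */
	else
		arg.key_flags = WMI_KEY_GROUP;

	/* later validations compare instead of masking against 0 */
	if (arg.key_flags == WMI_KEY_PAIRWISE)
		example_pairwise_rekey_checks();	/* hypothetical placeholder */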
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-wireless/aLlaetkalDvWcB7b@stanley.mountain
Fixes: 97acb0259c ("wifi: ath11k: fix group data packet drops during rekey")
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20251003092158.1080637-1-rameshkumar.sundaram@oss.qualcomm.com
[update copyright per current guidance]
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b78c02f7c7 ]
The current code does not have the MU EDCA feature, so it cannot support
the use of EDCA by STA in specific UL MU HE TB PPDU transmissions. Refer
to IEEE Std 802.11ax-2021 "9.4.2.251 MU EDCA Parameter Set element",
"26.2.7 EDCA operation using MU EDCA parameters".
Add ath11k_mac_op_conf_tx_mu_edca() to convert the MU EDCA parameters
received from mac80211 into WMI WMM parameters, and send them to the
firmware according to the different WMM type flags.
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04523-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1
Signed-off-by: Yu Zhang (Yuriy) <quic_yuzha@quicinc.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250124061343.2263467-1-quic_yuzha@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Stable-dep-of: 9c78e747dd ("wifi: ath11k: avoid bit operation on key flags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92282074e1 ]
ath12k just like ath11k [1] did not handle skb cleanup during idr
cleanup callback. Both ath12k_mac_vif_txmgmt_idr_remove() and
ath12k_mac_tx_mgmt_pending_free() performed idr cleanup and DMA
unmapping for skb but only ath12k_mac_tx_mgmt_pending_free() freed
skb. As a result, during vdev deletion a memory leak occurs.
Refactor all cleanup steps into a new function, ath12k_mac_tx_mgmt_free(),
which centralizes the idr cleanup, the DMA unmapping of the skb, and the
freeing of the skb. Use the skb pointer returned by idr_remove() instead
of the one passed as a function argument, because the IDR will be
protected by locking; this prevents concurrent modification of the same
IDR.
Now ath12k_mac_tx_mgmt_pending_free() and
ath12k_mac_vif_txmgmt_idr_remove() call ath12k_mac_tx_mgmt_free().
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
Link: https://lore.kernel.org/r/1637832614-13831-1-git-send-email-quic_srirrama@quicinc.com > # [1]
Fixes: d889913205 ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Karthik M <quic_karm@quicinc.com>
Signed-off-by: Muna Sinada <muna.sinada@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20250923220316.1595758-1-muna.sinada@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 388eff894d upstream.
Sean reported [1] the following splat when running KVM tests:
WARNING: CPU: 232 PID: 15391 at xfd_validate_state+0x65/0x70
Call Trace:
<TASK>
fpu__clear_user_states+0x9c/0x100
arch_do_signal_or_restart+0x142/0x210
exit_to_user_mode_loop+0x55/0x100
do_syscall_64+0x205/0x2c0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Chao further identified [2] a reproducible scenario involving signal
delivery: a non-AMX task is preempted by an AMX-enabled task which
modifies the XFD MSR.
When the non-AMX task resumes and reloads XSTATE with init values,
a warning is triggered due to a mismatch between fpstate::xfd and the
CPU's current XFD state. fpu__clear_user_states() does not currently
re-synchronize the XFD state after such preemption.
Invoke xfd_update_state() which detects and corrects the mismatch if
there is a dynamic feature.
This also benefits the sigreturn path, as fpu__restore_sig() may call
fpu__clear_user_states() when the sigframe is inaccessible.
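A hedged sketch of the resynchronization step (helper names as referenced
above; the exact guard and placement in fpu__clear_user_states() may differ):

  /* Re-sync the CPU's XFD MSR with this task's fpstate if dynamic
   * features are possible, so the later XRSTOR does not trip the warning. */
  if (fpu_state_size_dynamic())
          xfd_update_state(fpu->fpstate);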
[ dhansen: minor changelog munging ]
Closes: https://lore.kernel.org/lkml/aDCo_SczQOUaB2rS@google.com [1]
Fixes: 672365477a ("x86/fpu: Update XFD state where required")
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/all/aDWbctO%2FRfTGiCg3@intel.com [2]
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250610001700.4097-1-chang.seok.bae%40intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c76f9961c upstream.
When smb2_query_info_compound() retries, a previously allocated cfid may
have been freed in the first attempt.
Because cfid wasn't reset on replay, later cleanup could act on a stale
pointer, leading to a potential use-after-free.
Reinitialize cfid to NULL under the replay label.
Example trace (trimmed):
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 11224 at ../lib/refcount.c:28 refcount_warn_saturate+0x9c/0x110
[...]
RIP: 0010:refcount_warn_saturate+0x9c/0x110
[...]
Call Trace:
<TASK>
smb2_query_info_compound+0x29c/0x5c0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? step_into+0x10d/0x690
? __legitimize_path+0x28/0x60
smb2_queryfs+0x6a/0xf0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
smb311_queryfs+0x12d/0x140 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? kmem_cache_alloc+0x18a/0x340
? getname_flags+0x46/0x1e0
cifs_statfs+0x9f/0x2b0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
statfs_by_dentry+0x67/0x90
vfs_statfs+0x16/0xd0
user_statfs+0x54/0xa0
__do_sys_statfs+0x20/0x50
do_syscall_64+0x58/0x80
Cc: stable@kernel.org
Fixes: 4f1fffa237 ("cifs: commands that are retried should have replay flag set")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b45873c3f0 upstream.
Commit c1e18c17bd ("s390/pci: add zpci_set_irq()/zpci_clear_irq()"),
introduced the zpci_set_irq() and zpci_clear_irq(), to be used while
resetting a zPCI device.
Commit da995d538d ("s390/pci: implement reset_slot for hotplug
slot"), mentions zpci_clear_irq() being called in the path for
zpci_hot_reset_device(). But that is not the case anymore and these
functions are not called outside of this file. Instead
zpci_hot_reset_device() relies on zpci_disable_device() also clearing
the IRQs, but misses resetting the zdev->irqs_registered flag.
However, after a CLP disable/enable reset, the device's IRQs are
unregistered while the zdev->irqs_registered flag does not get cleared.
This creates an inconsistent state, and so arch_restore_msi_irqs() doesn't
correctly restore the device's IRQs. This becomes a problem when a PCI
driver tries to restore the state of the device through
pci_restore_state(). Restore the IRQs unconditionally for the device and
remove the irqs_registered flag as it is redundant.
Fixes: c1e18c17bd ("s390/pci: add zpci_set_irq()/zpci_clear_irq()")
Cc: stable@vger.kernel.org
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27b0e701d3 upstream.
Accessing the transmit queue without owning the msk socket lock is
inherently racy, hence __mptcp_check_push() could actually quit early
even when there is pending data.
That in turn could cause unexpected tx lock and timeout.
Dropping the early check avoids the race, implicitly relying on the later
tests under the relevant lock. With this change, all the other
mptcp_send_head() call sites are now under the msk socket lock and we
can additionally drop the now unneeded annotation on the transmit head
pointer accesses.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-1-38ffff5a9ec8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb53368f8d upstream.
The of_find_node_by_name() function returns a device tree node with its
reference count incremented. The caller is responsible for calling
of_node_put() to release this reference when done.
Found via static analysis.
Fixes: cc5d0189b9 ("[PATCH] powerpc: Remove device_node addrs/n_addr")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f566c0ac5 upstream.
Commit e24cca19ba ("sh: Kill off MAX_DMA_ADDRESS leftovers.") removed
the define ONCHIP_NR_DMA_CHANNELS. The leftover reference therefore needs
to be replaced by CONFIG_NR_ONCHIP_DMA_CHANNELS to compile successfully
with CONFIG_PVR2_DMA enabled.
Signed-off-by: Florian Fuchs <fuchsfl@gmail.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3776c685eb upstream.
Currently, whenever there is a need to transmit an Action frame,
the brcmfmac driver always uses the P2P vif to send the "actframe" IOVAR to
firmware. The P2P interfaces are available when wpa_supplicant is managing
the wlan interface.
However, the P2P interfaces are not created/initialized when only hostapd
is managing the wlan interface. And if hostapd receives an ANQP Query REQ
Action frame even from an un-associated STA, the brcmfmac driver tries
to use an uninitialized P2P vif pointer for sending the IOVAR to firmware.
This NULL pointer dereferencing triggers a driver crash.
[ 1417.074538] Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
[...]
[ 1417.075188] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[...]
[ 1417.075653] Call trace:
[ 1417.075662] brcmf_p2p_send_action_frame+0x23c/0xc58 [brcmfmac]
[ 1417.075738] brcmf_cfg80211_mgmt_tx+0x304/0x5c0 [brcmfmac]
[ 1417.075810] cfg80211_mlme_mgmt_tx+0x1b0/0x428 [cfg80211]
[ 1417.076067] nl80211_tx_mgmt+0x238/0x388 [cfg80211]
[ 1417.076281] genl_family_rcv_msg_doit+0xe0/0x158
[ 1417.076302] genl_rcv_msg+0x220/0x2a0
[ 1417.076317] netlink_rcv_skb+0x68/0x140
[ 1417.076330] genl_rcv+0x40/0x60
[ 1417.076343] netlink_unicast+0x330/0x3b8
[ 1417.076357] netlink_sendmsg+0x19c/0x3f8
[ 1417.076370] __sock_sendmsg+0x64/0xc0
[ 1417.076391] ____sys_sendmsg+0x268/0x2a0
[ 1417.076408] ___sys_sendmsg+0xb8/0x118
[ 1417.076427] __sys_sendmsg+0x90/0xf8
[ 1417.076445] __arm64_sys_sendmsg+0x2c/0x40
[ 1417.076465] invoke_syscall+0x50/0x120
[ 1417.076486] el0_svc_common.constprop.0+0x48/0xf0
[ 1417.076506] do_el0_svc+0x24/0x38
[ 1417.076525] el0_svc+0x30/0x100
[ 1417.076548] el0t_64_sync_handler+0x100/0x130
[ 1417.076569] el0t_64_sync+0x190/0x198
[ 1417.076589] Code: f9401e80 aa1603e2 f9403be1 5280e483 (f9400000)
Fix this by always using the vif corresponding to the wdev on which the
Action frame transmission request was initiated by userspace. This way,
even if P2P vif is not available, the IOVAR is sent to firmware on AP vif
and the ANQP Query RESP Action frame is transmitted without crashing the
driver.
Move init_completion() for "send_af_done" from brcmf_p2p_create_p2pdev()
to brcmf_p2p_attach(), because the former function would not get executed
when only hostapd is managing the wlan interface, and it is not safe to do
reinit_completion() later in brcmf_p2p_tx_action_frame() without any prior
init_completion().
And in the brcmf_p2p_tx_action_frame() function, the condition check for
the P2P Presence Response frame is not needed, since wpa_supplicant
properly sends the P2P Presence Response frame on the P2P-GO vif instead
of the P2P-Device vif.
Cc: stable@vger.kernel.org
Fixes: 18e2f61db3 ("brcmfmac: P2P action frame tx")
Signed-off-by: Gokul Sivakumar <gokulkumar.sivakumar@infineon.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20251013102819.9727-1-gokulkumar.sivakumar@infineon.com
[Cc stable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91d35ec9b3 upstream.
The RFCOMM driver confuses the local and remote modem control signals,
which specifically means that the reported DTR and RTS state will
instead reflect the remote end (i.e. DSR and CTS).
This issue dates back to the original driver (and a follow-on update)
merged in 2002, which resulted in a non-standard implementation of
TIOCMSET that allowed controlling also the TS07.10 IC and DV signals by
mapping them to the RI and DCD input flags, while TIOCMGET failed to
return the actual state of DTR and RTS.
Note that the bogus control of input signals in tiocmset() is just
dead code as those flags will have been masked out by the tty layer
since 2003.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ab6658174 upstream.
In virtio-net, we do not yet support multi-buffer XDP packets in zerocopy
mode when an XDP program is attached. However, in that case, when
receiving a multi-buffer XDP packet, we skip the XDP program and return
XDP_PASS. As a result, the packet is passed to the normal network stack,
which is incorrect behavior (e.g. an XDP program for packet counting is
installed, a multi-buffer XDP packet arrives and does not go through the
XDP program; as a result, the packet count does not increase but the
packet is still received by the network stack). This commit instead
returns XDP_ABORTED in that case.
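A hedged sketch of the behaviour change (the variable name is illustrative,
not the literal upstream hunk):

  /* Multi-buffer packets are not run through the XDP program in XSK
   * zerocopy mode yet, so drop them instead of passing them up the stack. */
  if (num_buf > 1)
          return XDP_ABORTED;   /* was XDP_PASS */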
Fixes: 99c861b44e ("virtio_net: xsk: rx: support recv merge mode")
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20251022155630.49272-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f067aa594 upstream.
The switch_brightness_work delayed work accesses device->brightness
and device->backlight, freed by acpi_video_dev_unregister_backlight()
during device removal.
If the work executes after acpi_video_bus_unregister_backlight()
frees these resources, it causes a use-after-free when
acpi_video_switch_brightness() dereferences device->brightness or
device->backlight.
Fix this by calling cancel_delayed_work_sync() for each device's
switch_brightness_work in acpi_video_bus_remove_notify_handler()
after removing the notify handler that queues the work. This ensures
the work completes before the memory is freed.
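A hedged sketch of the teardown ordering (list and field names follow the
description above; this is not the literal diff):

  /* After removing the notify handler that queues the work: */
  list_for_each_entry(dev, &video->video_device_list, entry)
          cancel_delayed_work_sync(&dev->switch_brightness_work);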
Fixes: 8ab58e8e7e ("ACPI / video: Fix backlight taking 2 steps on a brightness up/down keypress")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Yuhao Jiang <danisjiang@gmail.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
[ rjw: Changelog edit ]
Link: https://patch.msgid.link/20251022200704.2655507-1-danisjiang@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7073c7fc8d upstream.
Actually check the return value from pll_ops->init_pll()
as it can return an error.
If the card's BIOS didn't run because it's not the primary VGA card
the fact that the xclk source is unsupported is printed as shown
below but the driver continues on regardless and on my machine causes
a hard lock up.
[ 61.470088] atyfb 0000:03:05.0: enabling device (0080 -> 0083)
[ 61.476191] atyfb: using auxiliary register aperture
[ 61.481239] atyfb: 3D RAGE XL (Mach64 GR, PCI-33) [0x4752 rev 0x27]
[ 61.487569] atyfb: 512K SGRAM (1:1), 14.31818 MHz XTAL, 230 MHz PLL, 83 Mhz MCLK, 63 MHz XCLK
[ 61.496112] atyfb: Unsupported xclk source: 5.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1f3058930 upstream.
Recently, we discovered the following issue through syzkaller:
BUG: KASAN: slab-use-after-free in fb_mode_is_equal+0x285/0x2f0
Read of size 4 at addr ff11000001b3c69c by task syz.xxx
...
Call Trace:
<TASK>
dump_stack_lvl+0xab/0xe0
print_address_description.constprop.0+0x2c/0x390
print_report+0xb9/0x280
kasan_report+0xb8/0xf0
fb_mode_is_equal+0x285/0x2f0
fbcon_mode_deleted+0x129/0x180
fb_set_var+0xe7f/0x11d0
do_fb_ioctl+0x6a0/0x750
fb_ioctl+0xe0/0x140
__x64_sys_ioctl+0x193/0x210
do_syscall_64+0x5f/0x9c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Based on experimentation and analysis, during framebuffer unregistration,
only the memory of fb_info->modelist is freed, without setting the
corresponding fb_display[i]->mode to NULL for the freed modes. This leads
to UAF issues during subsequent accesses. Here's an example of reproduction
steps:
1. With /dev/fb0 already registered in the system, load a kernel module
to register a new device /dev/fb1;
2. Set fb1's mode to the global fb_display[] array (via FBIOPUT_CON2FBMAP);
3. Switch console from fb to VGA (to allow normal rmmod of the ko);
4. Unload the kernel module, at this point fb1's modelist is freed, leaving
a wild pointer in fb_display[];
5. Trigger the bug via system calls through fb0 attempting to delete a mode
from fb0.
Add a check in do_unregister_framebuffer(): if the mode to be freed exists
in fb_display[], set the corresponding mode pointer to NULL.
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the need for the user to know the limitations of this PWM
implementation by adjusting configuration requests to be the closest
acceptable value. Add a get_state method so that the actual values can
be queried.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
EMMC clock speeds are based around divisions of 52MHz, not the 50MHz
used by SD. As such, relax the "full speed" check (intended to stop
any overclock whenever an operation has to be retried) so that any
requested speed of 50MHz or higher will be overclocked.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
EMMC clock speeds are based around divisions of 52MHz, not the 50MHz
used by SD. As such, relax the "full speed" check (intended to stop
any overclock whenever an operation has to be retried) so that any
requested speed of 50MHz or higher will be overclocked.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
EMMC clock speeds are based around divisions of 52MHz, not the 50MHz
used by SD. As such, relax the "full speed" check (intended to stop
any overclock whenever an operation has to be retried) so that any
requested speed of 50MHz or higher will be overclocked.
See: https://github.com/raspberrypi/linux/issues/7120
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Switch the power management in the imx708 device driver to use auto-
suspend with a 5s timeout.
This improves mode switching time by avoiding additional regulator
switch-on delays and common register I2C writes.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Add an overlay to allow composite video to be output on GPIOs 4-11
on Raspberry Pi 5, 500, 500+ or CM5 only, with an optional 108 MHz
clock on GPIO 0 and duplicate MSB on GPIO 27.
Requires composite video to be enabled and DPI to be disabled.
Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
Follow up to "drm/vc4: plane: Swap Cb/Cr pointers for YVU formats"
Swap the offsets as well as the buffer objects for YVU formats.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Switch the power management in the imx477 device driver to use auto-
suspend with a 5s timeout.
This improves mode switching time by avoiding additional regulator
switch-on delays and common register I2C writes.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Fix these two smatch warnings for the vc-sm-cma driver, rest were false
positives:
../vc-sm-cma/vc_sm.c:413
vc_sm_dma_buf_attach() warn: inconsistent returns '&buf->lock'.
Locked on : 396
Unlocked on: 413
../vc-sm-cma/vc_sm.c:1225
vc_sm_cma_ioctl_alloc() error: we previously assumed 'buffer' could be
null (see line 1113)
Signed-off-by: Jai Luthra <jai.luthra@ideasonboard.com>
The 4056x3040 mode appears to need more vertical blanking lines
than any other, leaving a black bar at the bottom of the image.
Increase IMX477_VBLANK_MIN from 4 to 48 to compensate. (It may be
possible to reduce it slightly further, but fix the regression
for now).
See: https://github.com/raspberrypi/linux/issues/7109
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
hvs6 appears to have dropped support for the component order
field in 3 plane YUV formats.
Support them by swapping the Cb and Cr planes over when
reading the image pointers.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
commit e92c294120 upstream.
The parentheses for the unlikely() annotation were put in the wrong
place so it means that the condition is basically never true and the
bounds checking is skipped.
Fixes: aab9458b9f ("btrfs: tree-checker: add inode extref checks")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 00aaae60fa ]
There are GPIO controllers such as the one present in the LX2160ARDB
QIXIS FPGA which have fixed-direction input and output GPIO lines mixed
together in a single register. This cannot be modeled using the
gpio-regmap as-is since there is no way to present the true direction of
a GPIO line.
In order to make this use case possible, add a new configuration
parameter - fixed_direction_output - into the gpio_regmap_config
structure. This will enable user drivers to provide a bitmap that
represents the fixed direction of the GPIO lines.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Stable-dep-of: 2ba5772e53 ("gpio: idio-16: Define fixed direction of the GPIO lines")
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 19408200c0 ]
Add GENMASK_TYPE() which generalizes __GENMASK() to support different
types, and implement fixed-types versions of GENMASK() based on it.
The fixed-type version allows more strict checks to the min/max values
accepted, which is useful for defining registers like implemented by
i915 and xe drivers with their REG_GENMASK*() macros.
The strict checks rely on shift-count-overflow compiler check to fail
the build if a number outside of the range allowed is passed.
Example:
#define FOO_MASK GENMASK_U32(33, 4)
will generate a warning like:
include/linux/bits.h:51:27: error: right shift count >= width of type [-Werror=shift-count-overflow]
51 | type_max(t) >> (BITS_PER_TYPE(t) - 1 - (h)))))
| ^~
The result is cast to the corresponding fixed-width type. For
example, GENMASK_U8() returns a u8. Note that because of the C
promotion rules, GENMASK_U8() and GENMASK_U16() will immediately be
promoted to int if used in an expression. Regardless, the main goal is
not to get the correct type, but rather to enforce more checks at
compile time.
While GENMASK_TYPE() is crafted to cover all variants, including the
already existing GENMASK(), GENMASK_ULL() and GENMASK_U128(), for the
moment, only use it for the newly introduced GENMASK_U*(). The
consolidation will be done in a separate change.
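A short usage illustration of the fixed-width variants (the register mask
name is made up):

  #include <linux/bits.h>

  #define CTRL_FREQ_MASK  GENMASK_U16(11, 4)   /* builds: bits fit in 16 */
  /* #define BAD_MASK     GENMASK_U8(9, 2) */  /* would fail: bit 9 > 7  */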
Co-developed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Stable-dep-of: 2ba5772e53 ("gpio: idio-16: Define fixed direction of the GPIO lines")
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 31299a5e02 ]
This is a preparation for the upcoming GENMASK_U*() and BIT_U*()
changes. After introducing those new macros, there will be a lot of
scrolling between the #if, #else and #endif.
Add a comment to the #else and #endif preprocessor macros to help keep
track of which context we are in. Also, add new lines to better
visually separate the non-asm and asm sections.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Stable-dep-of: 2ba5772e53 ("gpio: idio-16: Define fixed direction of the GPIO lines")
Signed-off-by: William Breathitt Gray <wbg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 094ee6017e ]
Following operations can trigger a warning[1]:
ip netns add ns1
ip netns exec ns1 ip link add bond0 type bond mode balance-rr
ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp
ip netns exec ns1 ip link set bond0 type bond mode broadcast
ip netns del ns1
When the namespace is deleted, dev_xdp_uninstall() is called to remove the
xdp program on the bond dev, and bond_xdp_set() will check the bond mode.
If the bond mode is changed after attaching an xdp program, the warning
may occur. Some bond modes (broadcast, etc.) do not support native xdp,
and setting the bond mode with an xdp program attached is not good. Add a
check for an attached xdp program when setting the bond mode (see the
sketch after the trace below).
[1]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930
Modules linked in:
CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930
Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ...
RSP: 0018:ffffc90000063d80 EFLAGS: 00000282
RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff
RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48
RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb
R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8
R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000
FS: 0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0
Call Trace:
<TASK>
? __warn+0x83/0x130
? unregister_netdevice_many_notify+0x8d9/0x930
? report_bug+0x18e/0x1a0
? handle_bug+0x54/0x90
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? unregister_netdevice_many_notify+0x8d9/0x930
? bond_net_exit_batch_rtnl+0x5c/0x90
cleanup_net+0x237/0x3d0
process_one_work+0x163/0x390
worker_thread+0x293/0x3b0
? __pfx_worker_thread+0x10/0x10
kthread+0xec/0x1e0
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
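A hedged sketch of that guard in the mode-set path (helper and field names
approximate the driver's, as described above; not the literal diff):

  /* Refuse to switch to a bond mode that native XDP cannot support
   * while an XDP program is attached. */
  if (bond->xdp_prog && !bond_xdp_check(bond)) {
          netdev_err(bond->dev,
                     "unable to change bond mode while XDP is attached\n");
          return -EOPNOTSUPP;
  }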
Fixes: 9e2ee5c7e7 ("net, bonding: Add XDP support to the bonding driver")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Acked-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 22ccb684c1 ]
Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241021031211.814-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 48ea8b2004 ]
syzbot reports a f2fs bug as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2746!
CPU: 0 UID: 0 PID: 5323 Comm: syz.0.0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2746 [inline]
RIP: 0010:new_curseg+0x1f52/0x1f70 fs/f2fs/segment.c:2876
Call Trace:
<TASK>
__allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3210
f2fs_allocate_new_section fs/f2fs/segment.c:3224 [inline]
f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3238
f2fs_expand_inode_data+0x696/0xca0 fs/f2fs/file.c:1830
f2fs_fallocate+0x537/0xa10 fs/f2fs/file.c:1940
vfs_fallocate+0x569/0x6e0 fs/open.c:327
do_vfs_ioctl+0x258c/0x2e40 fs/ioctl.c:885
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x80/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Concurrent pinfile allocation may run out of free sections, resulting in a
panic in get_new_segment(). Let's expand pin_sem lock coverage to include
f2fs_gc(), so that we can make sure to reclaim enough free space for the
following allocation.
In addition, make the below changes to enhance error path handling:
- call f2fs_bug_on() only in non-pinfile allocation path in
get_new_segment().
- call reset_curseg_fields() to reset all fields of curseg in
new_curseg()
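A hedged sketch of the expanded lock coverage described above (simplified;
not the literal diff):

  f2fs_down_write(&sbi->pin_sem);
  /* reclaim free sections while still holding pin_sem ... */
  f2fs_gc(sbi, &gc_control);
  f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED, false);
  f2fs_up_write(&sbi->pin_sem);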
Fixes: f5a53edcf0 ("f2fs: support aligned pinned file")
Reported-by: syzbot+15669ec8c35ddf6c3d43@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/675cd64e.050a0220.37aaf.00bb.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Rajani Kantha <681739313@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e84cb860ac ]
The special C-flag case expects the ADD_ADDR to be received when
switching to 'fully-established'. But for various reasons, the ADD_ADDR
could be sent after the "4th ACK", and the special case doesn't work.
On NIPA, the new test validating this special case for the C-flag failed
a few times, e.g.
102 default limits, server deny join id 0
syn rx [FAIL] got 0 JOIN[s] syn rx expected 2
Server ns stats
(...)
MPTcpExtAddAddrTx 1
MPTcpExtEchoAdd 1
Client ns stats
(...)
MPTcpExtAddAddr 1
MPTcpExtEchoAddTx 1
synack rx [FAIL] got 0 JOIN[s] synack rx expected 2
ack rx [FAIL] got 0 JOIN[s] ack rx expected 2
join Rx [FAIL] see above
syn tx [FAIL] got 0 JOIN[s] syn tx expected 2
join Tx [FAIL] see above
I had a suspicion about what the issue could be: the ADD_ADDR might have
been received after the switch to the 'fully-established' state. The
issue was not easy to reproduce. The packet capture showed that the
ADD_ADDR can indeed be sent with a delay, and the client would not try
to establish subflows to it as expected.
A simple fix is not to mark the endpoints as 'used' in the C-flag case,
when looking at creating subflows to the remote initial IP address and
port. In this case, there is no need to try.
Note: newly added fullmesh endpoints will still continue to be used as
expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case.
Fixes: 4b1ff850e0 ("mptcp: pm: in-kernel: usable client side with C-flag")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-1-8207030cb0e8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ applied to pm_netlink.c instead of pm_kernel.c ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00d95fcc4d upstream.
The ErrorString() and SafeString() docutils functions were helpers meant to
ease the handling of encodings during the Python 3 transition. There is no
real need for them after Python 3.6, and docutils 0.22 removes them,
breaking the docs build
Handle this by just injecting our own one-liner version of ErrorString(),
and removing the sole SafeString() call entirely.
Reported-by: Zhixu Liu <zhixu.liu@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <87ldmnv2pi.fsf@trenco.lwn.net>
[ Salvatore Bonaccorso: Backport to v6.17.y for context changes in
Documentation/sphinx/kernel_include.py with major refactorings for the v6.18
development cycle. Backport ErrorString definition as well to
Documentation/sphinx/kernel_abi.py file for 6.12.y where it is imported
from docutils before the faccc0ec64 ("docs: sphinx/kernel_abi: adjust
coding style") change. ]
Suggested-by: Andreas Radke <andreas.radke@mailbox.org>
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35561bab76 ]
The include/generated/asm-offsets.h is generated in Kbuild during
compiling from arch/SRCARCH/kernel/asm-offsets.c. When we want to
generate another similar offset header file, circular dependency can
happen.
For example, suppose we want to generate an offset file
include/generated/test.h, which is included in include/sched/sched.h. If
we generate asm-offsets.h first, it will fail, as include/sched/sched.h is
included in asm-offsets.c and include/generated/test.h doesn't exist yet;
if we generate test.h first, it can't succeed either, as
include/generated/asm-offsets.h is included by it.
On x86_64, the macro COMPILE_OFFSETS is used to avoid such a circular
dependency: we can generate asm-offsets.h first, and if COMPILE_OFFSETS is
defined, we don't include "generated/test.h". Define the macro
COMPILE_OFFSETS in all the asm-offsets.c files for this purpose.
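A hedged sketch of the guard pattern (test.h is the hypothetical generated
header from the example above):

  /* arch/<arch>/kernel/asm-offsets.c */
  #define COMPILE_OFFSETS
  #include <linux/sched.h>

  /* header that normally pulls in the generated file */
  #ifndef COMPILE_OFFSETS
  #include <generated/test.h>
  #endif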
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d452972858 ]
The qmap dump operation was destructively consuming queue entries while
displaying them. As dump can be triggered anytime, this can easily lead to
stalls. Add a temporary dump_store queue and modify the dump logic to pop
entries, display them, and then restore them back to the original queue.
This allows dump operations to be performed without affecting the
scheduler's queue state.
Note that if racing against new enqueues during dump, ordering can get
mixed up, but this is acceptable for debugging purposes.
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45c222468d ]
After setting the BTRFS_ROOT_FORCE_COW flag on the root we are doing a
full write barrier, smp_wmb(), but we don't need to, all we need is a
smp_mb__after_atomic(). The use of the smp_wmb() is from the old days
when we didn't use a bit and used instead an int field in the root to
signal if cow is forced. After the int field was changed to a bit in
the root's state (flags field), we forgot to update the memory barrier
in create_pending_snapshot() to smp_mb__after_atomic(), but we did the
change in commit_fs_roots() after clearing BTRFS_ROOT_FORCE_COW. That
happened in commit 27cdeb7096 ("Btrfs: use bitfield instead of integer
data type for the some variants in btrfs_root"). On the reader side, in
should_cow_block(), we also use the counterpart smp_mb__before_atomic()
which generates further confusion.
So change the smp_wmb() to smp_mb__after_atomic(). In fact we don't
even need any barrier at all since create_pending_snapshot() is called
in the critical section of a transaction commit and therefore no one
can concurrently join/attach the transaction, or start a new one, until
the transaction is unblocked. By the time someone starts a new transaction
and enters should_cow_block(), a lot of implicit memory barriers already
took place by having acquired several locks such as fs_info->trans_lock
and extent buffer locks on the root node at least. Nevertheless, for
consistency use smp_mb__after_atomic() after setting the force cow bit
in create_pending_snapshot().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aab9458b9f ]
Like inode refs, inode extrefs have a variable length name, which means
we have to do a proper check to make sure no header nor name can exceed
the item limits.
The check itself is very similar to check_inode_ref(), just a different
structure (btrfs_inode_extref vs btrfs_inode_ref).
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a0565cad3 ]
If we fail to update the inode at link_to_fixup_dir(), we don't abort the
transaction and propagate the error up the call chain, which makes it hard
to pinpoint the error to the inode update. So abort the transaction if the
inode update call fails, so that if it happens we know immediately.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6cb7f0b8c9 ]
We already have the extent buffer's level in an argument, there's no need
to first ensure the extent buffer's data is loaded (by calling
btrfs_read_extent_buffer()) and then call btrfs_header_level() to check
the level. So use the level argument and do the check before calling
btrfs_read_extent_buffer().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2f5b8095ea ]
Currently we have this odd behaviour:
1) At btrfs_replay_log() we drop the reference of the log root tree if
the call to btrfs_recover_log_trees() failed;
2) But if the call to btrfs_recover_log_trees() did not fail, we don't
drop the reference in btrfs_replay_log() - we expect that
btrfs_recover_log_trees() does it in case it returns success.
Let's simplify this and make btrfs_replay_log() always drop the reference
on the log root tree. Not only does this simplify the code, it is also
what makes sense, since it's btrfs_replay_log() that grabbed the reference
in the first place.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7f3dfb829 ]
Replace max_t() followed by min_t() with a single clamp().
As was pointed out by David Laight in
https://lore.kernel.org/linux-btrfs/20250906122458.75dfc8f0@pumpkin/
the calculation may overflow u32 when the input value is too large, so
clamp_t() is not used. In practice the expected values are in range of
megabytes to gigabytes (throughput limit) so the bug would not happen.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
[ Use clamp() and add explanation. ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d703963d2 ]
The hint block group selection in the extent allocator is wrong in the
first place, as it can select the dedicated data relocation block group for
the normal data allocation.
Since we separated the normal data space_info and the data relocation
space_info, we can easily identify a block group is for data relocation or
not. Do not choose it for the normal data allocation.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3c44cd3c79 ]
Now that btrfs_zone_finish_endio_workfn() is directly calling
do_zone_finish() the only caller of btrfs_zone_finish_endio() is
btrfs_finish_one_ordered().
btrfs_finish_one_ordered() already has error handling in-place so
btrfs_zone_finish_endio() can return an error if the block group lookup
fails.
Also as btrfs_zone_finish_endio() already checks for zoned filesystems and
returns early, there's no need to do this in the caller.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6dd405b66 ]
In the process_one_buffer() log tree walk callback we return errors to the
log tree walk caller and then the caller aborts the transaction, if we
have one, or turns the fs into error state if we don't have one. While
this reduces code it makes it harder to figure out where exactly an error
came from. So add the transaction aborts after every failure inside the
process_one_buffer() callback, so that it helps figuring out why failures
happen.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ebd726b10 ]
We do several things while walking a log tree (for replaying and for
freeing a log tree) like reading extent buffers and cleaning them up,
but we don't immediately abort the transaction, or turn the fs into an
error state, when one of these things fails. Instead we do the transaction
abort or turn the fs into error state in the caller of the entry point
function that walks a log tree - walk_log_tree() - which means we don't
get to know exactly where an error came from.
Improve on this by doing a transaction abort / turn fs into error state
after each such failure so that when it happens we have a better
understanding where the failure comes from. This deliberately leaves
the transaction abort / turn fs into error state in the callers of
walk_log_tree() as to ensure we don't get into an inconsistent state in
case we forget to do it deeper in call chain. It also deliberately does
not do it after errors from the calls to the callback defined in
struct walk_control::process_func(), as we will do it later on another
patch.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59d5de3655 ]
A previous patch fixed a bug where new_prs should be assigned before
checking housekeeping conflicts. This patch addresses another potential
issue: the nocpu error check currently uses the xcpus value, which is not updated.
Although no issue has been observed so far, the check should be performed
using the new effective exclusive cpus.
The comment has been removed because the function returns an error if
nocpu checking fails, which is unrelated to the parent.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e1c2c6c2c ]
Newer AMD systems can support up to 16 channels per EDAC "mc" device.
These are detected by the EDAC module running on the device, and the
current EDAC interface is appropriately enumerated.
The legacy EDAC sysfs interface, however, provides device attributes for
channels 0 through 11 only. Consequently, the last four channels, 12
through 15, will not be enumerated and will not be visible through the
legacy sysfs interface.
Add additional device attributes to ensure that all 16 channels, if
present, are enumerated by and visible through the legacy EDAC sysfs
interface.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250916203242.1281036-1-avadhut.naik@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1cc1baef6 ]
The LFENCE retpoline mitigation is not secure but the kernel prints
inconsistent messages about this fact. The dmesg log says 'Mitigation:
LFENCE', implying the system is mitigated. But sysfs reports 'Vulnerable:
LFENCE' implying the system (correctly) is not mitigated.
Fix this by printing a consistent 'Vulnerable: LFENCE' string everywhere
when this mitigation is selected.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 930f2361fe ]
On Intel CPUs, the default retbleed mitigation is IBRS/eIBRS but this
requires that a similar spectre_v2 mitigation is applied. If the user
selects a different spectre_v2 mitigation (like spectre_v2=retpoline) a
warning is printed but sysfs will still report 'Mitigation: IBRS' or
'Mitigation: Enhanced IBRS'. This is incorrect because retbleed is not
mitigated, and IBRS is not actually set.
Fix this by choosing RETBLEED_MITIGATION_NONE in this scenario so the
kernel correctly reports the system as vulnerable to retbleed.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250915134706.3201818-1-david.kaplan@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce8370e2e6 ]
When no audit rules are in place, fanotify event results are
unconditionally dropped due to an explicit check for the existence of
any audit rules. Given this is a report from another security
sub-system, allow it to be recorded regardless of the existence of any
audit rules.
To test, install and run the fapolicyd daemon with default config. Then
as an unprivileged user, create and run a very simple binary that should
be denied. Then check for an event with
ausearch -m FANOTIFY -ts recent
Link: https://issues.redhat.com/browse/RHEL-9065
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit dd831ac822 upstream.
To prevent a potential crash in agg_dequeue (net/sched/sch_qfq.c)
when cl->qdisc->ops->peek(cl->qdisc) returns NULL, we check the return
value before using it, similar to the existing approach in sch_hfsc.c.
To avoid code duplication, the following changes are made:
1. Changed qdisc_warn_nonwc(include/net/pkt_sched.h) into a static
inline function.
2. Moved qdisc_peek_len from net/sched/sch_hfsc.c to
include/net/pkt_sched.h so that sch_qfq can reuse it.
3. Applied qdisc_peek_len in agg_dequeue to avoid crashing.
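A hedged sketch of the shared helper described in items 2 and 3, modelled
on the sch_hfsc version it was lifted from (treat it as an approximation):

  static inline unsigned int qdisc_peek_len(struct Qdisc *sch)
  {
          struct sk_buff *skb;

          skb = sch->ops->peek(sch);
          if (unlikely(!skb)) {
                  qdisc_warn_nonwc("qdisc_peek_len", sch);
                  return 0;
          }
          return qdisc_pkt_len(skb);
  }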
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250705212143.3982664-1-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f40e50ceb upstream.
handle_response() dereferences the payload as a 4-byte handle without
verifying that the declared payload size is at least 4 bytes. A malformed
or truncated message from ksmbd.mountd can lead to a 4-byte read past the
declared payload size. Validate the size before dereferencing.
This is a minimal fix to guard the initial handle read.
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: Qianchang Zhao <pioooooooooip@gmail.com>
Signed-off-by: Qianchang Zhao <pioooooooooip@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 630785bfbe ]
The deprecation of the 'attr2' mount option in 6.18 wasn't entirely
successful because nobody noticed that the kernel never printed a
warning about attr2 being set in fstab if the only xfs filesystem is the
root fs; the initramfs mounts the root fs with no mount options; and the
init scripts only conveyed the fstab options by remounting the root fs.
Fix this by making it complain all the time.
Cc: stable@vger.kernel.org # v5.13
Fixes: 92cf7d3638 ("xfs: Skip repetitive warnings about mount options")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
[ Update existing xfs_fs_warn_deprecated() callers ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c05bf6c02 upstream.
Commit 43c51bb573 ("sc16is7xx: make sure device is in suspend once
probed") permanently enabled access to the enhanced features in
sc16is7xx_probe(), and it is never disabled after that.
Therefore, remove re-enable of enhanced features in
sc16is7xx_set_baud(). This eliminates a potential useless read + write
cycle each time the baud rate is reconfigured.
Fixes: 43c51bb573 ("sc16is7xx: make sure device is in suspend once probed")
Cc: stable <stable@kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://patch.msgid.link/20251006142002.177475-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d518314a1f upstream.
Some MediaTek SoCs got a gated UART baud clock, which currently gets
disabled as the clk subsystem believes it would be unused. This results in
the uart freezing right after "clk: Disabling unused clocks" on those
platforms.
Request the baud clock to be prepared and enabled during probe, and, to
restore the run-time power management capabilities to what they were
before commit e32a83c70c ("serial: 8250-mtk: modify mtk uart power and
clock management"), disable and unprepare the baud clock when suspending
the UART, and prepare and enable it again when resuming it.
Fixes: e32a83c70c ("serial: 8250-mtk: modify mtk uart power and clock management")
Fixes: b6c7ff2693 ("serial: 8250_mtk: Simplify clock sequencing and runtime PM")
Cc: stable <stable@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/de5197ccc31e1dab0965cabcc11ca92e67246cf6.1758058441.git.daniel@makrotopia.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit daeb4037ad upstream.
Check the return value of reset_control_deassert() in the probe
function to prevent continuing probe when reset deassertion fails.
Previously, reset_control_deassert() was called without checking its
return value, which could lead to probe continuing even when the
device reset wasn't properly deasserted.
The fix checks the return value and returns an error with dev_err_probe()
if reset deassertion fails, providing better error handling and
diagnostics.
Fixes: acbdad8dd1 ("serial: 8250_dw: simplify optional reset handling")
Cc: stable <stable@kernel.org>
Signed-off-by: Artem Shimko <a.shimko.dev@gmail.com>
Link: https://patch.msgid.link/20251019095131.252848-1-a.shimko.dev@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d8713f807 upstream.
When there is no port entry in the tcpci node itself, the driver will
trigger an error message: "OF: graph: no port node found in /...../typec".
It is documented that the dts node should contain a connector entry with
ports and several port nodes pointing to devices with the usb-role-switch
property set. Only when that connector entry is missing should the driver
check for port entries in the main node.
Switch the search order for looking up ports, which avoids the failure
message when there are explicit connector entries.
Fixes: d56de8c9a1 ("usb: typec: tcpm: try to get role switch from tcpc fwnode")
Cc: stable <stable@kernel.org>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Link: https://patch.msgid.link/20251013-b4-ml-topic-tcpm-v2-1-63c9b2ab8a0b@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8cc9e5fcb upstream.
The early error path in hdm_probe() can jump to err_free_mdev before
&mdev->dev has been initialized with device_initialize(). Calling
put_device(&mdev->dev) there triggers a device core WARN and ends up
invoking kref_put(&kobj->kref, kobject_release) on an uninitialized
kobject.
In this path the private struct was only kmalloc'ed and the intended
release is effectively kfree(mdev) anyway, so free it directly instead
of calling put_device() on an uninitialized device.
This removes the WARNING and fixes the pre-initialization error path.
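A hedged sketch of the corrected early error path (labels simplified):

  err_free_mdev:
          /* &mdev->dev was never device_initialize()d on this path, so
           * free the allocation directly instead of calling put_device(). */
          kfree(mdev);
          return ret;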
Fixes: 97a6f772f3 ("drivers: most: add USB adapter driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Victoria Votokina <Victoria.Votokina@kaspersky.com>
Link: https://patch.msgid.link/20251010105241.4087114-3-Victoria.Votokina@kaspersky.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b12709026 upstream.
hdm_disconnect() calls most_deregister_interface(), which eventually
unregisters the MOST interface device with device_unregister(iface->dev).
If that drops the last reference, the device core may call release_mdev()
immediately while hdm_disconnect() is still executing.
The old code also freed several mdev-owned allocations in
hdm_disconnect() and then performed additional put_device() calls.
Depending on refcount order, this could lead to use-after-free or
double-free when release_mdev() ran (or when unregister paths also
performed puts).
Fix by moving the frees of mdev-owned allocations into release_mdev(),
so they happen exactly once when the device is truly released, and by
dropping the extra put_device() calls in hdm_disconnect() that are
redundant after device_unregister() and most_deregister_interface().
This addresses the KASAN slab-use-after-free reported by syzbot in
hdm_disconnect(). See report and stack traces in the bug link below.
Reported-by: syzbot+916742d5d24f6c254761@syzkaller.appspotmail.com
Cc: stable <stable@kernel.org>
Closes: https://syzkaller.appspot.com/bug?extid=916742d5d24f6c254761
Fixes: 97a6f772f3 ("drivers: most: add USB adapter driver")
Signed-off-by: Victoria Votokina <Victoria.Votokina@kaspersky.com>
Link: https://patch.msgid.link/20251010105241.4087114-2-Victoria.Votokina@kaspersky.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbdf2a7feb upstream.
Between Rust 1.79 and 1.86, under `CONFIG_RUST_KERNEL_DOCTESTS=y`,
`objtool` may report:
rust/doctests_kernel_generated.o: warning: objtool:
rust_doctest_kernel_alloc_kbox_rs_13() falls through to next
function rust_doctest_kernel_alloc_kvec_rs_0()
(as well as in rust_doctest_kernel_alloc_kvec_rs_0) due to calls to the
`noreturn` symbol:
core::option::expect_failed
from code added in commits 779db37373 ("rust: alloc: kvec: implement
AsPageIter for VVec") and 671618432f ("rust: alloc: kbox: implement
AsPageIter for VBox").
Thus add the mangled one to the list so that `objtool` knows it is
actually `noreturn`.
This can be reproduced as well in other versions by tweaking the code,
such as the latest stable Rust (1.90.0).
Stable does not have code that triggers this, but it could have it in
the future. Downstream forks could too. Thus tag it for backport.
See commit 56d680dd23 ("objtool/rust: list `noreturn` Rust functions")
for more details.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Cc: stable@vger.kernel.org # Needed in 6.12.y and later.
Link: https://patch.msgid.link/20251020020714.2511718-1-ojeda@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d90eeb8ecd upstream.
There are no scenarios where a weak increment is invalid on binder_node.
The only possible case where it could be invalid is if the kernel
delivers BR_DECREFS to the process that owns the node, and then
increments the weak refcount again, effectively "reviving" a dead node.
However, that is not possible: when the BR_DECREFS command is delivered,
the kernel removes and frees the binder_node. The fact that you were
able to call binder_inc_node_nilocked() implies that the node is not yet
destroyed, which implies that BR_DECREFS has not been delivered to
userspace, so incrementing the weak refcount is valid.
Note that it's currently possible to trigger this condition if the owner
calls BINDER_THREAD_EXIT while node->has_weak_ref is true. This causes
BC_INCREFS on binder_ref instances to fail when they should not.
Cc: stable@vger.kernel.org
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Reported-by: Yu-Ting Tseng <yutingtseng@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251015-binder-weak-inc-v1-1-7914b092c371@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3d12ec847 upstream.
DbC may add 1024 bogus bytes to the beginning of the receiving endpoint
if DbC hw triggers a STALL event before any Transfer Blocks (TRBs) for
incoming data are queued, but driver handles the event after it queued
the TRBs.
This is possible as xHCI DbC hardware may trigger spurious STALL transfer
events even if endpoint is empty. The STALL event contains a pointer
to the stalled TRB, and "remaining" untransferred data length.
As there are no TRBs queued yet the STALL event will just point to first
TRB position of the empty ring, with '0' bytes remaining untransferred.
DbC driver is polling for events, and may not handle the STALL event
before /dev/ttyDBC0 is opened and incoming data TRBs are queued.
The DbC event handler will now assume the first queued TRB (length 1024)
has stalled with '0' bytes remaining untransferred, and copies those 1024
bytes of bogus data to the receive buffer.
This race situation can be practically mitigated by making sure the event
handler handles all pending transfer events once DbC reaches the configured
state, and only then creating /dev/ttyDBC0 and starting to queue transfers.
This way the event handler can detect STALL events on empty rings
and discard them before any transfers are queued.
This does in practice solve the issue, but still leaves a small possible
gap for the race to trigger.
We still need a way to distinguish spurious STALLs on empty rings with '0'
bytes remaining from actual STALL events with all bytes transmitted.
Cc: stable <stable@kernel.org>
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Tested-by: Łukasz Bartosik <ukaszb@chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbd38fcd2 upstream.
DbC is currently only enabled back if it's in configured state during
suspend.
If system is suspended after DbC is enabled, but before the device is
properly enumerated by the host, then DbC would not be enabled back in
resume.
Always enable DbC back in resume if it was suspended in the enabled,
connected, or configured state.
Cc: stable <stable@kernel.org>
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Tested-by: Łukasz Bartosik <ukaszb@chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37b9dd0d11 upstream.
Drop the check on the maximum transfer length in Raw Gadget for both
control and non-control transfers.
Limiting the transfer length causes a problem with emulating USB devices
whose full configuration descriptor exceeds PAGE_SIZE in length.
Overall, there does not appear to be any reason to enforce any kind of
transfer length limit on the Raw Gadget side for either control or
non-control transfers, so let's just drop the related check.
Cc: stable <stable@kernel.org>
Fixes: f2c2e71764 ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://patch.msgid.link/a6024e8eab679043e9b8a5defdb41c4bda62f02b.1761085528.git.andreyknvl@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a94e065726 upstream.
The current approach is a bit naive, and hence calls the time querying
way too often. Only start the "doing work" timer when there's actual
work to do, and then use that information to terminate (and account) the
work time once done. This greatly reduces the frequency of these calls,
when they cannot have changed anyway.
Running a basic random reader that is setup to use SQPOLL, a profile
before this change shows these as the top cycle consumers:
+ 32.60% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime_adjusted
+ 19.97% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime
+ 12.20% io_uring io_uring [.] submitter_uring_fn
+ 4.13% iou-sqp-1074 [kernel.kallsyms] [k] getrusage
+ 2.45% iou-sqp-1074 [kernel.kallsyms] [k] io_submit_sqes
+ 2.18% iou-sqp-1074 [kernel.kallsyms] [k] __pi_memset_generic
+ 2.09% iou-sqp-1074 [kernel.kallsyms] [k] cputime_adjust
and after this change, top of profile looks as follows:
+ 36.23% io_uring io_uring [.] submitter_uring_fn
+ 23.26% iou-sqp-819 [kernel.kallsyms] [k] io_sq_thread
+ 10.14% iou-sqp-819 [kernel.kallsyms] [k] io_sq_tw
+ 6.52% iou-sqp-819 [kernel.kallsyms] [k] tctx_task_work_run
+ 4.82% iou-sqp-819 [kernel.kallsyms] [k] nvme_submit_cmds.part.0
+ 2.91% iou-sqp-819 [kernel.kallsyms] [k] io_submit_sqes
[...]
0.02% iou-sqp-819 [kernel.kallsyms] [k] cputime_adjust
where it's spending the cycles on things that actually matter.
Reported-by: Fengnan Chang <changfengnan@bytedance.com>
Cc: stable@vger.kernel.org
Fixes: 3fcb9d1720 ("io_uring/sqpoll: statistics of the true utilization of sq threads")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ac9b0d33e upstream.
getrusage() does a lot more than what the SQPOLL accounting needs, the
latter only cares about (and uses) the stime. Rather than do a full
RUSAGE_SELF summation, just query the used stime instead.
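A minimal sketch of the idea (the accounting fields are illustrative;
task_cputime() is the existing helper that reports a task's utime/stime):

	u64 utime, stime;

	/* query only the system time consumed by the SQPOLL thread rather
	 * than doing a full getrusage(RUSAGE_SELF) summation */
	task_cputime(current, &utime, &stime);
	sqd->work_time += stime - sqd->last_stime;	/* illustrative fields */
	sqd->last_stime = stime;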
Cc: stable@vger.kernel.org
Fixes: 3fcb9d1720 ("io_uring/sqpoll: statistics of the true utilization of sq threads")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 53abe3e1c1 ]
Clang is not happy with a set-but-unused variable (this is visible
with a `make W=1` build):
kernel/sched/sched.h:3744:18: error: variable 'cpumask' set but not used [-Werror,-Wunused-but-set-variable]
It seems the variable was never used, and the assignment has no side
effects as far as I can see. Remove both altogether.
Fixes: 223baf9d17 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5efc6a0b3 ]
The __must_hold annotation references &req->ctx->uring_lock, but req
is not in scope in io_install_fixed_file(). Update the
annotation to reference the correct ctx->uring_lock,
improving code clarity.
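A sketch of the corrected annotation (the parameter list shown here is
illustrative):

	static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
					 unsigned int file_slot)
		__must_hold(&ctx->uring_lock);	/* was: &req->ctx->uring_lock */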
Fixes: f110ed8498 ("io_uring: split out fixed file installation and removal")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c4e6ea4a1 ]
The generic_handle_domain_irq() function resolves the hardware IRQ
internally. The driver performed a duplicative mapping by calling
irq_find_mapping() first, which could lead to an RCU stall.
Delete the redundant irq_find_mapping() call and pass the hardware IRQ
directly to generic_handle_domain_irq().
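A minimal sketch of the change described (the domain pointer name is
illustrative):

	/* before: resolve the hwirq to a Linux irq number first */
	generic_handle_irq(irq_find_mapping(gc->irq.domain, hwirq));

	/* after: let the domain handler resolve the hwirq itself */
	generic_handle_domain_irq(gc->irq.domain, hwirq);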
Fixes: c5a4b6fd31 ("gpio: Add support for Intel LJCA USB GPIO driver")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://lore.kernel.org/r/20251023070231.1305-1-vulab@iscas.ac.cn
[Bartosz: remove unused variable]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81ccca3121 ]
sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(),
which does security checks (e.g. SELinux) for socket access against the
current task. However, _sock_xmit() in drivers/block/nbd.c may be called
indirectly from a userspace syscall, where the NBD socket access would
be incorrectly checked against the calling userspace task (which simply
tries to read/write a file that happens to reside on an NBD device).
To fix this, temporarily override creds to kernel ones before calling
the sock_*() functions. This allows the security modules to recognize
this as internal access by the kernel, which will normally be allowed.
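A simplified sketch of the approach (the actual patch manages credential
references and caching differently; this only shows the override around the
socket call):

	const struct cred *old_cred;
	int result;

	/* perform the socket I/O with kernel credentials so LSM hooks treat it
	 * as kernel-internal access instead of charging the calling user task */
	old_cred = override_creds(&init_cred);
	result = sock_sendmsg(sock, &msg);	/* or sock_recvmsg() on receive */
	revert_creds(old_cred);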
A way to trigger the issue is to do the following (on a system with
SELinux set to enforcing):
### Create nbd device:
truncate -s 256M /tmp/testfile
nbd-server localhost:10809 /tmp/testfile
### Connect to the nbd server:
nbd-client localhost
### Create mdraid array
mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing
After these steps, assuming the SELinux policy doesn't allow the
unexpected access pattern, errors will be visible on the kernel console:
[ 142.204243] nbd0: detected capacity change from 0 to 524288
[ 165.189967] md: async del_gendisk mode will be removed in future, please upgrade to mdadm-4.5+
[ 165.252299] md/raid1:md127: active with 1 out of 2 mirrors
[ 165.252725] md127: detected capacity change from 0 to 522240
[ 165.255434] block nbd0: Send control failed (result -13)
[ 165.255718] block nbd0: Request send failed, requeueing
[ 165.256006] block nbd0: Dead connection, failed to find a fallback
[ 165.256041] block nbd0: Receive control failed (result -32)
[ 165.256423] block nbd0: shutting down sockets
[ 165.257196] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.257736] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.258263] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.259376] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.259920] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.260628] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.261661] ldm_validate_partition_table(): Disk read failed.
[ 165.262108] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.262769] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.263697] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.264412] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.265412] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.265872] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.266378] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.267168] Buffer I/O error on dev md127, logical block 0, async page read
[ 165.267564] md127: unable to read partition table
[ 165.269581] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.269960] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 165.270316] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.270913] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 165.271253] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 165.271809] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 165.272074] ldm_validate_partition_table(): Disk read failed.
[ 165.272360] nbd0: unable to read partition table
[ 165.289004] ldm_validate_partition_table(): Disk read failed.
[ 165.289614] nbd0: unable to read partition table
The corresponding SELinux denial on Fedora/RHEL will look like this
(assuming it's not silenced):
type=AVC msg=audit(1758104872.510:116): avc: denied { write } for pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0
The respective backtrace looks like this:
@security[mdadm, -13,
handshake_exit+221615650
handshake_exit+221615650
handshake_exit+221616465
security_socket_sendmsg+5
sock_sendmsg+106
handshake_exit+221616150
sock_sendmsg+5
__sock_xmit+162
nbd_send_cmd+597
nbd_handle_cmd+377
nbd_queue_rq+63
blk_mq_dispatch_rq_list+653
__blk_mq_do_dispatch_sched+184
__blk_mq_sched_dispatch_requests+333
blk_mq_sched_dispatch_requests+38
blk_mq_run_hw_queue+239
blk_mq_dispatch_plug_list+382
blk_mq_flush_plug_list.part.0+55
__blk_flush_plug+241
__submit_bio+353
submit_bio_noacct_nocheck+364
submit_bio_wait+84
__blkdev_direct_IO_simple+232
blkdev_read_iter+162
vfs_read+591
ksys_read+95
do_syscall_64+92
entry_SYSCALL_64_after_hwframe+120
]: 1
The issue has started to appear since commit 060406c61c ("block: add
plug while submitting IO").
Cc: Ming Lei <ming.lei@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878
Fixes: 060406c61c ("block: add plug while submitting IO")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8dcc66ad37 ]
When reading the status, temperature, or humidity fails, the error number
is currently returned as a negative attribute value. Fix this by returning
the error as the function's return value instead.
Fixes: a0ac418c60 ("hwmon: (sht3x) convert some of sysfs interface to hwmon")
Cc: JuenKit Yip <JuenKit_Yip@hotmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7776a802f ]
Resolve this smatch warning:
arch/riscv/kernel/sys_hwprobe.c:50 hwprobe_arch_id() error: uninitialized symbol 'cpu_id'.
This could happen if hwprobe_arch_id() was called with a key ID of
something other than MVENDORID, MIMPID, and MARCHID. This does not
happen in the current codebase. The only caller of hwprobe_arch_id()
is a function that only passes one of those three key IDs.
For the sake of reducing static analyzer warning noise, and in the
unlikely event that hwprobe_arch_id() is someday called with some
other key ID, validate hwprobe_arch_id()'s input to ensure that
'cpu_id' is always initialized before use.
Fixes: ea3de9ce8a ("RISC-V: Add a syscall for HW probing")
Cc: Evan Green <evan@rivosinc.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Link: https://lore.kernel.org/r/cf5a13ec-19d0-9862-059b-943f36107bf3@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca525d53f9 ]
pgprot_dmacoherent() is used when allocating memory for
non-coherent devices, and by default pgprot_dmacoherent() is
the same as pgprot_noncached() unless the architecture overrides it.
Currently, there is no pgprot_dmacoherent() definition for
RISC-V, hence non-coherent device memory is mapped as
IO, thereby making CPU access to such memory slow.
Define pgprot_dmacoherent() to be the same as pgprot_writecombine()
for RISC-V so that the CPU accesses non-coherent device memory as
NOCACHE, which is better than accessing it as IO.
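A sketch of the definition described above:

	/* non-coherent DMA memory: map as write-combine (NOCACHE) rather than IO */
	#define pgprot_dmacoherent(prot)	pgprot_writecombine(prot)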
Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Han Gao <rabenda.cn@gmail.com>
Tested-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Link: https://lore.kernel.org/r/20250820152316.1012757-1-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b7d9b25e4 ]
Attaching UBI on a flash with more than one plane per LUN leads to
the following error:
[ 2.980989] spi-nand spi0.0: Micron SPI NAND was found.
[ 2.986309] spi-nand spi0.0: 256 MiB, block size: 128 KiB, page size: 2048, OOB size: 128
[ 2.994978] 2 fixed-partitions partitions found on MTD device spi0.0
[ 3.001350] Creating 2 MTD partitions on "spi0.0":
[ 3.006159] 0x000000000000-0x000000020000 : "bl2"
[ 3.011663] 0x000000020000-0x000010000000 : "ubi"
...
[ 6.391748] ubi0: attaching mtd1
[ 6.412545] ubi0 error: ubi_attach: PEB 0 contains corrupted VID header, and the data does not contain all 0xFF
[ 6.422677] ubi0 error: ubi_attach: this may be a non-UBI PEB or a severe VID header corruption which requires manual inspection
[ 6.434249] Volume identifier header dump:
[ 6.438349] magic 55424923
[ 6.441482] version 1
[ 6.444007] vol_type 0
[ 6.446539] copy_flag 0
[ 6.449068] compat 0
[ 6.451594] vol_id 0
[ 6.454120] lnum 1
[ 6.456651] data_size 4096
[ 6.459442] used_ebs 1061644134
[ 6.462748] data_pad 0
[ 6.465274] sqnum 0
[ 6.467805] hdr_crc 61169820
[ 6.470943] Volume identifier header hexdump:
[ 6.475308] hexdump of PEB 0 offset 4096, length 126976
[ 6.507391] ubi0 warning: ubi_attach: valid VID header but corrupted EC header at PEB 4
[ 6.515415] ubi0 error: ubi_compare_lebs: unsupported on-flash UBI format
[ 6.522222] ubi0 error: ubi_attach_mtd_dev: failed to attach mtd1, error -22
[ 6.529294] UBI error: cannot attach mtd1
Non-dirmap reading works fine. Looking at the spi_mem_no_dirmap_read() code we see:
static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc *desc,
u64 offs, size_t len, void *buf)
{
struct spi_mem_op op = desc->info.op_tmpl;
int ret;
// --- see here ---
op.addr.val = desc->info.offset + offs;
//-----------------
op.data.buf.in = buf;
op.data.nbytes = len;
ret = spi_mem_adjust_op_size(desc->mem, &op);
if (ret)
return ret;
ret = spi_mem_exec_op(desc->mem, &op);
if (ret)
return ret;
return op.data.nbytes;
}
The same happens in spi_mem_no_dirmap_write(). Thus the address
passed to the flash should take into account the value of
desc->info.offset.
This patch fixes dirmap reading/writing on flashes with more than one
plane per LUN.
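A hedged sketch of the driver-side consequence (field names taken from the
excerpt above; the surrounding handler code is omitted):

	/* in the dirmap read/write handlers, include the descriptor base when
	 * computing the flash address, mirroring what the spi-mem core does */
	u64 addr = desc->info.offset + offs;	/* was: offs alone */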
Fixes: a403997c12 ("spi: airoha: add SPI-NAND Flash controller driver")
Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevetskiy@iopsys.eu>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20251012121707.2296160-7-mikhail.kshevetskiy@iopsys.eu
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edd2e261b1 ]
Booting without this patch and with dirmap support disabled results in:
[ 2.980719] spi-nand spi0.0: Micron SPI NAND was found.
[ 2.986040] spi-nand spi0.0: 256 MiB, block size: 128 KiB, page size: 2048, OOB size: 128
[ 2.994709] 2 fixed-partitions partitions found on MTD device spi0.0
[ 3.001075] Creating 2 MTD partitions on "spi0.0":
[ 3.005862] 0x000000000000-0x000000020000 : "bl2"
[ 3.011272] 0x000000020000-0x000010000000 : "ubi"
...
[ 6.195594] ubi0: attaching mtd1
[ 13.338398] ubi0: scanning is finished
[ 13.342188] ubi0 error: ubi_read_volume_table: the layout volume was not found
[ 13.349784] ubi0 error: ubi_attach_mtd_dev: failed to attach mtd1, error -22
[ 13.356897] UBI error: cannot attach mtd1
If dirmap is disabled or not supported in the SPI driver, the dirmap requests
will be executed via the exec_op() handler. Thus, if the hardware supports
dual/quad SPI modes, the corresponding requests will be sent to the exec_op()
handler. The current driver does not support such requests, so an error is
raised and, as a result, the flash can't be read or written.
This patch adds support for dual and quad wire SPI modes to the exec_op() handler.
Fixes: a403997c12 ("spi: airoha: add SPI-NAND Flash controller driver")
Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevetskiy@iopsys.eu>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20251012121707.2296160-4-mikhail.kshevetskiy@iopsys.eu
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 20b93a0088 ]
The SCMI_XFER_FLAG_IS_RAW flag was being cleared prematurely in
scmi_xfer_raw_put() before the transfer completion was properly
acknowledged by the raw message handlers.
Move the clearing of SCMI_XFER_FLAG_IS_RAW and SCMI_XFER_FLAG_CHAN_SET
from scmi_xfer_raw_put() to __scmi_xfer_put() to ensure the flags remain
set throughout the entire raw message processing pipeline until the
transfer is returned to the free pool.
Fixes: 3095a3e25d ("firmware: arm_scmi: Add xfer helpers to provide raw access")
Suggested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Artem Shimko <a.shimko.dev@gmail.com>
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Message-Id: <20251008091057.1969260-1-a.shimko.dev@gmail.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2290ab43b9 ]
When the SCMI debug subsystem fails to initialize, the related debug root
will be missing, and the underlying descriptor will be NULL.
Handle this fault condition in the SCMI debug helpers that maintain
metrics counters.
Fixes: 0b3d48c472 ("firmware: arm_scmi: Track basic SCMI communication debug metrics")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Message-Id: <20251014115346.2391418-1-cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 278b6cabf1 ]
Add the missing '#address-cells = <0>' property to the GIC interrupt node to
silence a W=1 warning:
bcm2712.dtsi:494.4-497.31: Warning (interrupt_map): /axi/pcie@1000110000:interrupt-map:
Missing property '#address-cells' in node /soc@107c000000/interrupt-controller@7fff9000, using 0 as fallback
Value '0' is correct because:
1. GIC interrupt controller does not have children,
2. the interrupt-map property (in the PCI node) consists of five components,
and the fourth component, "parent unit address", whose size is defined by
the '#address-cells' of the node pointed to by the interrupt-parent
component, is not used (=0)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250822133407.312505-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Stable-dep-of: aa960b5976 ("arm64: dts: broadcom: bcm2712: Define VGIC interrupt")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 841a8bfcba upstream.
When performing memory error injection on a THP (Transparent Huge Page)
mapped to userspace on an x86 server, the kernel panics with the following
trace. The expected behavior is to terminate the affected process instead
of panicking the kernel, as the x86 Machine Check code can recover from an
in-userspace #MC.
mce: [Hardware Error]: CPU 0: Machine Check Exception: f Bank 3: bd80000000070134
mce: [Hardware Error]: RIP 10:<ffffffff8372f8bc> {memchr_inv+0x4c/0xf0}
mce: [Hardware Error]: TSC afff7bbff88a ADDR 1d301b000 MISC 80 PPIN 1e741e77539027db
mce: [Hardware Error]: PROCESSOR 0:d06d0 TIME 1758093249 SOCKET 0 APIC 0 microcode 80000320
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
Kernel panic - not syncing: Fatal local machine check
The root cause of this panic is that handling a memory failure triggered
by an in-userspace #MC necessitates splitting the THP. The splitting
process employs a mechanism, implemented in
try_to_map_unused_to_zeropage(), which reads the pages in the THP to
identify zero-filled pages. However, reading the pages in the THP results
in a second in-kernel #MC, occurring before the initial memory_failure()
completes, ultimately leading to a kernel panic. See the kernel panic
call trace on the two #MCs.
First Machine Check occurs // [1]
memory_failure() // [2]
try_to_split_thp_page()
split_huge_page()
split_huge_page_to_list_to_order()
__folio_split() // [3]
remap_page()
remove_migration_ptes()
remove_migration_pte()
try_to_map_unused_to_zeropage() // [4]
memchr_inv() // [5]
Second Machine Check occurs // [6]
Kernel panic
[1] Triggered by accessing a hardware-poisoned THP in userspace, which is
typically recoverable by terminating the affected process.
[2] Call folio_set_has_hwpoisoned() before try_to_split_thp_page().
[3] Pass the RMP_USE_SHARED_ZEROPAGE remap flag to remap_page().
[4] Try to map the unused THP to zeropage.
[5] Re-access pages in the hw-poisoned THP in the kernel.
[6] Triggered in-kernel, leading to a kernel panic.
In Step[2], memory_failure() sets the poisoned flag on the page in the THP
by TestSetPageHWPoison() before calling try_to_split_thp_page().
As suggested by David Hildenbrand, fix this panic by not accessing the
poisoned page in the THP during zeropage identification, while continuing
to scan unaffected pages in the THP for possible zeropage mapping. This
prevents a second in-kernel #MC that would cause a kernel panic in Step[4].
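A minimal sketch of the idea, assuming the check lands in
try_to_map_unused_to_zeropage() (not the exact upstream diff):

	/* never read a hardware-poisoned subpage while looking for zero-filled
	 * pages; skip it and keep scanning the unaffected pages of the THP */
	if (PageHWPoison(page))
		return false;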
Thanks to Andrew Zaborowski for his initial work on fixing this issue.
Link: https://lkml.kernel.org/r/20251015064926.1887643-1-qiuxu.zhuo@intel.com
Link: https://lkml.kernel.org/r/20251011075520.320862-1-qiuxu.zhuo@intel.com
Fixes: b1f202060a ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-by: Farrah Chen <farrah.chen@intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5370c31e84 upstream.
Ensure the TX descriptor type fields are published in a safe order so the
DMA engine never begins processing a descriptor chain before all descriptor
fields are fully initialised.
For multi-descriptor transmits the driver writes DT_FEND into the last
descriptor and DT_FSTART into the first. The DMA engine begins processing
when it observes DT_FSTART. Move the dma_wmb() barrier so it executes
immediately after DT_FEND and immediately before writing DT_FSTART
(and before DT_FSINGLE in the single-descriptor case). This guarantees
that all prior CPU writes to the descriptor memory are visible to the
device before DT_FSTART is seen.
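A sketch of the resulting publication order (the descriptor variable names
are illustrative; die_dt and the DT_* types are the driver's own):

	desc_last->die_dt = DT_FEND;	/* finish all other descriptor fields first */
	dma_wmb();			/* make every descriptor write visible to the device */
	desc_first->die_dt = DT_FSTART;	/* DMA may start once it observes this */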
This avoids a situation where compiler/CPU reordering could publish
DT_FSTART ahead of DT_FEND or other descriptor fields, allowing the DMA to
start on a partially initialised chain and causing corrupted transmissions
or TX timeouts. Such a failure was observed on RZ/G2L with an RT kernel as
transmit queue timeouts and device resets.
Fixes: 2f45d1902a ("ravb: minimize TX data copying")
Cc: stable@vger.kernel.org
Co-developed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20251017151830.171062-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75cea9860a upstream.
TX frames aren't padded and unknown memory is sent into the ether.
Theoretically, it isn't even guaranteed that the extra memory exists
and can be sent out, which could cause further problems. In practice,
I found that plenty of tailroom exists in the skb itself (in my test
with ping at least) and skb_padto() easily succeeds, so use it here.
In the event of -ENOMEM drop the frame like other drivers do.
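A minimal sketch of the padding step (the minimum length used by the driver
may differ from ETH_ZLEN):

	/* pad short frames; skb_padto() frees the skb on failure */
	if (skb_padto(skb, ETH_ZLEN)) {
		dev->stats.tx_dropped++;
		return NETDEV_TX_OK;
	}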
The use of one more padding byte instead of a USB zero-length packet
is retained to avoid regression. I have a dodgy Etron xHCI controller
which doesn't seem to support sending ZLPs at all.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251014203528.3f9783c4.michal.pecio@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7c877e753 upstream.
Syzbot reported a potential lock inversion deadlock between
vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called.
The issue was introduced by commit 687aa0c558 ("vsock: Fix
transport_* TOCTOU") which added vsock_register_mutex locking in
vsock_assign_transport() around the transport->release() call, that can
call vsock_linger(). vsock_assign_transport() can be called with sk_lock
held. vsock_linger() calls sk_wait_event(), which temporarily releases and
re-acquires sk_lock. During this window, if another thread holds
vsock_register_mutex while trying to acquire sk_lock, a circular
dependency is created.
Fix this by releasing vsock_register_mutex before calling
transport->release() and vsock_deassign_transport(). This is safe
because we don't need to hold vsock_register_mutex while releasing the
old transport, and we ensure the new transport won't disappear by
obtaining a module reference first via try_module_get().
Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com
Fixes: 687aa0c558 ("vsock: Fix transport_* TOCTOU")
Cc: mhal@rbox.co
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf5570590a upstream.
MIPS Malta platform code registers the PCI southbridge legacy port I/O
PS/2 keyboard range as a standard resource marked as busy. It prevents
the i8042 driver from registering as it fails to claim the resource in
a call to i8042_platform_init(). Consequently PS/2 keyboard and mouse
devices cannot be used with this platform.
Fix the issue by removing the busy marker from the standard reservation,
making the driver register successfully:
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
and the resource show up as expected among the legacy devices:
00000000-00ffffff : MSC PCI I/O
00000000-0000001f : dma1
00000020-00000021 : pic1
00000040-0000005f : timer
00000060-0000006f : keyboard
00000060-0000006f : i8042
00000070-00000077 : rtc0
00000080-0000008f : dma page reg
000000a0-000000a1 : pic2
000000c0-000000df : dma2
[...]
If the i8042 driver has not been configured, then the standard resource
will remain there preventing any conflicting dynamic assignment of this
PCI port I/O address range.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/alpine.DEB.2.21.2510211919240.8377@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e93ac51e4 upstream.
Since the commit c1f3f9797c ("can: netlink: can_changelink(): fix NULL
pointer deref of struct can_priv::do_set_mode"), the automatic restart
delay can only be set for devices that implement the restart handler struct
can_priv::do_set_mode, as it makes no sense to configure an automatic
restart for devices that don't support it.
However, since systemd commit 13ce5d4632e3 ("network/can: properly handle
CAN.RestartSec=0") [1], systemd-networkd correctly handles a restart delay
of "0" (i.e. the restart is disabled), which means that a disabled restart
is always configured in the kernel.
On systems with both changes active, this means that CAN interfaces that
don't implement a restart handler cannot be brought up by systemd-networkd.
Solve this problem by allowing a delay of "0" to be configured, even if the
device does not implement a restart handler.
[1] 13ce5d4632
Cc: stable@vger.kernel.org
Cc: Andrei Lalaev <andrey.lalaev@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Closes: https://lore.kernel.org/all/20251020-certain-arrogant-vole-of-sunshine-141841-mkl@pengutronix.de
Fixes: c1f3f9797c ("can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode")
Link: https://patch.msgid.link/20251020-netlink-fix-restart-v1-1-3f53c7f8520b@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f434e1d9a upstream.
If two competing threads enter alloc_slab_obj_exts(), and the one that
allocates the vector wins the cmpxchg(), the other thread that failed
allocation mistakenly assumes that slab->obj_exts is still empty due to
its own allocation failure. This will then trigger warnings with
CONFIG_MEM_ALLOC_PROFILING_DEBUG checks in the subsequent free path.
Therefore, let's check the result of cmpxchg() to see if marking the
allocation as failed was successful. If it wasn't, check whether the
winning side has succeeded its allocation (it might have been also
marking it as failed) and if yes, return success.
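A simplified sketch of the check described above (allocation-failure path
only, not the exact slub code):

	unsigned long old;

	/* try to publish the failure marker only if the slot is still empty */
	old = cmpxchg(&slab->obj_exts, 0, OBJEXTS_ALLOC_FAIL);
	if (old == 0)
		return -ENOMEM;			/* we recorded the failure */
	if (!(old & OBJEXTS_ALLOC_FAIL))
		return 0;			/* the winner installed a valid vector */
	return -ENOMEM;				/* the winner also recorded a failure */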
Suggested-by: Harry Yoo <harry.yoo@oracle.com>
Fixes: f7381b9116 ("slab: mark slab->obj_exts allocation failures unconditionally")
Cc: <stable@vger.kernel.org>
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Link: https://patch.msgid.link/20251023143313.1327968-1-hao.ge@linux.dev
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ed8bfd24c upstream.
If two competing threads enter alloc_slab_obj_exts() and one of them
fails to allocate the object extension vector, it might override the
valid slab->obj_exts allocated by the other thread with
OBJEXTS_ALLOC_FAIL. This will cause the thread that lost this race and
expects a valid pointer to dereference a NULL pointer later on.
Update slab->obj_exts atomically using cmpxchg() to avoid
slab->obj_exts overrides by racing threads.
Thanks to Vlastimil and Suren for their help with debugging.
Fixes: f7381b9116 ("slab: mark slab->obj_exts allocation failures unconditionally")
Cc: <stable@vger.kernel.org>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Link: https://patch.msgid.link/20251021010353.1187193-1-hao.ge@linux.dev
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10fad40122 upstream.
It is reported that commit 85975daeaa ("cpuidle: menu: Avoid discarding
useful information") led to a performance regression on Intel Jasper Lake
systems because it reduced the time spent by CPUs in idle state C7 which
is correlated to the maximum frequency the CPUs can get to because of an
average running power limit [1].
Before that commit, get_typical_interval() would have returned UINT_MAX
whenever it had been unable to make a high-confidence prediction which
had led to selecting the deepest available idle state too often and
both power and performance had been inadequate as a result of that on
some systems. However, this had not been a problem on systems with
relatively aggressive average running power limits, like the Jasper Lake
systems in question, because on those systems it was compensated by the
ability to run CPUs faster.
It was addressed by causing get_typical_interval() to return a number
based on the recent idle duration information available to it even if it
could not make a high-confidence prediction, but that clearly did not
take the possible correlation between idle power and available CPU
capacity into account.
For this reason, revert most of the changes made by commit 85975daeaa,
except for one cosmetic cleanup, and add a comment explaining the
rationale for returning UINT_MAX from get_typical_interval() when it
is unable to make a high-confidence prediction.
Fixes: 85975daeaa ("cpuidle: menu: Avoid discarding useful information")
Closes: https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/ [1]
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3663603.iIbC2pHGDl@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f477af0cfa upstream.
On a filesystem with parent pointers, xchk_nlinks_collect_dir walks both
the directory entries (data fork) and the parent pointers (attr fork) to
determine the correct link count. Unfortunately I forgot to update the
lock mode logic to handle the case of a directory whose attr fork is in
btree format and has not yet been loaded *and* whose data fork doesn't
need loading.
This leads to a bunch of assertions from xfs/286 in xfs_iread_extents
because we only took ILOCK_SHARED, not ILOCK_EXCL. You'd need the rare
happenstance of a directory with a large number of non-pptr extended
attributes set and enough memory pressure to cause the directory to be
evicted and partially reloaded from disk.
I /think/ this only started in 6.18-rc1 because I've started seeing OOM
errors with the maple tree slab using 70% of memory, and this didn't
happen in 6.17. Yay dynamic systems!
Cc: stable@vger.kernel.org # v6.10
Fixes: 77ede5f44b ("xfs: walk directory parent pointers to determine backref count")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2eead19334 upstream.
Fix incorrect use of PTR_ERR_OR_ZERO() in topology_parse_cpu_capacity()
which causes the code to proceed with NULL clock pointers. The current
logic uses !PTR_ERR_OR_ZERO(cpu_clk) which evaluates to true for both
valid pointers and NULL, leading to potential NULL pointer dereference
in clk_get_rate().
Per include/linux/err.h documentation, PTR_ERR_OR_ZERO(ptr) returns:
"The error code within @ptr if it is an error pointer; 0 otherwise."
This means PTR_ERR_OR_ZERO() returns 0 for both valid pointers AND NULL
pointers. Therefore !PTR_ERR_OR_ZERO(cpu_clk) evaluates to true (proceed)
when cpu_clk is either valid or NULL, causing clk_get_rate(NULL) to be
called when of_clk_get() returns NULL.
Replace with !IS_ERR_OR_NULL(cpu_clk) which only proceeds for valid
pointers, preventing potential NULL pointer dereference in clk_get_rate().
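A minimal sketch of the corrected check (surrounding code simplified; how
the rate is used afterwards is omitted):

	cpu_clk = of_clk_get(cpu_node, 0);
	if (!IS_ERR_OR_NULL(cpu_clk)) {
		/* only a valid clock pointer reaches clk_get_rate() now */
		rate = clk_get_rate(cpu_clk);
		clk_put(cpu_clk);
	}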
Cc: stable <stable@kernel.org>
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Fixes: b8fe128dad ("arch_topology: Adjust initial CPU capacities with current freq")
Link: https://patch.msgid.link/20250923174308.1771906-1-kaushlendra.kumar@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 03521c892b upstream.
Commit 370645f41e ("dma-mapping: force bouncing if the kmalloc() size is
not cache-line-aligned") introduced the DMA_BOUNCE_UNALIGNED_KMALLOC feature
and permitted architecture-specific code to configure kmalloc slabs with
sizes smaller than the value of dma_get_cache_alignment().
When that feature is enabled, the physical address of some small
kmalloc()-ed buffers might be not aligned to the CPU cachelines, thus not
really suitable for typical DMA. To properly handle that case a SWIOTLB
buffer bouncing is used, so no CPU cache corruption occurs. When that
happens, there is no point reporting a false-positive DMA-API warning that
the buffer is not properly aligned, as this is not a client driver fault.
[m.szyprowski@samsung.com: replace is_swiotlb_allocated() with is_swiotlb_active(), per Catalin]
Link: https://lkml.kernel.org/r/20251010173009.3916215-1-m.szyprowski@samsung.com
Link: https://lkml.kernel.org/r/20251009141508.2342138-1-m.szyprowski@samsung.com
Fixes: 370645f41e ("dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7c4bb43bf upstream.
Calling inotify_show_fdinfo() on an fd watching an overlayfs inode, while
the overlayfs is being unmounted, can lead to dereferencing a NULL ptr.
This issue was found by syzkaller.
Race Condition Diagram:
Thread 1                                  Thread 2
--------                                  --------
generic_shutdown_super()
  shrink_dcache_for_umount
    sb->s_root = NULL
          |
          |                               vfs_read()
          |                                 inotify_fdinfo()
          |                                   * inode get from mark *
          |                                   show_mark_fhandle(m, inode)
          |                                     exportfs_encode_fid(inode, ..)
          |                                       ovl_encode_fh(inode, ..)
          |                                         ovl_check_encode_origin(inode)
          |                                           * deref i_sb->s_root *
          |
          |
          v
  fsnotify_sb_delete(sb)
Which then leads to:
[ 32.133461] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 32.134438] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[ 32.135032] CPU: 1 UID: 0 PID: 4468 Comm: systemd-coredum Not tainted 6.17.0-rc6 #22 PREEMPT(none)
<snip registers, unreliable trace>
[ 32.143353] Call Trace:
[ 32.143732] ovl_encode_fh+0xd5/0x170
[ 32.144031] exportfs_encode_inode_fh+0x12f/0x300
[ 32.144425] show_mark_fhandle+0xbe/0x1f0
[ 32.145805] inotify_fdinfo+0x226/0x2d0
[ 32.146442] inotify_show_fdinfo+0x1c5/0x350
[ 32.147168] seq_show+0x530/0x6f0
[ 32.147449] seq_read_iter+0x503/0x12a0
[ 32.148419] seq_read+0x31f/0x410
[ 32.150714] vfs_read+0x1f0/0x9e0
[ 32.152297] ksys_read+0x125/0x240
IOW ovl_check_encode_origin derefs inode->i_sb->s_root, after it was set
to NULL in the unmount path.
Fix it by protecting the call to exportfs_encode_fid() from
show_mark_fhandle() with the s_umount lock.
This form of fix was suggested by Amir in [1].
[1]: https://lore.kernel.org/all/CAOQ4uxhbDwhb+2Brs1UdkoF0a3NSdBAOQPNfEHjahrgoKJpLEw@mail.gmail.com/
Fixes: c45beebfde ("ovl: support encoding fid from inode with no alias")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 399d109347 ]
Currently, during the LAN8814 PTP probe shared->phydev is only set if PTP
clock gets actually set, otherwise the function will return before setting
it.
This is an issue as shared->phydev is used unconditionally when the IRQ
is handled, especially in lan8814_gpio_process_cap(), and since it was
not set this will cause a NULL pointer exception and crash the kernel.
So, simply always set shared->phydev to avoid the NULL pointer exception.
Fixes: b3f1a08fcf ("net: phy: micrel: Add support for PTP_PF_EXTTS for lan8814")
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20251021132034.983936-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 143937ca51 ]
The current pte_mkwrite_novma() makes the PTE dirty unconditionally. This may
wrongly mark some pages that are never written as dirty. For example,
do_swap_page() may map exclusive pages with writable and clean PTEs
if the VMA is writable and the page fault is for read access.
However, because the current pte_mkwrite_novma() implementation always
dirties the PTE, this may cause unnecessary disk writes if the pages are
never written before being reclaimed.
So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the
PTE_DIRTY bit is set to make it possible to make the PTE writable and
clean.
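A sketch of the behaviour described (simplified; set_pte_bit() and
clear_pte_bit() are the arm64 pgtable helpers):

	static inline pte_t pte_mkwrite_novma(pte_t pte)
	{
		pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
		/* only drop PTE_RDONLY (hardware-writable) if already dirty */
		if (pte_sw_dirty(pte))
			pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
		return pte;
	}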
The current behavior was introduced in commit 73e86cb03c ("arm64:
Move PTE_RDONLY bit handling out of set_pte_at()"). Before that,
pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only
clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits
are set.
To test the performance impact of the patch, on an arm64 server
machine, run 16 redis-server processes on socket 1 and 16
memtier_benchmark processes on socket 0 with mostly get
transactions (that is, redis-server will mostly read memory only).
The memory footprint of redis-server is larger than the available
memory, so swap out/in will be triggered. Test results show that the
patch can avoid most swapping out because the pages are mostly clean.
And the benchmark throughput improves ~23.9% in the test.
Fixes: 73e86cb03c ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49d34f3dd8 ]
Resolve race conditions in timestamp events list handling between TX
and RX paths causing missed timestamps.
The current implementation uses a single events list for both TX and RX
timestamps. The am65_cpts_find_ts() function acquires the lock,
splices all events (TX as well as RX events) to a temporary list,
and releases the lock. This function performs matching of timestamps
for TX packets only. Before it acquires the lock again to put the
non-TX events back to the main events list, a concurrent RX
processing thread could acquire the lock (as observed in practice),
find an empty events list, and fail to attach timestamp to it,
even though a relevant event exists in the spliced list which is yet to
be restored to the main list.
Fix this by creating separate events lists to handle TX and RX
timestamps independently.
Fixes: c459f606f6 ("net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using CPTS FIFO")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20251016115755.1123646-1-a-garg7@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87bcef158a ]
XDP programs can change the layout of an xdp_buff through
bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
cannot assume the size of the linear data area nor fragments. Fix the
bug in mlx5 by generating skb according to xdp_buff after XDP programs
run.
Currently, when handling multi-buf XDP, the mlx5 driver assumes the
layout of an xdp_buff to be unchanged. That is, the linear data area
continues to be empty and fragments remain the same. This may cause
the driver to generate an erroneous skb or trigger a kernel
warning. When an XDP program adds linear data through
bpf_xdp_adjust_head(), the linear data will be ignored as
mlx5e_build_linear_skb() builds an skb without linear data and then
pull data from fragments to fill the linear data area. When an XDP
program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
data size and trigger the BUG_ON in it.
To fix the issue, first record the original number of fragments. If the
number of fragments changes after the XDP program runs, rewind the end
fragment pointer by the difference and recalculate the truesize. Then,
build the skb with the linear data area matching the xdp_buff. Finally,
only pull data in if there is non-linear data and fill the linear part
up to 256 bytes.
Fixes: f52ac7028b ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1760644540-899148-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afd5ba577c ]
XDP programs can release xdp_buff fragments when calling
bpf_xdp_adjust_tail(). The driver currently assumes the number of
fragments to be unchanged and may generate skb with wrong truesize or
containing invalid frags. Fix the bug by generating skb according to
xdp_buff after the XDP program runs.
Fixes: ea5d49bdae ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1760644540-899148-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b66b76a82c ]
CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by
zero-initializing all stack variables on function entry. The mlx5 XDP
RX path previously allocated a struct mlx5e_xdp_buff on the stack per
received CQE, resulting in measurable performance degradation under
this config.
This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct,
avoiding per-CQE stack allocations and repeated zeroing.
With this change, XDP_DROP and XDP_TX performance matches that of
kernels built without CONFIG_INIT_STACK_ALL_ZERO.
Performance was measured on a ConnectX-6Dx using a single RX channel
(1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from
net-next-6.15.
Stack zeroing disabled:
- XDP_DROP:
* baseline: 31.47 Mpps
* baseline + per-RQ allocation: 32.31 Mpps (+2.68%)
- XDP_TX:
* baseline: 12.41 Mpps
* baseline + per-RQ allocation: 12.95 Mpps (+4.30%)
Stack zeroing enabled:
- XDP_DROP:
* baseline: 24.32 Mpps
* baseline + per-RQ allocation: 32.27 Mpps (+32.7%)
- XDP_TX:
* baseline: 11.80 Mpps
* baseline + per-RQ allocation: 12.24 Mpps (+3.72%)
Reported-by: Sebastiano Miano <mianosebastiano@gmail.com>
Reported-by: Samuel Dobron <sdobron@redhat.com>
Link: https://lore.kernel.org/all/CAMENy5pb8ea+piKLg5q5yRTMZacQqYWAoVLE1FE9WhQPq92E0g@mail.gmail.com/
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/1747253032-663457-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: afd5ba577c ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a73ca0449b ]
sctp_vrf.sh could fail:
TEST 12: bind vrf-2 & 1 in server, connect from client 1 & 2, N [FAIL]
not ok 1 selftests: net: sctp_vrf.sh # exit=3
The failure happens when the server bind in a new run conflicts with an
existing association from the previous run:
[1] ip netns exec $SERVER_NS ./sctp_hello server ...
[2] ip netns exec $CLIENT_NS ./sctp_hello client ...
[3] ip netns exec $SERVER_NS pkill sctp_hello ...
[4] ip netns exec $SERVER_NS ./sctp_hello server ...
It occurs if the client in [2] sends a message and closes immediately.
With the message unacked, no SHUTDOWN is sent. Killing the server in [3]
triggers a SHUTDOWN the client also ignores due to the unacked message,
leaving the old association alive. This causes the bind at [4] to fail
until the message is acked and the client responds to a second SHUTDOWN
after the server’s T2 timer expires (3s).
This patch fixes the issue by preventing the client from sending data.
Instead, the client blocks on recv() and waits for the server to close.
It also waits until both the server and the client sockets are fully
released in stop_server and wait_client before restarting.
Additionally, replace 2>&1 >/dev/null with -q in sysctl and grep, and
drop other redundant 2>&1 >/dev/null redirections, and fix a typo from
N to Y (connect successfully) in the description of the last test.
Fixes: a61bd7b9fe ("selftests: add a selftest for sctp vrf")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/be2dacf52d0917c4ba5e2e8c5a9cb640740ad2b6.1760731574.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 902e81e679 ]
The blamed commit increased the needed headroom to account for
alignment. This means that the size required to always align a Tx buffer
was added inside the dpaa2_eth_needed_headroom() function. By doing
that, a manual adjustment of the pointer passed to PTR_ALIGN() was no
longer correct since the 'buffer_start' variable was already pointing
to the start of the skb's memory.
The behavior of the dpaa2-eth driver without this patch was to drop
frames on Tx even when the headroom matched the 128 bytes
necessary. Fix this by removing the manual adjustment of 'buffer_start'
from the PTR_ALIGN() call.
Closes: https://lore.kernel.org/netdev/70f0dcd9-1906-4d13-82df-7bbbbe7194c6@app.fastmail.com/T/#u
Fixes: f422abe3f2 ("dpaa2-eth: increase the needed headroom to account for alignment")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Mathew McBride <matt@traverse.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251016135807.360978-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e59bc32df2 ]
The ENETC RX ring uses the page halves flipping mechanism, each page is
split into two halves for the RX ring to use. And ENETC_RXB_TRUESIZE is
defined to 2048 to indicate the size of half a page. However, the page
size is configurable, for ARM64 platform, PAGE_SIZE is default to 4K,
but it could be configured to 16K or 64K.
When PAGE_SIZE is set to 16K or 64K, ENETC_RXB_TRUESIZE is not correct,
and the RX ring will always use the first half of the page. This is not
consistent with the description in the relevant kernel doc and commit
messages.
This issue is invisible in most cases, but if users want to increase
PAGE_SIZE to receive a Jumbo frame with a single buffer for some use
cases, it will not work as expected, because the buffer size of each
RX BD is fixed to 2048 bytes.
Based on the above two points, we expect to correct ENETC_RXB_TRUESIZE
to (PAGE_SIZE >> 1), as described in the comment.
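A sketch of the corrected definition:

	/* half a page, whatever PAGE_SIZE is configured to be */
	#define ENETC_RXB_TRUESIZE	(PAGE_SIZE >> 1)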
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251016080131.3127122-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50bd33f6b3 ]
After applying the workaround for err050089, the LS1028A platform
experiences RCU stalls on an RT kernel. This issue is caused by the
recursive acquisition of the read lock enetc_mdio_lock. Listed below are some
of the call stacks identified under the enetc_poll path that may lead to
a deadlock:
enetc_poll
-> enetc_lock_mdio
-> enetc_clean_rx_ring OR napi_complete_done
-> napi_gro_receive
-> enetc_start_xmit
-> enetc_lock_mdio
-> enetc_map_tx_buffs
-> enetc_unlock_mdio
-> enetc_unlock_mdio
After enetc_poll acquires the read lock, a higher-priority writer attempts
to acquire the lock, causing preemption. The writer detects that a
read lock is already held and is scheduled out. However, readers under
enetc_poll cannot acquire the read lock again because a writer is already
waiting, leading to a thread hang.
Currently, the deadlock is avoided by adjusting enetc_lock_mdio to prevent
recursive lock acquisition.
Fixes: 6d36ecdbc4 ("net: enetc: take the MDIO lock only once per NAPI poll cycle")
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
Acked-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251015021427.180757-1-jianpeng.chang.cn@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bf29555f5b ]
Creating FDB entries is possible from a non-initial user namespace when
having CAP_NET_ADMIN, yet, when deleting FDB entries, processes receive
an EPERM because the capability is always checked against the initial
user namespace. This restricts the FDB management from unprivileged
containers.
Drop the netlink_capable check in rtnl_fdb_del as it was originally
dropped in c5c351088a and reintroduced in 1690be63a2 without
intention.
This patch was tested using a container on GyroidOS, where it was
possible to delete FDB entries from an unprivileged user namespace and
private network namespace.
Fixes: 1690be63a2 ("bridge: Add vlan support to static neighbors")
Reviewed-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Tested-by: Harshal Gohel <hg@simonwunderlich.de>
Signed-off-by: Johannes Wiesböck <johannes.wiesboeck@aisec.fraunhofer.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20251015201548.319871-1-johannes.wiesboeck@aisec.fraunhofer.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aaf043a568 ]
When building with Clang 20 or newer, there are some objtool warnings
from unexpected fallthroughs to other functions:
vmlinux.o: warning: objtool: mlx5e_mpwrq_mtts_per_wqe() falls through to next function mlx5e_mpwrq_max_num_entries()
vmlinux.o: warning: objtool: mlx5e_mpwrq_max_log_rq_size() falls through to next function mlx5e_get_linear_rq_headroom()
LLVM 20 contains an (admittedly problematic [1]) optimization [2] to
convert divide by zero into the equivalent of __builtin_unreachable(),
which invokes undefined behavior and destroys code generation when it is
encountered in a control flow graph.
mlx5e_mpwrq_umr_entry_size() returns 0 in the default case of an
unrecognized mlx5e_mpwrq_umr_mode value. mlx5e_mpwrq_mtts_per_wqe(),
which is inlined into mlx5e_mpwrq_max_log_rq_size(), uses the result of
mlx5e_mpwrq_umr_entry_size() in a divide operation without checking for
zero, so LLVM is able to infer there will be a divide by zero in this
case and invokes undefined behavior. While there is some proposed work
to isolate this undefined behavior and avoid the destructive code
generation that results in these objtool warnings, code should still be
defensive against divide by zero.
As the WARN_ONCE() implies that an invalid value should be handled
gracefully, return 1 instead of 0 in the default case so that the
result of this division operation is always valid.
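A minimal sketch of the defensive default, with the real mode handling
elided (the function below is a reduced stand-in, not the driver code):

static u8 umr_entry_size(int umr_mode)
{
        switch (umr_mode) {
        /* ... known UMR modes return their real entry sizes here ... */
        default:
                WARN_ONCE(1, "unknown UMR mode %d\n", umr_mode);
                return 1;       /* keep later divisions well-defined */
        }
}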
Fixes: 168723c1f8 ("net/mlx5e: xsk: Use umr_mode to calculate striding RQ parameters")
Link: https://lore.kernel.org/CAGG=3QUk8-Ak7YKnRziO4=0z=1C_7+4jF+6ZeDQ9yF+kuTOHOQ@mail.gmail.com/ [1]
Link: 37932643ab [2]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2131
Closes: https://github.com/ClangBuiltLinux/linux/issues/2132
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20251014-mlx5e-avoid-zero-div-from-mlx5e_mpwrq_umr_entry_size-v1-1-dc186b8819ef@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3e3ba654d3 ]
Move the check of the CPU capacity currently stored in the energy model
against the arch_scale_cpu_capacity() value to em_adjust_new_capacity()
so it will be done regardless of where the latter is called from.
This will be useful when a new em_adjust_new_capacity() caller is added
subsequently.
While at it, move the pd local variable declaration in
em_check_capacity_update() into the loop in which it is used.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/7810787.EvYhyI6sBW@rjwysocki.net
Stable-dep-of: 1ebe8f7e78 ("PM: EM: Fix late boot with holes in CPU topology")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a8e62726ac ]
Every iteration of the loop over all possible CPUs in
em_check_capacity_update() causes get_cpu_device() to be called twice
for the same CPU, once indirectly via em_cpu_get() and once directly.
Get rid of the indirect get_cpu_device() call by moving the direct
invocation of it earlier and using em_pd_get() instead of em_cpu_get()
to get a pd pointer for the dev returned by it.
This also exposes the fact that dev is needed to get a pd, so the code
becomes somewhat easier to follow.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/1925950.tdWV9SEqCh@rjwysocki.net
Stable-dep-of: 1ebe8f7e78 ("PM: EM: Fix late boot with holes in CPU topology")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ee916dccd4 ]
This pattern isn't very documented, and apparently not used much outside
of 'make tools/help', but it has existed for over a decade (since commit
ea01fa9f63: "tools: Connect to the kernel build system").
However, it doesn't work very well for most cases, particularly the
useful "tools/all" target, because it overrides the LDFLAGS value with
an empty one.
And once overridden, 'make' will then not honor the tooling makefiles
trying to change it - which then makes any LDFLAGS use in the tooling
directory break, typically causing odd link errors.
Remove that LDFLAGS override, since it seems to be entirely historical.
The core kernel makefiles no longer modify LDFLAGS as part of the build,
and use kernel-specific link flags instead (eg 'KBUILD_LDFLAGS' and
friends).
This allows more of the 'make tools/*' cases to work. I say 'more',
because some of the tooling build rules make various other assumptions
or have other issues, so it's still a bit hit-or-miss. But those issues
tend to show up with the 'make -C tools xyz' pattern too, so now it's no
longer an issue of this particular 'tools/*' build rule being special.
Acked-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43de0ac332 ]
The event ID only uses attr::config bits [7:0], but we check the event
range using the whole 64-bit field. This blocks the usage of the
remaining attr::config bits. Relax the check by only using
bits [7:0].
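A sketch of the relaxed check (names assumed, not the driver's):

#define EVENT_ID_MASK   GENMASK_ULL(7, 0)

static bool event_id_valid(u64 config, u64 max_event)
{
        /* only bits [7:0] carry the event ID; the rest stays usable */
        return (config & EVENT_ID_MASK) <= max_event;
}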
Acked-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Yushan Wang <wangyushan12@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9316512b71 ]
PAGE_KERNEL_TEXT is an old macro that is used to tell the kernel whether
kernel text has to be mapped read-only or read-write based on build-time
options.
But nowadays, with functionalities like jump_labels, static links,
etc., more or less all kernels need to be read-write at some
point, and some combinations of configs failed to work due to
inaccurate setting of PAGE_KERNEL_TEXT. On the other hand, today
we have CONFIG_STRICT_KERNEL_RWX which implements more controlled
access to kernel modifications.
Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
possible options that may imply kernel text modification, always
set kernel text read-write at startup and rely on
CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
Do this by passing PAGE_KERNEL_X to map_kernel_page() in
__mapin_ram_chunk() instead of passing PAGE_KERNEL_TEXT. Once
this is done, the only remaining user of PAGE_KERNEL_TEXT is
mmu_mark_initmem_nx() which uses it in a call to setibat().
As setibat() ignores the RW/RO, we can seamlessly replace
PAGE_KERNEL_TEXT by PAGE_KERNEL_X here as well and get rid of
PAGE_KERNEL_TEXT completely.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8e2d793abf87ae3efb8f6dce10f974ac0eda61b8.1757412205.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28c4d9bc07 ]
In gdlm_put_lock(), there is a small window of time in which the
DFL_UNMOUNT flag has been set but the lockspace hasn't been released,
yet. In that window, dlm may still call gdlm_ast() and gdlm_bast().
To prevent it from dereferencing freed glock objects, only free the
glock if the lockspace has actually been released.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4d4ebc849 ]
The `ID_AA64MMFR4_EL1.EIESB` field is an unsigned enumeration, but was
incorrectly defined as a `SignedEnum` when introduced in commit
cfc680bb04 ("arm64: sysreg: Add layout for ID_AA64MMFR4_EL1"). This is
corrected to `UnsignedEnum`.
Conversely, the `ID_AA64DFR0_EL1.DoubleLock` field is a signed
enumeration, but was incorrectly defined as an `UnsignedEnum` when
annotated in commit ad16d4cf0b ("arm64/sysreg: Initial unsigned
annotations for ID registers"). This is corrected to `SignedEnum`.
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00e58ff924 ]
In preparation for the future commit ("bitops: Add __attribute_const__ to generic
ffs()-family implementations"), which allows GCC's value range tracker
to see past ffs(), GCC 8 on ARM thinks that it might be possible that
"ffs(rq) - 8" used here:
v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
could wrap below 0, leading to a very large value, which would be out of
range for the FIELD_PREP() usage:
drivers/pci/pci.c: In function 'pcie_set_readrq':
include/linux/compiler_types.h:572:38: error: call to '__compiletime_assert_471' declared with attribute error: FIELD_PREP: value too large for the field
...
drivers/pci/pci.c:5896:6: note: in expansion of macro 'FIELD_PREP'
v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, ffs(rq) - 8);
^~~~~~~~~~
If the result of the ffs() is bounds checked before being used in
FIELD_PREP(), the value tracker seems happy again. :)
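A sketch of the idea, not the exact hunk:

int order = ffs(rq) - 8;        /* rq already validated as a power of two
                                 * in the 128..4096 range */

if (order < 0)                  /* cannot happen, but bounds the range for GCC */
        return -EINVAL;
v = FIELD_PREP(PCI_EXP_DEVCTL_READRQ, order);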
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linux-pci/CA+G9fYuysVr6qT8bjF6f08WLyCJRG7aXAeSd2F7=zTaHHd7L+Q@mail.gmail.com/
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250905052836.work.425-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2048ec5b98 ]
The syzbot reported issue in hfs_find_set_zero_bits():
=====================================================
BUG: KMSAN: uninit-value in hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
hfs_find_set_zero_bits+0x74d/0xb60 fs/hfs/bitmap.c:45
hfs_vbm_search_free+0x13c/0x5b0 fs/hfs/bitmap.c:151
hfs_extend_file+0x6a5/0x1b00 fs/hfs/extent.c:408
hfs_get_block+0x435/0x1150 fs/hfs/extent.c:353
__block_write_begin_int+0xa76/0x3030 fs/buffer.c:2151
block_write_begin fs/buffer.c:2262 [inline]
cont_write_begin+0x10e1/0x1bc0 fs/buffer.c:2601
hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
cont_expand_zero fs/buffer.c:2528 [inline]
cont_write_begin+0x35a/0x1bc0 fs/buffer.c:2591
hfs_write_begin+0x85/0x130 fs/hfs/inode.c:52
hfs_file_truncate+0x1d6/0xe60 fs/hfs/extent.c:494
hfs_inode_setattr+0x964/0xaa0 fs/hfs/inode.c:654
notify_change+0x1993/0x1aa0 fs/attr.c:552
do_truncate+0x28f/0x310 fs/open.c:68
do_ftruncate+0x698/0x730 fs/open.c:195
do_sys_ftruncate fs/open.c:210 [inline]
__do_sys_ftruncate fs/open.c:215 [inline]
__se_sys_ftruncate fs/open.c:213 [inline]
__x64_sys_ftruncate+0x11b/0x250 fs/open.c:213
x64_sys_call+0xfe3/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:78
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4154 [inline]
slab_alloc_node mm/slub.c:4197 [inline]
__kmalloc_cache_noprof+0x7f7/0xed0 mm/slub.c:4354
kmalloc_noprof include/linux/slab.h:905 [inline]
hfs_mdb_get+0x1cc8/0x2a90 fs/hfs/mdb.c:175
hfs_fill_super+0x3d0/0xb80 fs/hfs/super.c:337
get_tree_bdev_flags+0x6e3/0x920 fs/super.c:1681
get_tree_bdev+0x38/0x50 fs/super.c:1704
hfs_get_tree+0x35/0x40 fs/hfs/super.c:388
vfs_get_tree+0xb0/0x5c0 fs/super.c:1804
do_new_mount+0x738/0x1610 fs/namespace.c:3902
path_mount+0x6db/0x1e90 fs/namespace.c:4226
do_mount fs/namespace.c:4239 [inline]
__do_sys_mount fs/namespace.c:4450 [inline]
__se_sys_mount+0x6eb/0x7d0 fs/namespace.c:4427
__x64_sys_mount+0xe4/0x150 fs/namespace.c:4427
x64_sys_call+0xfa7/0x3db0 arch/x86/include/generated/asm/syscalls_64.h:166
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd9/0x210 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 1 UID: 0 PID: 12609 Comm: syz.1.2692 Not tainted 6.16.0-syzkaller #0 PREEMPT(none)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
=====================================================
The HFS_SB(sb)->bitmap buffer is allocated in hfs_mdb_get():
HFS_SB(sb)->bitmap = kmalloc(8192, GFP_KERNEL);
This can trigger the reported issue because kmalloc()
doesn't clear the allocated memory. If the allocated memory contains
only zeros, everything works fine. But if it contains "garbage",
the bitmap operations can be affected and the reported issue is
triggered.
This patch simply replaces the kmalloc() with kzalloc()
to guarantee the correctness of the bitmap operations:
a newly created allocation bitmap should have all available blocks
free. Potentially, reading the on-disk bitmap may not fill the whole
allocated buffer, and "garbage" in the uninitialized memory would then
cause volume corruption and file system driver bugs.
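The change itself is a one-liner; a sketch at the allocation site quoted
above (error-path shape assumed):

/* zeroed allocation: untouched bytes read as "all blocks free" */
HFS_SB(sb)->bitmap = kzalloc(8192, GFP_KERNEL);
if (!HFS_SB(sb)->bitmap)
        goto out;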
Reported-by: syzbot <syzbot+773fa9d79b29bd8b6831@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=773fa9d79b29bd8b6831
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250820230636.179085-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c62663a986 ]
Potenatially, __hfs_ext_read_extent() could operate by
not initialized values of fd->key after hfs_brec_find() call:
static inline int __hfs_ext_read_extent(struct hfs_find_data *fd, struct hfs_extent *extent,
                                        u32 cnid, u32 block, u8 type)
{
        int res;

        hfs_ext_build_key(fd->search_key, cnid, block, type);
        fd->key->ext.FNum = 0;
        res = hfs_brec_find(fd);
        if (res && res != -ENOENT)
                return res;
        if (fd->key->ext.FNum != fd->search_key->ext.FNum ||
            fd->key->ext.FkType != fd->search_key->ext.FkType)
                return -ENOENT;
        if (fd->entrylength != sizeof(hfs_extent_rec))
                return -EIO;
        hfs_bnode_read(fd->bnode, extent, fd->entryoffset, sizeof(hfs_extent_rec));
        return 0;
}
This patch changes kmalloc() to kzalloc() in hfs_find_init()
and initializes fd->record, fd->keyoffset, fd->keylength,
fd->entryoffset, and fd->entrylength for the case when hfs_brec_find()
has found nothing in the b-tree node.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250818225252.126427-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18b07c44f2 ]
Currently, hfs_brec_remove() moves records
towards the location of the deleted record and updates
the offsets of the moved records. However, the hfs_brec_remove()
logic ignores the "mess" left in the b-tree node's free space and
doesn't touch the offsets beyond the number of records.
Potentially, this could confuse fsck or the driver logic,
or be a source of corruption.
This patch reworks the logic of hfs_brec_remove()
by clearing the freed space of the b-tree node
after the records have been moved. It also clears the last
offset that kept the old location of the free space,
because the offset before it now holds
the actual offset of the free space after the record
deletion.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250815194918.38165-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a20b83cf45 ]
On nios2, with CONFIG_FLATMEM set, the kernel relies on
memblock_get_current_limit() to determine the limits of mem_map, in
particular for max_low_pfn.
Unfortunately, memblock.current_limit is only default initialized to
MEMBLOCK_ALLOC_ANYWHERE at this point of the bootup, potentially leading
to situations where max_low_pfn can erroneously exceed the value of
max_pfn and, thus, the valid range of available DRAM.
This can in turn cause kernel-level paging failures, e.g.:
[ 76.900000] Unable to handle kernel paging request at virtual address 20303000
[ 76.900000] ea = c0080890, ra = c000462c, cause = 14
[ 76.900000] Kernel panic - not syncing: Oops
[ 76.900000] ---[ end Kernel panic - not syncing: Oops ]---
This patch fixes this by pre-calculating memblock.current_limit
based on the upper limits of the available memory ranges via
adjust_lowmem_bounds, a simplified version of the equivalent
implementation within the arm architecture.
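A simplified sketch of what such a helper could look like, assuming
memblock's generic iterators (not the exact arm code):

/* Clamp memblock's allocation limit to the end of available DRAM so
 * max_low_pfn cannot exceed max_pfn. */
static void __init adjust_lowmem_bounds(void)
{
        phys_addr_t start, end, limit = 0;
        u64 i;

        for_each_mem_range(i, &start, &end)
                limit = max(limit, end);

        memblock_set_current_limit(limit);
}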
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Don't mix the dma fence lock with the active_job lock. Use fence_lock to
protect the dma fence used by the DRM scheduler when signalling a job
completion, and queue_lock to protect concurrent access to the active bin
job in OOM handling and stats collection for a given file priv. The issue
was uncovered with PREEMPT_RT enabled, as a system freeze when opening
multiple Chromium tabs on a Raspberry Pi 5.
Link: https://github.com/raspberrypi/linux/issues/7035
Fixes: fa6a20c874 ("drm/v3d: Address race-condition between per-fd GPU stats and fd release")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Link: https://lore.kernel.org/r/20250916172022.2779837-1-mwen@igalia.com
Pass-through mode disables all gamma and brightness processing, sending
the raw pixel data directly to the LEDs. It is enabled by setting the
brightness to zero, either in Device Tree or using the runtime method of
writing a single byte (in this case 0) to the device.
See: https://github.com/raspberrypi/linux/issues/7108
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Remove CONFIG_MEDIA_PCI_HAILO from all the arm64 defconfig files as this
driver is no longer built in the kernel tree.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Upstream commit b7cb9a0343 ("dmaengine: idxd: Fix refcount underflow
on module unload") fixes a refcount underflow by replacing the call to
idxd_cleanup() in the remove function with direct cleanup calls. That works
fine upstream. However, upstream removed support for IOMMU_DEV_FEAT_IOPF,
which is still supported in v6.12.y. The backport of commit b7cb9a0343
into v6.12.y misses the call to disable it. This results in a warning
backtrace when unloading and reloading the module.
WARNING: CPU: 0 PID: 665849 at drivers/pci/ats.c:337 pci_reset_pri+0x4c/0x60
...
RIP: 0010:pci_reset_pri+0xa7/0x130
Add the missing cleanup call to fix the problem.
Fixes: ce81905bec ("dmaengine: idxd: Fix refcount underflow on module unload")
Cc: Yi Sun <yi.sun@intel.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f04aad36a0 upstream.
syzkaller discovered the following crash: (kernel BUG)
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
Kernel panics because it detects UFFD inconsistency during
userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
The inconsistency is caused in ksm_madvise(): when user calls madvise()
with MADV_UNMEARGEABLE on a VMA that is registered for UFFD in MINOR mode,
it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.
Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int and
int are 32-bit wide. This setup causes the following mishap during the &=
~VM_MERGEABLE assignment.
VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation. This promotion fills
upper 32 bits with leading 0s, as we're doing unsigned conversion (and
even for a signed conversion, this wouldn't help as the leading bit is 0).
& operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
the upper 32-bits of its value.
Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.
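A standalone userspace illustration of the promotion (plain C, assuming a
64-bit unsigned long as on x86_64; values are hypothetical, chosen only to
show the effect):

#include <stdio.h>

int main(void)
{
        unsigned long vm_flags = 0x0000000100000000UL;  /* flag in the upper half */
        unsigned int  merge_u32 = 0x80000000U;          /* old 32-bit constant */
        unsigned long merge_bit = 1UL << 31;            /* BIT(31)-style constant */

        /* ~merge_u32 zero-extends to 0x000000007fffffff: upper flag is lost */
        printf("%#lx\n", vm_flags & ~merge_u32);        /* prints 0 */
        /* ~merge_bit is 0xffffffff7fffffff: upper flag survives */
        printf("%#lx\n", vm_flags & ~merge_bit);        /* prints 0x100000000 */
        return 0;
}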
Note: other VM_* flags are not affected: This only happens to the
VM_MERGEABLE flag, as the other VM_* flags are all constants of type int
and after ~ operation, they end up with leading 1 and are thus converted
to unsigned long with leading 1s.
Note 2:
After commit 31defc3b01 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:
[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
but the root-cause (flag-drop) remains the same.
[akpm@linux-foundation.org: rust bindgen wasn't able to handle BIT(), from Miguel]
Link: https://lore.kernel.org/oe-kbuild-all/202510030449.VfSaAjvd-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251001090353.57523-2-acsjakub@amazon.de
Fixes: 7677f7fd8b ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: SeongJae Park <sj@kernel.org>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[acsjakub@amazon.de: adjust context in bindgings_helper.h]
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c27aaee93 ]
Do read-modify-write so that we re-use the characterized reset value as
specified in TRM [1] to program calibration wait time which defines number
of cycles to wait for after startup state machine is in bandgap enable
state.
This fixes PLL lock timeout error faced while using RPi DSI Panel on TI's
AM62L and J721E SoC since earlier calibration wait time was getting
overwritten to zero value thus failing the PLL to lockup and causing
timeout.
[1] AM62P TRM (Section 14.8.6.3.2.1.1 DPHY_TX_DPHYTX_CMN0_CMN_DIG_TBIT2):
Link: https://www.ti.com/lit/pdf/spruj83
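A generic sketch of the read-modify-write pattern (register and bit names
below are placeholders, not the driver's):

u32 reg = readl(priv->regs + CMN_DIG_TBIT2);    /* keeps the characterized
                                                 * reset value of the
                                                 * calibration wait time */
reg |= CMN_DIG_TBIT2_OUR_BITS;                  /* only touch the bits we own */
writel(reg, priv->regs + CMN_DIG_TBIT2);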
Cc: stable@vger.kernel.org
Fixes: 7a343c8bf4 ("phy: Add Cadence D-PHY support")
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Tested-by: Harikrishna Shenoy <h-shenoy@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://lore.kernel.org/r/20250704125915.1224738-3-devarsht@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1dbf1d590d ]
In ip_output(), skb->dev is updated from skb_dst(skb)->dev;
this can become invalid when the interface is unregistered and freed.
Introduce a new skb_dst_dev_rcu() function to be used instead of
skb_dst_dev() within RCU locks in ip_output(). This will ensure that
all the skbs associated with the dev being deregistered are
transmitted out first, before the dev is freed.
Given that ip_output() is called within an rcu_read_lock()
critical section or from a bottom-half context, it is safe to introduce
an RCU read-side critical section within it.
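A sketch of the resulting pattern in ip_output(), heavily condensed:

rcu_read_lock();
dev = skb_dst_dev_rcu(skb);     /* dev stays valid for the whole critical
                                 * section even if unregistration races */
/* ... stats update and NF_HOOK_COND(NF_INET_POST_ROUTING, ...) ... */
rcu_read_unlock();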
Multiple panic call stacks were observed when UL traffic was run
in concurrency with device deregistration from different functions,
pasting one sample for reference.
[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
Changes in v3:
- Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn.
- Dropped legacy lines mistakenly pulled in from an outdated branch.
Changes in v2:
- Addressed review comments from Eric Dumazet
- Used READ_ONCE() to prevent potential load/store tearing
- Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 833d4313bc ("mptcp: reset blackhole on success with non-loopback ifaces")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 88fe14253e ]
dst->dev is read locklessly in many contexts,
and written in dst_dev_put().
Fixing all the races is going to need many changes.
We probably will have to add full RCU protection.
Add three helpers to ease this painful process.
static inline struct net_device *dst_dev(const struct dst_entry *dst)
{
        return READ_ONCE(dst->dev);
}

static inline struct net_device *skb_dst_dev(const struct sk_buff *skb)
{
        return dst_dev(skb_dst(skb));
}

static inline struct net *skb_dst_dev_net(const struct sk_buff *skb)
{
        return dev_net(skb_dst_dev(skb));
}

static inline struct net *skb_dst_dev_net_rcu(const struct sk_buff *skb)
{
        return dev_net_rcu(skb_dst_dev(skb));
}
Fixes: 4a6ce2b6f2 ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 833d4313bc ("mptcp: reset blackhole on success with non-loopback ifaces")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15492700ac ]
tcp_in_quickack_mode() is called from input path for small packets.
It calls __sk_dst_get() which reads sk->sk_dst_cache which has been
put in sock_read_tx group (for good reasons).
Then dst_metric(dst, RTAX_QUICKACK) also causes extra cache line misses.
Cache RTAX_QUICKACK in icsk->icsk_ack.dst_quick_ack to no longer pull
these cache lines in the cases where a delayed ACK is scheduled.
After this patch the TCP receive path no longer accesses the sock_read_tx
group.
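A sketch of the caching idea (the exact hook point for filling the field is
not shown):

/* stored once, where the route is attached to the socket:
 *   icsk->icsk_ack.dst_quick_ack = dst_metric(dst, RTAX_QUICKACK);
 */
static bool tcp_in_quickack_mode(struct sock *sk)
{
        const struct inet_connection_sock *icsk = inet_csk(sk);

        /* no __sk_dst_get()/dst_metric() on the hot input path anymore */
        return icsk->icsk_ack.dst_quick_ack ||
               (icsk->icsk_ack.quick && !inet_csk_in_pingpong_mode(sk));
}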
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250312083907.1931644-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 833d4313bc ("mptcp: reset blackhole on success with non-loopback ifaces")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a7075f501b ]
There used to be backward compatibility in terms of the mailbox API:
various drivers from various OSes supporting 10G adapters from the Intel
portfolio could easily negotiate the mailbox API.
This convention has been broken since introducing API 1.4.
Commit 0062e7cc95 ("ixgbevf: add VF IPsec offload code") added support
for IPSec which is specific only for the kernel ixgbe driver. None of the
rest of the Intel 10G PF/VF drivers supports it. And actually lack of
support was not included in the IPSec implementation - there were no such
code paths. No possibility to negotiate support for the feature was
introduced along with introduction of the feature itself.
Commit 339f289641 ("ixgbevf: Add support for new mailbox communication
between PF and VF") increasing API version to 1.5 did the same - it
introduced code supported specifically by the PF ESX driver. It altered API
version for the VF driver in the same time not touching the version
defined for the PF ixgbe driver. It led to additional discrepancies,
as the code provided within API 1.6 cannot be supported for Linux ixgbe
driver as it causes crashes.
The issue was noticed some time ago and mitigated by Jake within the commit
d0725312ad ("ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5").
As a result we have a regression for IPsec support, and after increasing
the API to version 1.6 the ixgbevf driver stopped supporting the ESX MBX.
To fix this mess, add a new mailbox op asking the PF driver about supported
features. Based on the response, determine whether to set support for IPsec
and the ESX-specific enhanced mailbox.
New mailbox op, for compatibility purposes, must be added within new API
revision, as API version of OOT PF & VF drivers is already increased to
1.6 and doesn't incorporate features negotiate op.
Features negotiation mechanism gives possibility to be extended with new
features when needed in the future.
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Closes: https://lore.kernel.org/intel-wired-lan/20241101-jk-ixgbevf-mailbox-v1-5-fixes-v1-0-f556dc9a66ed@intel.com/
Fixes: 0062e7cc95 ("ixgbevf: add VF IPsec offload code")
Fixes: 339f289641 ("ixgbevf: Add support for new mailbox communication between PF and VF")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-4-ef32a425b92a@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 56094ad3ea ]
When user calls open_by_handle_at() on some inode that is not cached, we
will create disconnected dentry for it. If such dentry is a directory,
exportfs_decode_fh_raw() will then try to connect this dentry to the
dentry tree through reconnect_path(). It may happen for various reasons
(such as corrupted fs or race with rename) that the call to
lookup_one_unlocked() in reconnect_one() will fail to find the dentry we
are trying to reconnect and instead create a new dentry under the
parent. Now this dentry will not be marked as disconnected although the
parent still may well be disconnected (at least in case this
inconsistency happened because the fs is corrupted and .. doesn't point
to the real parent directory). This creates inconsistency in
disconnected flags but AFAICS it was mostly harmless. At least until
commit f1ee616214 ("VFS: don't keep disconnected dentries on d_anon")
which removed adding of most disconnected dentries to sb->s_anon list.
Thus after this commit cleanup of disconnected dentries implicitly
relies on the fact that dput() will immediately reclaim such dentries.
However when some leaf dentry isn't marked as disconnected, as in the
scenario described above, the reclaim doesn't happen and the dentries
are "leaked". Memory reclaim can eventually reclaim them but otherwise
they stay in memory and if umount comes first, we hit infamous "Busy
inodes after unmount" bug. Make sure all dentries created under a
disconnected parent are marked as disconnected as well.
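A minimal sketch of the invariant being restored (exact placement in the
lookup path assumed):

/* a dentry instantiated under a disconnected parent must itself be
 * marked disconnected so the final dput() reclaims it immediately */
if (parent->d_flags & DCACHE_DISCONNECTED)
        dentry->d_flags |= DCACHE_DISCONNECTED;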
Reported-by: syzbot+1d79ebe5383fc016cf07@syzkaller.appspotmail.com
Fixes: f1ee616214 ("VFS: don't keep disconnected dentries on d_anon")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e95db51c81 ]
Do that before new dentry is visible anywhere. It does create
a new possible state for dentries present in ->d_children/->d_sib -
DCACHE_PAR_LOOKUP present, negative, unhashed, not in in-lookup
hash chains, refcount positive. Those are going to be skipped
by all tree-walkers (both d_walk() callbacks in fs/dcache.c and
explicit loops over children/sibling lists elsewhere) and
dput() is fine with those.
NOTE: dropping the final reference to a "normal" in-lookup dentry
(in in-lookup hash) is a bug - somebody must've forgotten to
call d_lookup_done() on it and bad things will happen. With those
it's OK; if/when we get around to making __dentry_kill() complain
about such breakage, remember that predicate to check should
*not* be just d_in_lookup(victim) but rather a combination of that
with !hlist_bl_unhashed(&victim->d_u.d_in_lookup_hash). Might
be worth considering later...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable-dep-of: 56094ad3ea ("vfs: Don't leak disconnected dentries on umount")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15292f1b4c ]
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
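A sketch of the reset (struct name abbreviated):

if (msr_val & RMID_VAL_UNAVAIL) {
        /* hardware restarted this RMID from zero: drop the stale baseline
         * so the next successful read computes its delta against zero */
        am->prev_msr = 0;
        return -EINVAL;
}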
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$mount -t resctrl resctrl /sys/fs/resctrl
$mkdir /sys/fs/resctrl/mon_groups/test1/
$echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
17592178699059 <- Overflow on domain 0
3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
Bandwidth (MBM).
[ bp: Split commit message into smaller paragraph chunks for better
consumption. ]
Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7c9ac605e2 ]
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to
account for the overflow for MBM events and apply counter scaling for all the
events. This logic is common to both reading an RMID and reading a hardware
counter directly.
Refactor the hardware value adjustment logic into get_corrected_val() to
prepare for support of reading a hardware counter.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Stable-dep-of: 15292f1b4c ("x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 22f166218f ]
If bio is split by internal handling like chunksize or badblocks, the
corresponding trace_block_split() is missing, resulting in blktrace
inability to catch BIO split events and making it harder to analyze the
BIO sequence.
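A sketch of what an internal split site then looks like (call sites vary
by driver; names condensed):

split = bio_split(bio, sectors, GFP_NOIO, &fs_bio_set);
bio_chain(split, bio);
trace_block_split(split, bio->bi_iter.bi_sector);
submit_bio_noacct(bio);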
Cc: stable@vger.kernel.org
Fixes: 4b1faf9316 ("block: Kill bio_pair_split()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 501302d5ce ]
When seq_nr wraps around, the next reorder job with seq 0 is hashed to
the first CPU in padata_do_serial(). Correspondingly, need reset pd->cpu
to the first one when pd->processed wraps around. Otherwise, if the
number of used CPUs is not a power of 2, padata_find_next() will be
checking a wrong list, hence deadlock.
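A sketch of the wrap handling in the reorder path (surrounding code
elided):

pd->processed++;
if (pd->processed == 0)
        /* seq_nr wrapped: seq 0 hashes to the first CPU, follow it */
        pd->cpu = cpumask_first(pd->cpumask.pcpu);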
Fixes: 6fc4dbcf02 ("padata: Replace delayed timer with immediate workqueue in padata_reorder")
Cc: <stable@vger.kernel.org>
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[ relocated fix to padata_find_next() using pd->processed and pd->cpu structure fields ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cd32a0c0dc ]
When we're removing rmap records for crosslinked blocks, use deferred
intent items so that we can try to free/unmap as many of the old data
structure's blocks as we can in the same transaction as the commit.
Cc: <stable@vger.kernel.org> # v6.6
Fixes: 1c7ce115e5 ("xfs: reap large AG metadata extents when possible")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[ adapted xfs_refcount_free_cow_extent() and xfs_rmap_free_extent() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c24248ed78 ]
The value of skb_data->wait indicates whether skb is passed on to the
core mac80211 stack or released by the driver itself. Make sure that by
the time skb is added to txwd queue and becomes visible to the completing
side, it has already allocated and initialized TX wait related data (in
case it's needed).
This is found by code review and addresses a possible race scenario
described below:
Waiting thread                          Completing thread

rtw89_core_send_nullfunc()
  rtw89_core_tx_write_link()
  ...
    rtw89_pci_txwd_submit()
      skb_data->wait = NULL
      /* add skb to the queue */
      skb_queue_tail(&txwd->queue, skb)

    /* another thread (e.g. rtw89_ops_tx) performs
       TX kick off for the same queue */

                                        rtw89_pci_napi_poll()
                                        ...
                                        rtw89_pci_release_txwd_skb()
                                          /* get skb from the queue */
                                          skb_unlink(skb, &txwd->queue)
                                          rtw89_pci_tx_status()
                                            rtw89_core_tx_wait_complete()
                                              /* use incorrect skb_data->wait */

rtw89_core_tx_kick_off_and_wait()
  /* assign skb_data->wait but too late */
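A sketch of the reordering (field and queue names as in the diagram above):

skb_data->wait = wait;                  /* publish before the skb is visible */
skb_queue_tail(&txwd->queue, skb);      /* completion path now sees a fully
                                         * initialized wait pointer */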
Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca6152 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-3-pchelkin@ispras.ru
[ adapted rtw89_core_tx_write_link() modifications to rtw89_core_tx_write() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d68886bae7 ]
The data type of loca_last_write_offset is newoffset4 and is switched
on a boolean value, no_newoffset, that indicates if a previous write
occurred or not. If no_newoffset is FALSE, an offset is not given.
This means that client does not try to update the file size. Thus,
server should not try to calculate new file size and check if it fits
into the segment range. See RFC 8881, section 12.5.4.2.
Sometimes the current incorrect logic may cause clients to hang when
trying to sync an inode. If layoutcommit fails, the client marks the
inode as dirty again.
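A sketch of the corrected handling (field names as carried in nfsd's
layoutcommit arguments, assumed here):

if (lcp->lc_newoffset) {
        /* client proposed a new size: derive and range-check it */
        loff_t new_size = lcp->lc_last_wr + 1;
        /* ... check against the segment, update i_size if larger ... */
}
/* else: no offset given, don't derive or validate a new file size */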
Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Cc: stable@vger.kernel.org
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f963cf2b91 ]
When a pNFS client in the block or scsi layout mode sends layoutcommit
to the MDS, a variable-length array of modified extents is supplied within
the request. This patch allows the server to accept such extent arrays
even if they do not fit within a single memory page.
The issue can be reproduced when writing to a 1GB file using FIO with
O_DIRECT, 4K block and large I/O depth without preallocation of the
file. In this case, the server returns NFSERR_BADXDR to the client.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae7 ("NFSD: Fix last write offset handling in layoutcommit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 274365a51d ]
Remove dprintk in nfsd4_layoutcommit. These are not needed
in day to day usage, and the information is also available
in Wireshark when capturing NFS traffic.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae7 ("NFSD: Fix last write offset handling in layoutcommit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 832738e4b3 ]
Compilers may optimize the layout of C structures, so we should not rely
on sizeof struct and memcpy to encode and decode XDR structures. The byte
order of the fields should also be taken into account.
This patch adds the correct functions to handle the deviceid4 structure
and removes the pad field, which is currently not used by NFSD, from the
runtime state. The server's byte order is preserved because the deviceid4
blob on the wire is only used as a cookie by the client.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae7 ("NFSD: Fix last write offset handling in layoutcommit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 26d05e1c37 ]
Update error codes in decoding functions of block and scsi layout
drivers to match the core nfsd code. NFS4ERR_EINVAL means that the
server was able to decode the request, but the decoded values are
invalid. Use NFS4ERR_BADXDR instead to indicate a decoding error.
And ENOMEM is changed to the NFS code NFS4ERR_DELAY.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae7 ("NFSD: Fix last write offset handling in layoutcommit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0792c1984a ]
Rework the power management in inv_icm42600_core_probe() to use
devm_pm_runtime_set_active_enabled(), which simplifies the runtime PM
setup by handling activation and enabling in one step.
Remove the separate inv_icm42600_disable_pm callback, as it's no longer
needed with the devm-managed approach.
Using devm_pm_runtime_enable() also fixes the missing disable of
autosuspend.
Update inv_icm42600_disable_vddio_reg() to only disable the regulator if
the device is not suspended i.e. powered-down, preventing unbalanced
disables.
Also remove redundant error msg on regulator_disable(), the regulator
framework already emits an error message when regulator_disable() fails.
This simplifies the PM setup and avoids manipulating the usage counter
unnecessarily.
Fixes: 31c24c1e93 ("iio: imu: inv_icm42600: add core of new inv_icm42600 driver")
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Link: https://patch.msgid.link/20250901-icm42pmreg-v3-1-ef1336246960@geanix.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 284fb19a3f ]
PLL lockup and O_CMN_READY assertion can only happen after common state
machine gets enabled by programming DPHY_CMN_SSM register, but driver was
polling them before the common state machine was enabled which is
incorrect. This is as per the DPHY initialization sequence as mentioned in
J721E TRM [1] at section "12.7.2.4.1.2.1 Start-up Sequence Timing Diagram".
It shows O_CMN_READY polling at the end, after the common configuration
pin setup, where the common configuration pin setup step enables the state
machine, as referenced in "Table 12-1533. Common Configuration-Related
Setup", which mentions the state machine.
To fix this :
- Add new function callbacks for polling on PLL lock and O_CMN_READY
assertion.
- As state machine and clocks get enabled in power_on callback only, move
the clock related programming part from configure callback to power_on
callback and poll for the PLL lockup and O_CMN_READY assertion after state
machine gets enabled.
- The configure callback only saves the PLL configuration received from the
client driver which will be applied later on in power_on callback.
- Add checks to ensure configure is called before power_on and state
machine is in disabled state before power_on callback is called.
- Disable state machine in power_off so that client driver can re-configure
the PLL by following up a power_off, configure, power_on sequence.
[1]: https://www.ti.com/lit/zip/spruil1
Cc: stable@vger.kernel.org
Fixes: 7a343c8bf4 ("phy: Add Cadence D-PHY support")
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Tested-by: Harikrishna Shenoy <h-shenoy@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://lore.kernel.org/r/20250704125915.1224738-2-devarsht@ti.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e747883c7d ]
When mounting file systems with a log that was dirtied on i386 on
other architectures or vice versa, log recovery is unhappy:
[ 11.068052] XFS (vdb): Torn write (CRC failure) detected at log block 0x2. Truncating head block from 0xc.
This is because the CRCs generated by i386 and other architectures
always differ. The reason for that is that sizeof(struct xlog_rec_header)
struct is not sizeof(uint64_t) aligned, and i386 has odd struct size
alignment rules.
This issue goes back to commit 13cdc853c519 ("Add log versioning, and new
super block field for the log stripe") in the xfs-import tree, which
adds log v2 support and the h_size field that causes the unaligned size.
At that time it only mattered for the crude debug only log header
checksum, but with commit 0e446be448 ("xfs: add CRC checks to the log")
it became a real issue for v5 file system, because now there is a proper
CRC, and regular builds actually expect it match.
Fix this by allowing checksums with and without the padding.
Fixes: 0e446be448 ("xfs: add CRC checks to the log")
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b737f4ac1 ]
old_crc is a very misleading name. Rename it to expected_crc as that
describes the usage much better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Stable-dep-of: e747883c7d ("xfs: fix log CRC mismatches between i386 and other architectures")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a869d0177 ]
With TLS enabled, records that are encrypted and appended to TLS TX
list can fail to see a retry if the underlying TCP socket is busy, for
example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known
to the NVMe TCP driver, as the TLS layer successfully generated a record.
Typically, the TLS write_space() callback would ensure such records are
retried, but in the NVMe TCP Host driver, write_space() invokes
nvme_tcp_write_space(). This causes a partially sent record in the TLS TX
list to timeout after not being retried.
This patch fixes the above by calling queue->write_space(), which calls
into the TLS layer to retry any pending records.
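A simplified sketch of the callback, with the added call marked (locking
and field names as in the driver today):

static void nvme_tcp_write_space(struct sock *sk)
{
        struct nvme_tcp_queue *queue;

        read_lock_bh(&sk->sk_callback_lock);
        queue = sk->sk_user_data;
        if (likely(queue && sk_stream_is_writeable(sk))) {
                clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
                /* let the TLS layer retry partially sent records first */
                queue->write_space(sk);
                queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
        }
        read_unlock_bh(&sk->sk_callback_lock);
}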
Fixes: be8e82caa6 ("nvme-tcp: enable TLS handshake upcall")
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0c1999ed33 ]
test_parse_test_list_file writes some data to
/tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read
the data back. However, after writing data to that file, we forget to
call fsync(), which causes a test failure on my laptop. This patch
fixes it by adding the missing fsync() call.
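A sketch of the fix in the test helper (variable names assumed):

fd = mkstemp(tmpfile);
if (!ASSERT_GE(fd, 0, "mkstemp"))
        goto out;
write(fd, buf, strlen(buf));    /* error handling elided */
fsync(fd);                      /* flush before parse_test_list_file()
                                 * reads the file back */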
Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa4daea418 ]
HID_DG_PEN devices should have a suffix of "Stylus", as pointed out by
commit c0ee1d5716 ("HID: hid-input: Add suffix also for HID_DG_PEN").
However, on multitouch devices, these suffixes may be overridden. Before
that commit, HID_DG_PEN devices would get the "Stylus" suffix, but after
that, multitouch would override them to have an "UNKNOWN" suffix. Just add
HID_DG_PEN to the list of non-overridden suffixes in multitouch.
Before this fix:
[ 0.470981] input: ELAN9008:00 04F3:2E14 UNKNOWN as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-16/i2c-ELAN9008:00/0018:04F3:2E14.0001/input/input8
ELAN9008:00 04F3:2E14 UNKNOWN
After this fix:
[ 0.474332] input: ELAN9008:00 04F3:2E14 Stylus as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-16/i2c-ELAN9008:00/0018:04F3:2E14.0001/input/input8
ELAN9008:00 04F3:2E14 Stylus
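A hedged sketch of the change in hid-multitouch's suffix handling (the
surrounding switch in mt_input_configured() is simplified here):
    switch (field->application) {
    case HID_DG_TOUCHSCREEN:
        /* existing non-overridden cases keep suffix = NULL */
    case HID_DG_PEN:
        /* new: keep the "Stylus" suffix assigned by hid-input instead
         * of overriding it with "UNKNOWN" */
        suffix = NULL;
        break;
    }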
Fixes: c0ee1d5716 ("HID: hid-input: Add suffix also for HID_DG_PEN")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0187c08058 ]
Commit 581c448476 ("HID: input: map digitizer battery usage") added
handling of battery events for digitizers (typically for batteries
present in styli). Digitizers typically report correct battery levels
only when the stylus is actively touching the surface, and in other cases
they may report a battery level of 0. To avoid confusing consumers of the
battery information, code was added to filter out reports with 0
battery levels.
However there exist other kinds of devices that may legitimately report
0 battery levels. Fix this by filtering out 0-level reports only for
digitizer usages, and continue reporting them for other kinds of devices
(Smart Batteries, etc).
Reported-by: 卢国宏 <luguohong@xiaomi.com>
Fixes: 581c448476 ("HID: input: map digitizer battery usage")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28412b489b ]
In try_to_register_card(), the return value of usb_ifnum_to_if() is
passed directly to usb_interface_claimed() without a NULL check, which
will lead to a NULL pointer dereference when creating an invalid
USB audio device. Fix this by adding a check to ensure the interface
pointer is valid before passing it to usb_interface_claimed().
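A minimal sketch of the added check (variable names are illustrative, not
the exact diff):
    struct usb_interface *iface = usb_ifnum_to_if(chip->dev, ifnum);

    /* the interface may be missing on a malformed device descriptor */
    if (!iface)
        return -EINVAL;

    if (usb_interface_claimed(iface))
        return 0;       /* illustrative: existing claimed-interface handling continues here */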
Fixes: 39efc9c8a9 ("ALSA: usb-audio: Fix last interface check for registration")
Closes: https://lore.kernel.org/all/CANypQFYtQxHL5ghREs-BujZG413RPJGnO5TH=xjFBKpPts33tA@mail.gmail.com/
Signed-off-by: Jiaming Zhang <r772577952@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e603a342cf ]
We started getting a crash in BPF CI, which seems to originate from
test_parse_test_list_file() test and is happening at this line:
ASSERT_OK(strcmp("test_with_spaces", set.tests[0].name), "test 0 name");
One way we can crash there is if set.cnt is zero, which is checked for with
ASSERT_EQ() above, but we proceed after this regardless of the outcome.
Instead of crashing, we should bail out with test failure early.
Similarly, if parse_test_list_file() fails, we shouldn't be even looking
at set, so bail even earlier if ASSERT_OK() fails.
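A hedged sketch of the early-bail pattern (label names are hypothetical):
    err = parse_test_list_file(tmpl, &set, true);
    if (!ASSERT_OK(err, "parse_test_list_file"))
        goto out;                       /* don't look at 'set' at all on failure */

    if (!ASSERT_EQ(set.cnt, 4, "test count"))
        goto out_free;                  /* bail instead of dereferencing set.tests[0] below */

    ASSERT_OK(strcmp("test_with_spaces", set.tests[0].name), "test 0 name");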
Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251014202037.72922-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11f08c30a3 ]
Currently, if find_and_map_user_pages() takes a DMA xfer request from the
user with a length field set to 0, or in a rare case, the host receives
QAIC_TRANS_DMA_XFER_CONT from the device where resources->xferred_dma_size
is equal to the requested transaction size, the function will return 0
before allocating an sgt or setting the fields of the dma_xfer struct.
In that case, encode_addr_size_pairs() will try to access the sgt which
will lead to a general protection fault.
Return an EINVAL in case the user provides a zero-sized ALP, or the device
requests continuation after all of the bytes have been transferred.
Fixes: 96d3c1cade ("accel/qaic: Clean up integer overflow checking in map_user_pages()")
Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251007122320.339654-1-youssef.abdulrahman@oss.qualcomm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17e3e88ed0 ]
The check for some lost idle pelt time should be always done when
pick_next_task_fair() fails to pick a task and not only when we call it
from the fair fast-path.
The case happens when the last running task on the rq is an RT or DL task.
When the latter goes to sleep and the sum of util_sum of the rq is at the
max value, we don't account for the lost idle time whereas we should.
Fixes: 67692435c4 ("sched: Rework pick_next_task() slow-path")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6917112af2 ]
Remove extra multiplication.
CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case
the shutdown temperature is hardcoded in smu7_init_dpm_defaults
and is already multiplied by 1000. The value was mistakenly
multiplied another time by smu7_get_thermal_temperature_range.
Fixes: 4ba082572a ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 357d90be2c ]
Chips which use the IP discovery firmware loaded by the driver
reported incorrect harvesting information in the ip discovery
table in sysfs. This is because the driver only uses the ip discovery
firmware for populating sysfs and not for direct parsing by the
driver itself; as such, the fields that are used to print the
harvesting info in sysfs report incorrect data for some IPs. Populate
the relevant fields for this case as well.
Fixes: 514678da56 ("drm/amdgpu/discovery: fix fw based ip discovery")
Acked-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e6a5cf1a2 ]
For platforms without an IP discovery table.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 357d90be2c ("drm/amdgpu: fix handling of harvesting for ip_discovery firmware")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8529dbc75 ]
For chips that don't have IP discovery tables.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 357d90be2c ("drm/amdgpu: fix handling of harvesting for ip_discovery firmware")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b4eda7bf7 ]
Stress testing the audio jack hotplug handling on a few Steam Deck units
revealed that the debounce circuit is responsible for having a negative
impact on the detection reliability, e.g. in some cases the ejection
interrupt is not fired, while in other instances it goes into a kind of
invalid state and generates a flood of misleading interrupts.
Add new entries to the DMI table introduced via commit 1bc40efdaf
("ASoC: nau8821: Add DMI quirk mechanism for active-high jack-detect")
and extend the quirk logic to allow bypassing the debounce circuit used
for jack detection on Valve Steam Deck LCD and OLED models.
While at it, rename existing NAU8821_JD_ACTIVE_HIGH quirk bitfield to
NAU8821_QUIRK_JD_ACTIVE_HIGH. This should help improve code readability
by differentiating from similarly named register bits.
Fixes: aab1ad11d6 ("ASoC: nau8821: new driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://patch.msgid.link/20251003-nau8821-jdet-fixes-v1-4-f7b0e2543f09@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9273aa85b3 ]
Instead of adding yet another utility function for dealing with the
interrupt clearing register, generalize nau8821_int_status_clear_all()
by renaming it to nau8821_irq_status_clear(), whilst introducing a
second parameter to allow restricting the operation scope to a single
interrupt instead of the whole range of active IRQs.
While at it, also fix a spelling typo in the comment block.
Note this is mainly a prerequisite for subsequent patches aiming to
address some deficiencies in the implementation of the interrupt
handler. Thus the presence of the Fixes tag below is intentional, to
facilitate backporting.
Fixes: aab1ad11d6 ("ASoC: nau8821: new driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://patch.msgid.link/20251003-nau8821-jdet-fixes-v1-2-f7b0e2543f09@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e54919cb5 ]
The microphone detection work scheduled by a prior jack insertion
interrupt may still be in a pending state or under execution when a jack
ejection interrupt has been fired.
This might lead to a race condition or nau8821_jdet_work() completing
after nau8821_eject_jack(), which will override the currently
disconnected state of the jack and incorrectly report the headphone or
the headset as being connected.
Cancel any pending jdet_work or wait for its execution to finish before
attempting to handle the ejection interrupt.
Proceed similarly before launching the eject handler as a consequence of
detecting an invalid insert interrupt.
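A rough sketch of the ordering change in the interrupt handler (field and
function names follow the driver but are illustrative here):
    /* make sure a pending or running insertion work cannot complete after
     * the ejection handling and overwrite the jack state */
    cancel_work_sync(&nau8821->jdet_work);
    nau8821_eject_jack(nau8821);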
Fixes: aab1ad11d6 ("ASoC: nau8821: new driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://patch.msgid.link/20251003-nau8821-jdet-fixes-v1-1-f7b0e2543f09@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db74b04edc ]
There is now a new LT9211 rev. U5, which reports chip ID 0x18 0x01 0xe4.
The previous LT9211 reported chip ID 0x18 0x01 0xe3, which is what the
driver checks for right now. Since there is a possibility there will be
yet another revision of the LT9211 in the future, drop the last version
nibble check to allow all future revisions of the chip to work with this
driver.
This fix makes LT9211 rev. U5 work with this driver.
Fixes: 8ce4129e3d ("drm/bridge: lt9211: Add Lontium LT9211 bridge driver")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251011110017.12521-1-marek.vasut@mailbox.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e68bd803f ]
When adding a kprobe such as "p:probe/tcp_sendmsg _text+15392192",
arch_check_kprobe would start iterating all instructions starting from
_text until the probed address. Not only is this very inefficient, but
literal values in there (e.g. left by function patching) are
misinterpreted in a way that causes a desync.
Fix this by doing it like x86: start the iteration at the closest
preceding symbol instead of the given starting point.
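A hedged sketch of the x86-style approach (simplified; the real
arch_check_kprobe() performs additional checks):
    unsigned long offset, start;

    /* find the start of the function containing 'addr' instead of
     * walking all the way from the caller-supplied starting point */
    if (!kallsyms_lookup_size_offset(addr, NULL, &offset))
        return -EINVAL;

    start = addr - offset;
    /* decode instructions from 'start' up to 'addr' as before ... */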
Fixes: 87f48c7ccc ("riscv: kprobe: Fixup kernel panic when probing an illegal position")
Signed-off-by: Fabian Vogt <fvogt@suse.de>
Signed-off-by: Marvin Friedrich <marvin.friedrich@suse.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/6191817.lOV4Wx5bFT@fvogt-thinkpad
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb642e2d30 ]
For queue-depth I/O policy, this patch fixes unbalanced I/Os across
nvme multipaths.
Issue Description:
The RETRY disposition incorrectly increments ns->ctrl->nr_active
counter and reinitializes iostat start-time. In such cases nr_active
counter never goes back to zero until that path disconnects and
reconnects.
Such a path is not chosen for new I/Os if multiple RETRY cases on a given
path cause its queue-depth counter to be artificially higher compared
to other paths. This leads to unbalanced I/Os across paths.
The patch skips incrementing nr_active if NVME_MPATH_CNT_ACTIVE is already
set. And it skips restarting io stats if NVME_MPATH_IO_STATS is already set.
base-commit: e989a3da2d371a4b6597ee8dee5c72e407b4db7a
Fixes: d4d957b53d ("nvme-multipath: support io stats on the mpath device")
Signed-off-by: Amit Chaudhary <achaudhary@purestorage.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e07e10ae83 ]
Currently the Panthor driver needs the GPU to be powered down
between suspend and resume. If this is not done, then the
MCU_CONTROL register will be preserved as AUTO, which again will
cause a premature FW boot on resume. The FW will go directly into
fatal state in this case.
This case needs to be handled as there is no guarantee that the
GPU will be powered down after the suspend callback on all platforms.
The fix is to call panthor_fw_stop() in "pre-reset" path to ensure
the MCU_CONTROL register is cleared (set DISABLE). This matches
well with the already existing call to panthor_fw_start() from the
"post-reset" path.
Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Fixes: 2718d91816 ("drm/panthor: Add the FW logical block")
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20251008105112.4077015-1-ketil.johnsen@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8d93ff40d4 ]
dev->chipid is used in lan78xx_init_mac_address before it's initialized:
lan78xx_reset() {
    lan78xx_init_mac_address()
        lan78xx_read_eeprom()
            lan78xx_read_raw_eeprom()   <- dev->chipid is used here
    dev->chipid = ...                   <- dev->chipid is initialized correctly here
}
Reorder initialization so that dev->chipid is set before calling
lan78xx_init_mac_address().
Fixes: a0db7d10b7 ("lan78xx: Add to handle mux control per chip id")
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Khalid Aziz <khalid@kernel.org>
Link: https://patch.msgid.link/20251013181648.35153-1-viswanathiyyappan@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a8fed52f7 ]
Bringing a linked netdevsim device down and then up causes communication
failure because both interfaces lack carrier. Basically an ifdown/ifup on
the interface breaks the link.
Commit 3762ec05a9 ("netdevsim: add NAPI support") added support
for NAPI, calling netif_carrier_off() in nsim_stop(). This patch
re-enables the carrier symmetrically on nsim_open(), in case the device
is linked and the peer is up.
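A minimal sketch of the symmetric handling (the peer lookup details are
illustrative, not the exact diff):
    static int nsim_open(struct net_device *dev)
    {
        struct netdevsim *ns = netdev_priv(dev);
        struct netdevsim *peer;

        peer = rtnl_dereference(ns->peer);
        if (peer && netif_running(peer->netdev))
            netif_carrier_on(dev);      /* mirror the netif_carrier_off() in nsim_stop() */

        return 0;
    }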
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: 3762ec05a9 ("netdevsim: add NAPI support")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251014-netdevsim_fix-v2-1-53b40590dae1@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f846c65ca ]
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.
Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/8396631478f70454b44afb98352237d33f48d34d.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b8a6ff84ab ]
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).
In this case, wait for all pending decryption requests.
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b6fe4c29bb ]
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).
Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.
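A hedged sketch of the reordering in tls_process_cmsg() (simplified):
    /* record the requested type first ... */
    *record_type = *(unsigned char *)CMSG_DATA(cmsg);

    /* ... then flush any pending DATA record; -EINPROGRESS from async
     * encryption is not an error for the caller, and on real errors the
     * already-set record_type is simply ignored */
    return tls_handle_open_record(sk, msg->msg_flags);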
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/0457252e578a10a94e40c72ba6288b3a64f31662.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b014a4e066 ]
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.
This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/c793efe9673b87f808d84fdefc0f732217030c52.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce5af41e32 ]
During tls_sw_sendmsg_locked, we pre-allocate the encrypted message
for the size we're expecting to send during the current iteration, but
we may end up sending less, for example when splicing: if we're
getting the data from small fragments of memory, we may fill up all
the slots in the skmsg with less data than expected.
In this case, we need to trim the encrypted message to only the length
we actually need, to avoid pushing uninitialized bytes down the
underlying TCP socket.
Fixes: fe1e81d4f7 ("tls/sw: Support MSG_SPLICE_PAGES")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/66a0ae99c9efc15f88e9e56c1f58f902f442ce86.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88f170814f ]
Since commit 305853cce3 ("ksmbd: Fix race condition in RPC handle list
access"), ksmbd_session_rpc_method() attempts to lock sess->rpc_lock.
This causes hung connections / tasks when a client attempts to open
a named pipe. Using Samba's rpcclient tool:
$ rpcclient //192.168.1.254 -U user%password
$ rpcclient $> srvinfo
<connection hung here>
Kernel side:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:0 state:D stack:0 pid:5021 tgid:5021 ppid:2 flags:0x00200000
Workqueue: ksmbd-io handle_ksmbd_work
Call trace:
__schedule from schedule+0x3c/0x58
schedule from schedule_preempt_disabled+0xc/0x10
schedule_preempt_disabled from rwsem_down_read_slowpath+0x1b0/0x1d8
rwsem_down_read_slowpath from down_read+0x28/0x30
down_read from ksmbd_session_rpc_method+0x18/0x3c
ksmbd_session_rpc_method from ksmbd_rpc_open+0x34/0x68
ksmbd_rpc_open from ksmbd_session_rpc_open+0x194/0x228
ksmbd_session_rpc_open from create_smb2_pipe+0x8c/0x2c8
create_smb2_pipe from smb2_open+0x10c/0x27ac
smb2_open from handle_ksmbd_work+0x238/0x3dc
handle_ksmbd_work from process_scheduled_works+0x160/0x25c
process_scheduled_works from worker_thread+0x16c/0x1e8
worker_thread from kthread+0xa8/0xb8
kthread from ret_from_fork+0x14/0x38
Exception stack(0x8529ffb0 to 0x8529fff8)
The task deadlocks because the lock is already held:
ksmbd_session_rpc_open
    down_write(&sess->rpc_lock)
    ksmbd_rpc_open
        ksmbd_session_rpc_method
            down_read(&sess->rpc_lock)  <-- deadlock
Adjust ksmbd_session_rpc_method() callers to take the lock when necessary.
Fixes: 305853cce3 ("ksmbd: Fix race condition in RPC handle list access")
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 295ce1eb36 ]
Neal reported that using neper tcp_stream with TCP_TX_DELAY
set to 50ms would often lead to flows stuck in a small cwnd mode,
regardless of the congestion control.
While tcp_stream sets TCP_TX_DELAY too late after the connect(),
it highlighted two kernel bugs.
The following heuristic in tcp_tso_should_defer() seems wrong
for large RTT:
delta = tp->tcp_clock_cache - head->tstamp;
/* If next ACK is likely to come too late (half srtt), do not defer */
if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
goto send_now;
If next ACK is expected to come in more than 1 ms, we should
not defer because we prefer a smooth ACK clocking.
While the blamed commit was a step in the right direction, it was not
generic enough.
Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.
Fixes: 50c8339e92 ("tcp: tso: restore IW10 after TSO autosizing")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251011115742.1245771-1-edumazet@google.com
[pabeni@redhat.com: fixed whitespace issue]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2616222e42 ]
During interface toggle operations (ifdown/ifup), the driver currently
resets the local helper variable 'phy_link' to -1. This causes the link
state machine to incorrectly interpret the state as a link change event,
resulting in spurious "Link is down" messages being logged when the
interface is brought back up.
Preserve the phy_link state across interface toggles to avoid treating
the -1 sentinel value as a legitimate link state transition.
Fixes: 88131a812b ("amd-xgbe: Perform phy connect/disconnect at dev open/stop")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Link: https://patch.msgid.link/20251010065142.1189310-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21f4d45eba ]
Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too.
While ipv4 tunnel headroom adjustment growth was limited in
commit 5ae1e9922b ("net: ip_tunnel: prevent perpetual headroom growth"),
ipv6 tunnel yet increases the headroom without any ceiling.
Reflect ipv4 tunnel headroom adjustment limit on ipv6 version.
Credits to Francesco Ruggeri, who was originally debugging this issue
and wrote local Arista-specific patch and a reproducer.
Fixes: 8eb30be035 ("ipv6: Create ip6_tnl_xmit")
Cc: Florian Westphal <fw@strlen.de>
Cc: Francesco Ruggeri <fruggeri05@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20251009-ip6_tunnel-headroom-v2-1-8e4dbd8f7e35@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70f92ab970 ]
After resume from S4 (hibernate), RTL8168H/RTL8111H truncates incoming
packets. Packet captures show messages like "IP truncated-ip - 146 bytes
missing!".
The issue is caused by RxConfig not being properly re-initialized after
resume. Re-initializing the RxConfig register before the chip
re-initialization sequence avoids the truncation and restores correct
packet reception.
This follows the same pattern as commit ef9da46dde ("r8169: fix data
corruption issue on RTL8402").
Fixes: 6e1d0b8988 ("r8169:add support for RTL8168H and RTL8107E")
Signed-off-by: Linmao Li <lilinmao@kylinos.cn>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251009122549.3955845-1-lilinmao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65946eac6d ]
There is no error handling for `dma_map_single()` failures.
Add error handling by checking `dma_mapping_error()` and freeing
the `skb` using `dev_kfree_skb()` (process context) when it fails.
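A minimal sketch of the added handling (device and length variables are
illustrative):
    dma_addr_t mapping = dma_map_single(&pdev->dev, skb->data, buf_len,
                                        DMA_FROM_DEVICE);

    if (dma_mapping_error(&pdev->dev, mapping)) {
        dev_kfree_skb(skb);     /* process context, so plain dev_kfree_skb() is fine */
        break;
    }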
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Tested-on: D-Link DGE-550T Rev-A3
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9e30a22d6 ]
A suspend/resume cycle on a down interface results in the interface
coming up in Error Active state. A suspend/resume cycle on an Up
interface will always result in Error Active state, regardless of the
actual CAN state.
During suspend, only set running interfaces to CAN_STATE_SLEEPING.
During resume only touch the CAN state of running interfaces. For
wakeup sources, set the CAN state depending on the Protocol Status
Register (PSR); for non-wakeup-source interfaces m_can_start() will do
the same.
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/20250929-m_can-fix-state-handling-v4-4-682b49b49d9a@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4942c42fe1 ]
In some SoCs (observed on the STM32MP15) the M_CAN IP core keeps the
CAN state and CAN error counters over an internal reset cycle. An
external reset is not always possible, due to the shared reset with
the other CAN core. This caused the core to not always be in Error Active
state when bringing up the controller.
Instead of always setting the CAN state to Error Active in
m_can_chip_config(), fix this by reading and decoding the Protocol
Status Register (PSR) and setting the CAN state accordingly.
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/20250929-m_can-fix-state-handling-v4-3-682b49b49d9a@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d9db29b45 ]
The CAN Error State is determined by the receive and transmit error
counters. The CAN error counters decrease when reception/transmission
is successful, so that a status transition back to the Error Active
status is possible. This transition is not handled by
m_can_handle_state_errors().
Add the missing detection of the Error Active state to
m_can_handle_state_errors() and extend the handling of this state in
m_can_handle_state_change().
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Fixes: cd0d83eab2 ("can: m_can: m_can_handle_state_change(): fix state change")
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/20250929-m_can-fix-state-handling-v4-2-682b49b49d9a@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Stable-dep-of: 4942c42fe1 ("can: m_can: m_can_chip_config(): bring up interface in correct state")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba569fb07a ]
Commit 227619c3ff ("can: m_can: move runtime PM enable/disable to
m_can_platform") moved the PM runtime enable from the m_can core
driver into the m_can_platform.
That patch forgot to move the pm_runtime_disable() to
m_can_plat_remove(), so that unloading the m_can_platform driver
causes an "Unbalanced pm_runtime_enable!" error message.
Add the missing pm_runtime_disable() to m_can_plat_remove() to fix the
problem.
Cc: Patrik Flykt <patrik.flykt@linux.intel.com>
Fixes: 227619c3ff ("can: m_can: move runtime PM enable/disable to m_can_platform")
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/20250929-m_can-fix-state-handling-v4-1-682b49b49d9a@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 154d1e7ad9 ]
The commit 168316db3583 ("dax: assert that i_rwsem is held
exclusive for writes") added lock assertions to ensure proper
locking in DAX operations. However, these assertions trigger
false-positive lockdep warnings since a read lock is unnecessary
on read-only filesystems (e.g., erofs).
This patch skips the read lock assertion for read-only filesystems,
eliminating the spurious warnings while maintaining the integrity
checks for writable filesystems.
Fixes: 168316db35 ("dax: assert that i_rwsem is held exclusive for writes")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 46f781e0d1 upstream.
The sticky fingers quirk (MT_QUIRK_STICKY_FINGERS) was only considering
the case when slots were not released during the last report.
This can be problematic if the firmware forgets to release a finger
while others are still present.
This was observed on the Synaptics DLL0945 touchpad found on the Dell
XPS 9310 and the Dell Inspiron 5406.
Fixes: 4f4001bc76 ("HID: multitouch: fix rare Win 8 cases when the touch up event gets missing")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 927069c4ac upstream.
This reverts commit 90bfb28d5f.
Kevin reports that this commit causes an issue for him with LVM
snapshots, most likely because of turning off NOWAIT support while a
snapshot is being created. This makes -EOPNOTSUPP bubble back through
the completion handler, where io_uring read/write handling should just
retry it.
Reinstate the previous check removed by the referenced commit.
Cc: stable@vger.kernel.org
Fixes: 90bfb28d5f ("io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()")
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Reported-by: Kevin Lumik <kevin@xf.ee>
Link: https://lore.kernel.org/io-uring/cceb723c-051b-4de2-9a4c-4aa82e1619ee@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f965d111e6 ]
If cppc_get_transition_latency() returns CPUFREQ_ETERNAL to indicate a
failure to retrieve the transition latency value from the platform
firmware, the CPPC cpufreq driver will use that value (converted to
microseconds) as the policy transition delay, but it is way too large
for any practical use.
Address this by making the driver use the cpufreq's default
transition latency value (in microseconds) as the transition delay
if CPUFREQ_ETERNAL is returned by cppc_get_transition_latency().
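A hedged sketch of the fallback (the exact default constant used upstream
may differ; the ~1 ms value here is an assumption):
    unsigned int lat_ns = cppc_get_transition_latency(cpu);

    if (lat_ns == CPUFREQ_ETERNAL)      /* firmware gave no usable latency */
        lat_ns = NSEC_PER_MSEC;         /* assumed ~1 ms default */

    policy->transition_delay_us = lat_ns / NSEC_PER_USEC;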
Fixes: d4f3388afd ("cpufreq / CPPC: Set platform specific transition_delay_us")
Cc: 5.19+ <stable@vger.kernel.org> # 5.19
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 47b2116e54 ]
After a bind/unbind cycle, the acm->notify_req is left stale. If a
subsequent bind fails, the unified error label attempts to free this
stale request, leading to a NULL pointer dereference when accessing
ep->ops->free_request.
Refactor the error handling in the bind path to use the __free()
automatic cleanup mechanism.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
Call trace:
usb_ep_free_request+0x2c/0xec
gs_free_req+0x30/0x44
acm_bind+0x1b8/0x1f4
usb_add_function+0xcc/0x1f0
configfs_composite_bind+0x468/0x588
gadget_bind_driver+0x104/0x270
really_probe+0x190/0x374
__driver_probe_device+0xa0/0x12c
driver_probe_device+0x3c/0x218
__device_attach_driver+0x14c/0x188
bus_for_each_drv+0x10c/0x168
__device_attach+0xfc/0x198
device_initial_probe+0x14/0x24
bus_probe_device+0x94/0x11c
device_add+0x268/0x48c
usb_add_gadget+0x198/0x28c
dwc3_gadget_init+0x700/0x858
__dwc3_set_mode+0x3cc/0x664
process_scheduled_works+0x1d8/0x488
worker_thread+0x244/0x334
kthread+0x114/0x1bc
ret_from_fork+0x10/0x20
Fixes: 1f1ba11b64 ("usb gadget: issue notifications from ACM function")
Cc: stable@kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Link: https://lore.kernel.org/r/20250916-ready-v1-4-4997bf277548@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250916-ready-v1-4-4997bf277548@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 75a5b8d4dd ]
After a bind/unbind cycle, the ncm->notify_req is left stale. If a
subsequent bind fails, the unified error label attempts to free this
stale request, leading to a NULL pointer dereference when accessing
ep->ops->free_request.
Refactor the error handling in the bind path to use the __free()
automatic cleanup mechanism.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
Call trace:
usb_ep_free_request+0x2c/0xec
ncm_bind+0x39c/0x3dc
usb_add_function+0xcc/0x1f0
configfs_composite_bind+0x468/0x588
gadget_bind_driver+0x104/0x270
really_probe+0x190/0x374
__driver_probe_device+0xa0/0x12c
driver_probe_device+0x3c/0x218
__device_attach_driver+0x14c/0x188
bus_for_each_drv+0x10c/0x168
__device_attach+0xfc/0x198
device_initial_probe+0x14/0x24
bus_probe_device+0x94/0x11c
device_add+0x268/0x48c
usb_add_gadget+0x198/0x28c
dwc3_gadget_init+0x700/0x858
__dwc3_set_mode+0x3cc/0x664
process_scheduled_works+0x1d8/0x488
worker_thread+0x244/0x334
kthread+0x114/0x1bc
ret_from_fork+0x10/0x20
Fixes: 9f6ce4240a ("usb: gadget: f_ncm.c added")
Cc: stable@kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Link: https://lore.kernel.org/r/20250916-ready-v1-3-4997bf277548@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250916-ready-v1-3-4997bf277548@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bfb1d99d96 ]
Gadget function drivers often have goto-based error handling in their
bind paths, which can be bug-prone. Refactoring these paths to use
__free() scope-based cleanup is desirable, but currently blocked.
The blocker is that usb_ep_free_request(ep, req) requires two
parameters, while the __free() mechanism can only pass a pointer to the
request itself.
Store an endpoint pointer in the struct usb_request. The pointer is
populated centrally in usb_ep_alloc_request() on every successful
allocation, making the request object self-contained.
Signed-off-by: Kuen-Han Tsai <khtsai@google.com>
Link: https://lore.kernel.org/r/20250916-ready-v1-1-4997bf277548@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250916-ready-v1-1-4997bf277548@google.com
Stable-dep-of: 75a5b8d4dd ("usb: gadget: f_ncm: Refactor bind path to use __free()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e1361a4f1b ]
Condition guards are found to be redundant, as the call flow is properly
managed now, as also observed in the Exynos5433 DECON driver. Since
state checking is no longer necessary, remove it.
This also fixes an issue which prevented decon_commit() from being
called from decon_atomic_enable() due to an incorrect state change setting.
Fixes: 96976c3d9a ("drm/exynos: Add DECON driver")
Cc: stable@vger.kernel.org
Suggested-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5f1a453974 ]
The DECON channels are not cleared properly as the windows aren't
shadow protected. When accompanied with an IOMMU, it pagefaults, and
the kernel panics.
Implement shadow protect/unprotect, along with a standalone update,
for channel clearing to properly take effect.
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Stable-dep-of: e1361a4f1b ("drm/exynos: exynos7_drm_decon: remove ctx->suspended")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f248d5d515 ]
Since the PDC resides out of the GPU subsystem and cannot be reset in
case it enters bad state, utmost care must be taken to trigger the PDC
wake/sleep routines in the correct order.
The PDC wake sequence can be exercised only after a PDC sleep sequence.
Additionally, GMU firmware should initialize a few registers before the
KMD can trigger a PDC sleep sequence. So PDC sleep can't be done if the
GMU firmware has not initialized. Track these dependencies using a new
status variable and trigger PDC sleep/wake sequences appropriately.
Cc: stable@vger.kernel.org
Fixes: 4b565ca5a2 ("drm/msm: Add A6XX device support")
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/673362/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa4f4bae89 upstream.
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.
That causes perf to misreport the device major/minor numbers of the file
system of the file, and the generation of the file, and potentially other
inode details. There is an existing helper file_user_inode() for that
situation.
Use file_user_inode() instead of file_inode() to get the inode for MMAP2
events.
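A minimal sketch of the swap in the MMAP2 path (simplified; surrounding
code omitted):
    struct inode *inode = file_user_inode(vma->vm_file);   /* was: file_inode(vma->vm_file) */

    dev = inode->i_sb->s_dev;
    ino = inode->i_ino;
    gen = inode->i_generation;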
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record -e cycles:u -- /root/test/merged/cat /proc/self/maps
...
55b2c91d0000-55b2c926b000 r-xp 00018000 00:1a 3419 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (5 samples) ]
#
# stat /root/test/merged/cat
File: /root/test/merged/cat
Size: 1127792 Blocks: 2208 IO Block: 4096 regular file
Device: 0,26 Inode: 3419 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-09-08 12:23:59.453309624 +0000
Modify: 2025-09-08 12:23:59.454309624 +0000
Change: 2025-09-08 12:23:59.454309624 +0000
Birth: 2025-09-08 12:23:59.453309624 +0000
Before:
Device reported 00:02 differs from stat output and /proc/self/maps
# perf script --show-mmap-events | grep /root/test/merged/cat
cat 377 [-01] 243.078558: PERF_RECORD_MMAP2 377/377: [0x55b2c91d0000(0x9b000) @ 0x18000 00:02 3419 2068525940]: r-xp /root/test/merged/cat
After:
Device reported 00:1a is the same as stat output and /proc/self/maps
# perf script --show-mmap-events | grep /root/test/merged/cat
cat 362 [-01] 127.755167: PERF_RECORD_MMAP2 362/362: [0x55ba6e781000(0x9b000) @ 0x18000 00:1a 3419 0]: r-xp /root/test/merged/cat
With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels. FMODE_BACKING that it depends on was added
in v6.5. This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8818f507a9 upstream.
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.
Since commit def3ae83da ("fs: store real path instead of fake path in
backing file f_path"), file_path() no longer returns the user file path
when applied to a backing file. There is an existing helper
file_user_path() for that situation.
Use file_user_path() instead of file_path() to get the path for MMAP
and MMAP2 events.
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record -e intel_pt//u -- /root/test/merged/cat /proc/self/maps
...
55b0ba399000-55b0ba434000 r-xp 00018000 00:1a 3419 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.060 MB perf.data ]
#
Before:
File name is wrong (/cat), so decoding fails:
# perf script --no-itrace --show-mmap-events
cat 367 [016] 100.491492: PERF_RECORD_MMAP2 367/367: [0x55b0ba399000(0x9b000) @ 0x18000 00:02 3419 489959280]: r-xp /cat
...
# perf script --itrace=e | wc -l
Warning:
19 instruction trace errors
19
#
After:
File name is correct (/root/test/merged/cat), so decoding is ok:
# perf script --no-itrace --show-mmap-events
cat 364 [016] 72.153006: PERF_RECORD_MMAP2 364/364: [0x55ce4003d000(0x9b000) @ 0x18000 00:02 3419 3132534314]: r-xp /root/test/merged/cat
# perf script --itrace=e
# perf script --itrace=e | wc -l
0
#
Fixes: def3ae83da ("fs: store real path instead of fake path in backing file f_path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebfc8542ad upstream.
It was reported that Intel PT address filters do not work in Docker
containers. That relates to the use of overlayfs.
overlayfs records the backing file in struct vm_area_struct vm_file,
instead of the user file that the user mmapped. In order for an address
filter to match, it must compare to the user file inode. There is an
existing helper file_user_inode() for that situation.
Use file_user_inode() instead of file_inode() to get the inode for address
filter matching.
Example:
Setup:
# cd /root
# mkdir test ; cd test ; mkdir lower upper work merged
# cp `which cat` lower
# mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
# perf record --buildid-mmap -e intel_pt//u --filter 'filter * @ /root/test/merged/cat' -- /root/test/merged/cat /proc/self/maps
...
55d61d246000-55d61d2e1000 r-xp 00018000 00:1a 3418 /root/test/merged/cat
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data ]
# perf buildid-cache --add /root/test/merged/cat
Before:
Address filter does not match so there are no control flow packets
# perf script --itrace=e
# perf script --itrace=b | wc -l
0
# perf script -D | grep 'TIP.PGE' | wc -l
0
#
After:
Address filter does match so there are control flow packets
# perf script --itrace=e
# perf script --itrace=b | wc -l
235
# perf script -D | grep 'TIP.PGE' | wc -l
57
#
With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels. FMODE_BACKING that it depends on was added
in v6.5. This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.
Closes: https://lore.kernel.org/linux-perf-users/aBCwoq7w8ohBRQCh@fremen.lan
Reported-by: Edd Barrett <edd@theunixzoo.co.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6df8e84aa6 upstream.
The atomic variable vm_fault_info_updated is used to synchronize access to
adev->gmc.vm_fault_info between the interrupt handler and
get_vm_fault_info().
The default atomic functions like atomic_set() and atomic_read() do not
provide memory barriers. This allows for CPU instruction reordering,
meaning the memory accesses to vm_fault_info and the vm_fault_info_updated
flag are not guaranteed to occur in the intended order. This creates a
race condition that can lead to inconsistent or stale data being used.
The previous implementation, which used an explicit mb(), was incomplete
and inefficient. It failed to account for all potential CPU reorderings,
such as the access of vm_fault_info being reordered before the atomic_read
of the flag. This approach is also more verbose and less performant than
using the proper atomic functions with acquire/release semantics.
Fix this by switching to atomic_set_release() and atomic_read_acquire().
These functions provide the necessary acquire and release semantics,
which act as memory barriers to ensure the correct order of operations.
It is also more efficient and idiomatic than using explicit full memory
barriers.
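A hedged sketch of the acquire/release pairing (field names are
illustrative):
    /* writer (interrupt handler): fill the record, then publish the flag */
    gmc->vm_fault_info->vmid = vmid;
    atomic_set_release(&gmc->vm_fault_info_updated, 1);

    /* reader (get_vm_fault_info): observe the flag, then the record */
    if (atomic_read_acquire(&gmc->vm_fault_info_updated)) {
        *info = *gmc->vm_fault_info;    /* guaranteed to see the writer's stores */
        atomic_set(&gmc->vm_fault_info_updated, 0);
    }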
Fixes: b97dfa27ef ("drm/amdgpu: save vm fault information for amdkfd")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5801e65206 upstream.
When adding dependencies with drm_sched_job_add_dependency(), that
function consumes the fence reference both on success and failure, so in
the latter case the dma_fence_put() on the error path (xarray failed to
expand) is a double free.
Interestingly this bug appears to have been present ever since
commit ebd5f74255 ("drm/sched: Add dependency tracking"), since the code
back then looked like this:
drm_sched_job_add_implicit_dependencies():
...
for (i = 0; i < fence_count; i++) {
ret = drm_sched_job_add_dependency(job, fences[i]);
if (ret)
break;
}
for (; i < fence_count; i++)
dma_fence_put(fences[i]);
Which means for the failing 'i' the dma_fence_put was already a double
free. Possibly there were no users at that time, or the test cases were
insufficient to hit it.
The bug was then only noticed and fixed after
commit 9c2ba26535 ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
landed, with its fixup of
commit 4eaf02d607 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").
At that point it was a slightly different flavour of a double free, which
commit 963d0b3569 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
noticed and attempted to fix.
But it only moved the double free from happening inside the
drm_sched_job_add_dependency(), when releasing the reference not yet
obtained, to the caller, when releasing the reference already released by
the former in the failure case.
As such it is not easy to identify the right target for the fixes tag so
lets keep it simple and just continue the chain.
While fixing we also improve the comment and explain the reason for taking
the reference and not dropping it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 963d0b3569 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
Cc: Christian König <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6447b0e355 upstream.
A malicious SMB server can send an invalid reply to FSCTL_DFS_GET_REFERRALS:
- a reply smaller than sizeof(struct get_dfs_referral_rsp)
- a reply with a number of referrals smaller than NumberOfReferrals in the
header
Processing of such replies will cause out-of-bounds accesses.
Return an -EINVAL error on such replies to prevent them.
Signed-off-by: Eugene Korenevsky <ekorenevsky@aliyun.com>
Cc: stable@vger.kernel.org
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5a51bf4e9 upstream.
Currently, when building a free space tree at populate_free_space_tree(),
if we are not using the block group tree feature, we always expect to find
block group items (either extent items or a block group item with key type
BTRFS_BLOCK_GROUP_ITEM_KEY) when we search the extent tree with
btrfs_search_slot_for_read(), so we assert that we found an item. However
this expectation is wrong since we can have a new block group created in
the current transaction which is still empty and for which we still have
not added the block group's item to the extent tree, in which case we do
not have any items in the extent tree associated to the block group.
The insertion of a new block group's block group item in the extent tree
happens at btrfs_create_pending_block_groups() when it calls the helper
insert_block_group_item(). This typically is done when a transaction
handle is released, committed or when running delayed refs (either as
part of a transaction commit or when serving tickets for space reservation
if we are low on free space).
So remove the assertion at populate_free_space_tree() even when the block
group tree feature is not enabled and update the comment to mention this
case.
Syzbot reported this with the following stack trace:
BTRFS info (device loop3 state M): rebuilding free space tree
assertion failed: ret == 0 :: 0, in fs/btrfs/free-space-tree.c:1115
------------[ cut here ]------------
kernel BUG at fs/btrfs/free-space-tree.c:1115!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6352 Comm: syz.3.25 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:populate_free_space_tree+0x700/0x710 fs/btrfs/free-space-tree.c:1115
Code: ff ff e8 d3 (...)
RSP: 0018:ffffc9000430f780 EFLAGS: 00010246
RAX: 0000000000000043 RBX: ffff88805b709630 RCX: fea61d0e2e79d000
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000430f8b0 R08: ffffc9000430f4a7 R09: 1ffff92000861e94
R10: dffffc0000000000 R11: fffff52000861e95 R12: 0000000000000001
R13: 1ffff92000861f00 R14: dffffc0000000000 R15: 0000000000000000
FS: 00007f424d9fe6c0(0000) GS:ffff888125afc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd78ad212c0 CR3: 0000000076d68000 CR4: 00000000003526f0
Call Trace:
<TASK>
btrfs_rebuild_free_space_tree+0x1ba/0x6d0 fs/btrfs/free-space-tree.c:1364
btrfs_start_pre_rw_mount+0x128f/0x1bf0 fs/btrfs/disk-io.c:3062
btrfs_remount_rw fs/btrfs/super.c:1334 [inline]
btrfs_reconfigure+0xaed/0x2160 fs/btrfs/super.c:1559
reconfigure_super+0x227/0x890 fs/super.c:1076
do_remount fs/namespace.c:3279 [inline]
path_mount+0xd1a/0xfe0 fs/namespace.c:4027
do_mount fs/namespace.c:4048 [inline]
__do_sys_mount fs/namespace.c:4236 [inline]
__se_sys_mount+0x313/0x410 fs/namespace.c:4213
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f424e39066a
Code: d8 64 89 02 (...)
RSP: 002b:00007f424d9fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f424d9fdef0 RCX: 00007f424e39066a
RDX: 0000200000000180 RSI: 0000200000000380 RDI: 0000000000000000
RBP: 0000200000000180 R08: 00007f424d9fdef0 R09: 0000000000000020
R10: 0000000000000020 R11: 0000000000000246 R12: 0000200000000380
R13: 00007f424d9fdeb0 R14: 0000000000000000 R15: 00002000000002c0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Reported-by: syzbot+884dc4621377ba579a6f@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/68dc3dab.a00a0220.102ee.004e.GAE@google.com/
Fixes: a5ed918285 ("Btrfs: implement the free space B-tree")
CC: <stable@vger.kernel.org> # 6.1.x: 1961d20f6f: btrfs: fix assertion when building free space tree
CC: <stable@vger.kernel.org> # 6.1.x
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fec9b9d3ce upstream.
At the end of btrfs_load_block_group_zone_info() the first thing we do
is to ensure that if the mapping type is not a SINGLE one and there is
no RAID stripe tree, then we return early with an error.
Doing that, though, prevents the code from running the last calls from
this function which are about freeing memory allocated during its
run. Hence, in this case, instead of returning early, we set the ret
value and fall through the rest of the cleanup code.
Fixes: 5906333cc4 ("btrfs: zoned: don't skip block group profile checks on conventional zones")
CC: stable@vger.kernel.org # 6.8+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ab2fa6969 upstream.
The intent of btrfs_readahead_expand() was to expand to the length of
the current compressed extent being read. However, "ram_bytes" is *not*
that, in the case where a single physical compressed extent is used for
multiple file extents.
Consider this case with a large compressed extent C and then later two
non-compressed extents N1 and N2 written over C, leaving C1 and C2
pointing to offset/len pairs of C:
[ C ]
[ N1 ][ C1 ][ N2 ][ C2 ]
In such a case, ram_bytes for both C1 and C2 is the full uncompressed
length of C. So starting readahead in C1 will expand the readahead past
the end of C1, past N2, and into C2. This will then expand readahead
again, to C2_start + ram_bytes, way past EOF. First of all, this is
totally undesirable, we don't want to read the whole file in arbitrary
chunks of the large underlying extent if it happens to exist. Secondly,
it results in zeroing the range past the end of C2 up to ram_bytes. This
is particularly unpleasant with fs-verity as it can zero and set
uptodate pages in the verity virtual space past EOF. This incorrect
readahead behavior can lead to verity verification errors, if we iterate
in a way that happens to do the wrong readahead.
Fix this by using em->len for readahead expansion, not em->ram_bytes,
resulting in the expected behavior of stopping readahead at the extent
boundary.
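As an illustration of the distinction, the following standalone C sketch uses a made-up mapping structure (not the real struct extent_map) to show why expansion must be bounded by the mapped length rather than the uncompressed size:

#include <stdio.h>

typedef unsigned long long u64;

/* Illustrative mapping: 'len' is the file range covered by this extent map,
 * 'ram_bytes' is the full uncompressed size of the underlying extent. */
struct em_example {
    u64 start;
    u64 len;
    u64 ram_bytes;
};

/* Expand readahead only up to the end of the current mapping. */
static u64 readahead_end(const struct em_example *em)
{
    return em->start + em->len;     /* not em->start + em->ram_bytes */
}

int main(void)
{
    /* C1 points into a large compressed extent C: ram_bytes covers all of C,
     * but only 'len' bytes belong to this file extent. */
    struct em_example c1 = { .start = 4096, .len = 8192, .ram_bytes = 1 << 20 };

    printf("expand to %llu (ram_bytes would wrongly give %llu)\n",
           readahead_end(&c1), c1.start + c1.ram_bytes);
    return 0;
}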
Reported-by: Max Chernoff <git@maxchernoff.ca>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2399898
Fixes: 9e9ff875e4 ("btrfs: use readahead_expand() on compressed extents")
CC: stable@vger.kernel.org # 6.17
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53a4acbfc1 upstream.
On 'btrfs_ioctl_qgroup_assign' we first duplicate the argument as
provided by the user, which is kfree'd in the end. But this was not the
case when allocating memory for 'prealloc'. In this case, if it somehow
failed, then the previous code would go directly into calling
'mnt_drop_write_file', without freeing the string duplicated from the
user space.
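For illustration, a standalone C sketch of the leak-free error path, with made-up names standing in for the copied ioctl argument and the preallocated node (this is not the actual btrfs code):

#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Everything duplicated from user space must be freed on every error path,
 * including when a later allocation fails. */
static int assign_relation(const char *user_arg)
{
    int ret = 0;
    char *sa = strdup(user_arg);    /* stands in for the copied ioctl argument */
    void *prealloc;

    if (!sa)
        return -ENOMEM;

    prealloc = malloc(256);         /* stands in for the preallocated node */
    if (!prealloc) {
        ret = -ENOMEM;
        goto out;                   /* fall through so 'sa' is released too */
    }
    /* ... add the qgroup relation using sa and prealloc ... */
out:
    free(prealloc);
    free(sa);
    return ret;
}

int main(void)
{
    return assign_relation("1/100 2/200") ? 1 : 0;
}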
Fixes: 4addc1ffd6 ("btrfs: qgroup: preallocate memory before adding a relation")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e5a5983ed upstream.
When starting relocation, at reloc_chunk_start(), if we happen to find
the flag BTRFS_FS_RELOC_RUNNING is already set we return an error
(-EINPROGRESS) to the callers, however the callers call reloc_chunk_end()
which will clear the flag BTRFS_FS_RELOC_RUNNING, which is wrong since
relocation was started by another task and still running.
Finding the BTRFS_FS_RELOC_RUNNING flag already set is an unexpected
scenario, but still our current behaviour is not correct.
Fix this by never calling reloc_chunk_end() if reloc_chunk_start() has
returned an error, which is what logically makes sense, since the general
widespread pattern is to have end functions called only if the counterpart
start functions succeeded. This requires changing reloc_chunk_start() to
clear BTRFS_FS_RELOC_RUNNING if there's a pending cancel request.
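For illustration, a standalone C sketch of the start/end pairing this describes. The flag and helpers are simplified stand-ins, not the real relocation code:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool reloc_running;          /* stands in for BTRFS_FS_RELOC_RUNNING */

/* The "end" helper is only called when the "start" helper succeeded, so a
 * failed start can never clear state owned by another running task. */
static int reloc_start(bool cancel_requested)
{
    if (reloc_running)
        return -EINPROGRESS;        /* someone else owns the flag */
    reloc_running = true;
    if (cancel_requested) {
        reloc_running = false;      /* undo our own set before failing */
        return -ECANCELED;
    }
    return 0;
}

static void reloc_end(void)
{
    reloc_running = false;
}

static int relocate(bool cancel_requested)
{
    int ret = reloc_start(cancel_requested);

    if (ret)
        return ret;                 /* never call reloc_end() on failure */
    /* ... relocate the chunk ... */
    reloc_end();
    return 0;
}

int main(void)
{
    printf("%d %d\n", relocate(false), relocate(true));
    return 0;
}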
Fixes: 907d2710d7 ("btrfs: add cancellable chunk relocation support")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d3ad18394 upstream.
syzbot reported a BUG_ON in ext4_es_cache_extent() when opening a verity
file on a corrupted ext4 filesystem mounted without a journal.
The issue is that the filesystem has an inode with both the INLINE_DATA
and EXTENTS flags set:
EXT4-fs error (device loop0): ext4_cache_extents:545: inode #15:
comm syz.0.17: corrupted extent tree: lblk 0 < prev 66
Investigation revealed that the inode has both flags set:
DEBUG: inode 15 - flag=1, i_inline_off=164, has_inline=1, extents_flag=1
This is an invalid combination since an inode should have either:
- INLINE_DATA: data stored directly in the inode
- EXTENTS: data stored in extent-mapped blocks
Having both flags causes ext4_has_inline_data() to return true, skipping
extent tree validation in __ext4_iget(). The unvalidated out-of-order
extents then trigger a BUG_ON in ext4_es_cache_extent() due to integer
underflow when calculating hole sizes.
Fix this by detecting this invalid flag combination early in ext4_iget()
and rejecting the corrupted inode.
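For illustration, a standalone C sketch of the flag check. The flag values mirror the ext4 on-disk definitions but are hardcoded here, and EUCLEAN is used in place of the kernel's EFSCORRUPTED; this is not the actual ext4 code:

#include <errno.h>
#include <stdio.h>

#define INLINE_DATA_FL  0x10000000u
#define EXTENTS_FL      0x00080000u

/* An inode stores its data either inline or in extent-mapped blocks,
 * never both; reject the combination as corruption. */
static int check_inode_flags(unsigned int flags)
{
    if ((flags & INLINE_DATA_FL) && (flags & EXTENTS_FL))
        return -EUCLEAN;            /* stands in for EFSCORRUPTED */
    return 0;
}

int main(void)
{
    printf("valid: %d, corrupted: %d\n",
           check_inode_flags(EXTENTS_FL),
           check_inode_flags(INLINE_DATA_FL | EXTENTS_FL));
    return 0;
}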
Cc: stable@kernel.org
Reported-and-tested-by: syzbot+038b7bf43423e132b308@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=038b7bf43423e132b308
Suggested-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Message-ID: <20250930112810.315095-1-kartikey406@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 328a782cb1 upstream.
When freeing metadata blocks in nojournal mode, ext4_forget() calls
bforget() to clear the dirty flag on the buffer_head and remove
associated mappings. This is acceptable if the metadata has not yet
begun to be written back. However, if the write-back has already started
but is not yet completed, ext4_forget() will have no effect.
Subsequently, ext4_mb_clear_bb() will immediately return the block to
the mb allocator. This block can then be reallocated immediately,
potentially causing a data corruption issue.
Fix this by clearing the buffer's dirty flag and waiting for the ongoing
I/O to complete, ensuring that no further writes to stale data will
occur.
Fixes: 16e08b14a4 ("ext4: cleanup clean_bdev_aliases() calls")
Cc: stable@kernel.org
Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Closes: https://lore.kernel.org/linux-ext4/a9417096-9549-4441-9878-b1955b899b4e@huaweicloud.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20250916093337.3161016-3-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c652c3a71 upstream.
When releasing file system metadata blocks in jbd2_journal_forget(), if
this buffer has not yet been checkpointed, it may have already been
written back, currently be in the process of being written back, or not
yet have been written back. jbd2_journal_forget() calls
jbd2_journal_try_remove_checkpoint() to check the buffer's status and
add it to the current transaction if it has not been written back. This
buffer can only be reallocated after the transaction is committed.
jbd2_journal_try_remove_checkpoint() attempts to lock the buffer and
check its dirty status while holding the buffer lock. If the buffer has
already been written back, everything proceeds normally. However, there
are two issues. First, the function returns immediately if the buffer is
locked by the write-back process. It does not wait for the write-back to
complete. Consequently, until the current transaction is committed and
the block is reallocated, there is no guarantee that the I/O will
complete. This means that ongoing I/O could write stale metadata to the
newly allocated block, potentially corrupting data. Second, the function
unlocks the buffer as soon as it detects that the buffer is still dirty.
If a concurrent write-back occurs immediately after this unlocking and
before clear_buffer_dirty() is called in jbd2_journal_forget(), data
corruption can theoretically still occur.
Although these two issues are unlikely to occur in practice since the
in-flight metadata writeback I/O does not take this long to complete,
it's better to explicitly ensure that all ongoing I/O operations are
completed.
Fixes: 597599268e ("jbd2: discard dirty data when forgetting an un-journalled buffer")
Cc: stable@kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20250916093337.3161016-2-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d5c4f5c7a upstream.
Assuming the disk layout as below,
disk0: 0 --- 0x00035abfff
disk1: 0x00035ac000 --- 0x00037abfff
disk2: 0x00037ac000 --- 0x00037ebfff
and we want to read data from offset=13568 having len=128 across the block
devices, we can illustrate the block addresses like below.
0 .. 0x00037ac000 ------------------- 0x00037ebfff, 0x00037ec000 -------
| ^ ^ ^
| fofs 0 13568 13568+128
| ------------------------------------------------------
| LBA 0x37e8aa9 0x37ebfa9 0x37ec029
--- map 0x3caa9 0x3ffa9
In this example, we should give the relative map of the target block device
ranging from 0x3caa9 to 0x3ffa9 where the length should be calculated by
0x37ebfff + 1 - 0x37ebfa9.
In the below equation, however, map->m_pblk was supposed to be the original
address instead of the one from the target block address.
- map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
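For illustration, the following standalone C snippet just redoes the arithmetic from the example above, contrasting a clamp based on the original (absolute) address with one based on the already rebased, device-relative address. The variable names are illustrative, not the f2fs code:

#include <stdio.h>

typedef unsigned long long u64;

static u64 min_u64(u64 a, u64 b) { return a < b ? a : b; }

int main(void)
{
    /* Numbers from the example above. */
    u64 dev_start = 0x37ac000ULL;   /* first block of the target device */
    u64 dev_end   = 0x37ebfffULL;   /* last block of the target device  */
    u64 pblk      = 0x37ebfa9ULL;   /* original (absolute) block address */
    u64 m_len     = 128;

    /* Clamping against the device end must use the absolute address ... */
    u64 good = min_u64(m_len, dev_end + 1 - pblk);

    /* ... not the already rebased, device-relative one (0x3ffa9). */
    u64 bad = min_u64(m_len, dev_end + 1 - (pblk - dev_start));

    printf("correct len = %llu, wrong len = %llu\n", good, bad);
    return 0;
}

The correct clamp yields 87 blocks (0x37ebfff + 1 - 0x37ebfa9), while clamping against the relative address leaves the full 128 and reads past the end of the device.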
Cc: stable@vger.kernel.org
Fixes: 71f2c82062 ("f2fs: multidevice: support direct IO")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75527d61d6 upstream.
rtl8152_driver_init() is missing the error handling.
When rtl8152_driver registration fails, rtl8152_cfgselector_driver
should be deregistered.
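For illustration, a standalone C sketch of this unwind-on-failure pattern, with stub functions standing in for the two driver registrations (this is not the actual r8152 code):

#include <errno.h>
#include <stdio.h>

/* Stubs standing in for the two USB driver registrations. */
static int register_selector(void)    { return 0; }
static void unregister_selector(void) { }
static int register_driver(void)      { return -ENODEV; } /* simulate failure */

/* If the second registration fails, the first one is unwound before
 * returning the error. */
static int example_module_init(void)
{
    int ret = register_selector();

    if (ret)
        return ret;

    ret = register_driver();
    if (ret)
        unregister_selector();      /* undo the earlier registration */
    return ret;
}

int main(void)
{
    printf("init: %d\n", example_module_init());
    return 0;
}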
Fixes: ec51fbd1b8 ("r8152: add USB device driver for config selection")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Cong <yicong@kylinos.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251011082415.580740-1-yicongsrfy@163.com
[pabeni@redhat.com: clarified the commit message]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86f54f9b6c upstream.
If obj_exts allocation failed, slab->obj_exts is set to OBJEXTS_ALLOC_FAIL,
But we do not clear it when freeing the slab. Since OBJEXTS_ALLOC_FAIL and
MEMCG_DATA_OBJEXTS currently share the same bit position, during the
release of the associated folio, a VM_BUG_ON_FOLIO() check in
folio_memcg_kmem() is triggered because the OBJEXTS_ALLOC_FAIL flag was
not cleared, causing it to be interpreted as a kmem folio (non-slab)
with MEMCG_OBJEXTS_DATA flag set, which is invalid because
MEMCG_OBJEXTS_DATA is supposed to be set only on slabs.
Another problem that predates sharing the OBJEXTS_ALLOC_FAIL and
MEMCG_DATA_OBJEXTS bits is that on configurations with
is_check_pages_enabled(), the non-cleared bit in page->memcg_data will
trigger a free_page_is_bad() failure: "page still charged to cgroup".
When freeing a slab, we clear slab->obj_exts if the obj_ext array has
been successfully allocated. So let's clear it also when the allocation
has failed.
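For illustration, a standalone C sketch of clearing the field regardless of whether it holds a real vector or only the failure sentinel. The sentinel value and structure are made up for the example:

#include <stdio.h>

#define OBJEXTS_ALLOC_FAIL_BIT 1UL  /* illustrative sentinel in the low bit */

struct slab_example {
    unsigned long obj_exts;         /* pointer value or failure sentinel */
};

/* Clear obj_exts on free whether it holds a real vector or only the
 * allocation-failure sentinel, so no stale bit is left behind in the page. */
static void free_slab(struct slab_example *s)
{
    if (s->obj_exts)                /* covers both cases */
        s->obj_exts = 0;
}

int main(void)
{
    struct slab_example s = { .obj_exts = OBJEXTS_ALLOC_FAIL_BIT };

    free_slab(&s);
    printf("obj_exts after free: %lu\n", s.obj_exts);
    return 0;
}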
Fixes: 09c46563ff ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Fixes: 7612833192 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
Link: https://lore.kernel.org/all/20251015141642.700170-1-hao.ge@linux.dev/
Cc: <stable@vger.kernel.org>
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2b77f4220 upstream.
Fix three refcount inconsistency issues related to `cifs_sb_tlink`.
Comments for `cifs_sb_tlink` state that `cifs_put_tlink()` needs to be
called after successful calls to `cifs_sb_tlink()`. Three calls fail to
update refcount accordingly, leading to possible resource leaks.
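For illustration, a standalone C sketch of the get/put discipline being restored, using a toy reference counter rather than the real tlink code:

#include <errno.h>
#include <stdio.h>

/* Toy reference counter standing in for the tlink refcount. */
struct tlink_example { int refcount; };

static struct tlink_example tlink = { .refcount = 1 };

static struct tlink_example *tlink_get(void) { tlink.refcount++; return &tlink; }
static void tlink_put(struct tlink_example *t) { t->refcount--; }

/* Every successful "get" is paired with a "put", including on error paths. */
static int do_op(int op_fails)
{
    struct tlink_example *t = tlink_get();
    int ret = 0;

    if (op_fails) {
        ret = -EACCES;
        goto out;                   /* the error path still drops the reference */
    }
    /* ... use t ... */
out:
    tlink_put(t);
    return ret;
}

int main(void)
{
    do_op(1);
    printf("refcount balanced: %s\n", tlink.refcount == 1 ? "yes" : "no");
    return 0;
}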
Fixes: 8ceb984379 ("CIFS: Move rename to ops struct")
Fixes: 2f1afe2599 ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions")
Fixes: 366ed846df ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
Cc: stable@vger.kernel.org
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 812258ff41 upstream.
The kernel uses the standard rustc targets for non-x86 targets, and out
of those only 64-bit arm's target has kcfi support enabled. For x86, the
custom 64-bit target enables kcfi.
The HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC config option that allows
CFI_CLANG to be used in combination with RUST does not check whether the
rustc target supports kcfi. This breaks the build on riscv (and
presumably 32-bit arm) when CFI_CLANG and RUST are enabled at the same
time.
Ordinarily, a rustc-option check would be used to detect target support
but unfortunately rustc-option filters out the target for reasons given
in commit 46e24a545c ("rust: kasan/kbuild: fix missing flags on first
build"). As a result, if the host supports kcfi but the target does not,
e.g. when building for riscv on x86_64, the build would remain broken.
Instead, make HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC depend on the only
two architectures where the target used supports it to fix the build.
CC: stable@vger.kernel.org
Fixes: ca627e6365 ("rust: cfi: add support for CFI_CLANG with Rust")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250908-distill-lint-1ae78bcf777c@spud
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f64b3cd05 upstream.
In normal operation, a registered exec queue is disabled and
deregistered through the GuC, and freed only after the GuC confirms
completion. However, if the driver is forced to unbind while the exec
queue is still running, the user may call exec_destroy() after the GuC
has already been stopped and CT communication disabled.
In this case, the driver cannot receive a response from the GuC,
preventing proper cleanup of exec queue resources. Fix this by directly
releasing the resources when GuC is not running.
Here is the failure dmesg log:
"
[ 468.089581] ---[ end trace 0000000000000000 ]---
[ 468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535
[ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1
[ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1)
[ 468.092716] ------------[ cut here ]------------
[ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe]
"
v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled().
As CT may go down and come back during VF migration.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com
(cherry picked from commit 9b42321a02c50a12b2beb6ae9469606257fbecea)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Modify the PDAF Datatype of the Arducam 64MP camera from 0x30 to 0x12
so that the Raspberry Pi 5 cfe driver can receive PDAF data.
Signed-off-by: Lee Jackson <info@arducam.com>
There were various points where the loader was using uninitialised
data, had the potential to run off the end of an array, or was
handling core functions incorrectly. Fix these up.
Also handle 24bpp and 32bpp framebuffers.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
commit 3776c685eb upstream.
Currently, whenever there is a need to transmit an Action frame,
the brcmfmac driver always uses the P2P vif to send the "actframe" IOVAR to
firmware. The P2P interfaces are available when wpa_supplicant is managing
the wlan interface.
However, the P2P interfaces are not created/initialized when only hostapd
is managing the wlan interface. And if hostapd receives an ANQP Query REQ
Action frame even from an un-associated STA, the brcmfmac driver tries
to use an uninitialized P2P vif pointer for sending the IOVAR to firmware.
This NULL pointer dereferencing triggers a driver crash.
[ 1417.074538] Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
[...]
[ 1417.075188] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[...]
[ 1417.075653] Call trace:
[ 1417.075662] brcmf_p2p_send_action_frame+0x23c/0xc58 [brcmfmac]
[ 1417.075738] brcmf_cfg80211_mgmt_tx+0x304/0x5c0 [brcmfmac]
[ 1417.075810] cfg80211_mlme_mgmt_tx+0x1b0/0x428 [cfg80211]
[ 1417.076067] nl80211_tx_mgmt+0x238/0x388 [cfg80211]
[ 1417.076281] genl_family_rcv_msg_doit+0xe0/0x158
[ 1417.076302] genl_rcv_msg+0x220/0x2a0
[ 1417.076317] netlink_rcv_skb+0x68/0x140
[ 1417.076330] genl_rcv+0x40/0x60
[ 1417.076343] netlink_unicast+0x330/0x3b8
[ 1417.076357] netlink_sendmsg+0x19c/0x3f8
[ 1417.076370] __sock_sendmsg+0x64/0xc0
[ 1417.076391] ____sys_sendmsg+0x268/0x2a0
[ 1417.076408] ___sys_sendmsg+0xb8/0x118
[ 1417.076427] __sys_sendmsg+0x90/0xf8
[ 1417.076445] __arm64_sys_sendmsg+0x2c/0x40
[ 1417.076465] invoke_syscall+0x50/0x120
[ 1417.076486] el0_svc_common.constprop.0+0x48/0xf0
[ 1417.076506] do_el0_svc+0x24/0x38
[ 1417.076525] el0_svc+0x30/0x100
[ 1417.076548] el0t_64_sync_handler+0x100/0x130
[ 1417.076569] el0t_64_sync+0x190/0x198
[ 1417.076589] Code: f9401e80 aa1603e2 f9403be1 5280e483 (f9400000)
Fix this by always using the vif corresponding to the wdev on which the
Action frame Transmission request was initiated by the userspace. This way,
even if P2P vif is not available, the IOVAR is sent to firmware on AP vif
and the ANQP Query RESP Action frame is transmitted without crashing the
driver.
Move init_completion() for "send_af_done" from brcmf_p2p_create_p2pdev()
to brcmf_p2p_attach(), because the former function does not get executed
when only hostapd is managing the wlan interface, and it is not safe to do
reinit_completion() later in brcmf_p2p_tx_action_frame() without a prior
init_completion().
And in the brcmf_p2p_tx_action_frame() function, the condition check for
P2P Presence response frame is not needed, since the wpa_supplicant is
properly sending the P2P Presence Response frame on the P2P-GO vif instead
of the P2P-Device vif.
Cc: stable@vger.kernel.org
Fixes: 18e2f61db3 ("brcmfmac: P2P action frame tx")
Signed-off-by: Gokul Sivakumar <gokulkumar.sivakumar@infineon.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20251013102819.9727-1-gokulkumar.sivakumar@infineon.com
[Cc stable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit e4f574ca9c upstream.
A while back I had reported that an NFSv3 client could successfully
mount using '-o xprtsec=none' an export that had been exported with
'xprtsec=tls:mtls'. By "successfully" I mean that the mount command
would succeed and the mount would show up in /proc/mount. Attempting
to do anything further with the mount would be met with NFS3ERR_ACCES.
This was fixed (albeit accidentally) by commit bb4f07f240 ("nfsd:
Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was
subsequently re-broken by commit 0813c5f012 ("nfsd: fix access
checking for NLM under XPRTSEC policies").
Transport Layer Security isn't an RPC security flavor or pseudo-flavor,
so we shouldn't be conflating them when determining whether the access
checks can be bypassed. Split check_nfsd_access() into two helpers, and
have __fh_verify() call the helpers directly since __fh_verify() has
logic that allows one or both of the checks to be skipped. All other
sites will continue to call check_nfsd_access().
Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/
Fixes: 9280c57743 ("NFSD: Handle new xprtsec= export option")
Cc: stable@vger.kernel.org
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bace10b596 upstream.
Assumption that chain DMA module starts the link DMA when 1ms of
data is available from host is not correct. Instead the firmware
chain DMA module fills the link DMA with initial buffer of zeroes
and the host and link DMAs are started at the same time.
This results in a small error in delay calculation. This can become a
more severe problem if the host DMA has delays that exceed 1ms. This results
in a negative delay being calculated and bogus values being reported to
applications. This can confuse some applications like
alsa_conformance_test.
Fix the issue by correctly calculating the firmware chain DMA
preamble size and initializing the start offset to this value.
Cc: stable@vger.kernel.org
Fixes: a1d203d390 ("ASoC: SOF: ipc4-pcm: Enable delay reporting for ChainDMA streams")
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20251002074719.2084-3-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0813c5f012 upstream.
When an export policy with xprtsec policy is set with "tls"
and/or "mtls", but an NFS client is doing a v3 xprtsec=tls
mount, then NLM locking calls fail with an error because
there is currently no support for NLM with TLS.
Until such support is added, allow NLM calls under TLS-secured
policy.
Fixes: 4cc9b9f2bf ("nfsd: refine and rename NFSD_MAY_LOCK")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9d6b74e4b upstream.
__fh_verify() added a call to svc_xprt_set_valid() to help do connection
management but during LOCALIO path rqstp argument is NULL, leading to
NULL pointer dereferencing and a crash.
Fixes: eccbbc7c00 ("nfsd: don't use sv_nrthreads in connection limiting calculations.")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dequeuing a fair task on a throttled hierarchy returns early on
encountering a throttled cfs_rq since the throttle path has already
dequeued the hierarchy above and has adjusted the h_nr_* accounting till
the root cfs_rq.
dequeue_entities() crucially misses calling __block_task() for delayed
tasks being dequeued on the throttled hierarchies, but this was mostly
harmless until commit b7ca5743a2 ("sched/core: Tweak
wait_task_inactive() to force dequeue sched_delayed tasks") since all
existing cases would re-enqueue the task if task_on_rq_queued() returned
true and the task would eventually be blocked at pick after the
hierarchy was unthrottled.
wait_task_inactive() is special as it expects the delayed task on
throttled hierarchy to reach the blocked state on dequeue but since
__block_task() is never called, task_on_rq_queued() continues to return
true. Furthermore, since the task is now off the hierarchy, the pick
never reaches it to fully block the task even after unthrottle leading
to wait_task_inactive() looping endlessly.
Remedy this by calling __block_task() if a delayed task is being
dequeued on a throttled hierarchy.
This fix is only required for stable kernels implementing delay dequeue
(>= v6.12) before v6.18 since upstream commit e1fad12dcb ("sched/fair:
Switch to task based throttle model") indirectly fixes this by removing
the early return conditions in dequeue_entities() as part of the per-task
throttle feature.
Cc: stable@vger.kernel.org
Reported-by: Matt Fleming <matt@readmodwrite.com>
Closes: https://lore.kernel.org/all/20250925133310.1843863-1-matt@readmodwrite.com/
Fixes: b7ca5743a2 ("sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks")
Tested-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9a6ebbdbd4 ]
With lazytime mount option enabled we can be switching many dirty inodes
on cgroup exit to the parent cgroup. The numbers observed in practice
when systemd slice of a large cron job exits can easily reach hundreds
of thousands or millions. The logic in inode_do_switch_wbs() which sorts
the inode into appropriate place in b_dirty list of the target wb
however has linear complexity in the number of dirty inodes thus overall
time complexity of switching all the inodes is quadratic leading to
workers being pegged for hours consuming 100% of the CPU and switching
inodes to the parent wb.
Simple reproducer of the issue:
FILES=10000
# Filesystem mounted with lazytime mount option
MNT=/mnt/
echo "Creating files and switching timestamps"
for (( j = 0; j < 50; j ++ )); do
    mkdir $MNT/dir$j
    for (( i = 0; i < $FILES; i++ )); do
        echo "foo" >$MNT/dir$j/file$i
    done
    touch -a -t 202501010000 $MNT/dir$j/file*
done
wait
echo "Syncing and flushing"
sync
echo 3 >/proc/sys/vm/drop_caches
echo "Reading all files from a cgroup"
mkdir /sys/fs/cgroup/unified/mycg1 || exit
echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
for (( j = 0; j < 50; j ++ )); do
    cat /mnt/dir$j/file* >/dev/null &
done
wait
echo "Switching wbs"
# Now rmdir the cgroup after the script exits
We need to maintain b_dirty list ordering to keep writeback happy so
instead of sorting inode into appropriate place just append it at the
end of the list and clobber dirtied_time_when. This may result in inode
writeback starting later after a cgroup switch; however, cgroup switches are
rare, so it shouldn't matter much. Since the cgroup had write access to
the inode, there are no practical concerns of the possible DoS issues.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 66c14dccd8 ]
process_inode_switch_wbs_work() can be switching over 100 inodes to a
different cgroup. Since switching an inode requires counting all dirty &
under-writeback pages in the address space of each inode, this can take
a significant amount of time. Add a possibility to reschedule after
processing each inode to avoid softlockups.
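For illustration, a small userspace analogue of that pattern, where sched_yield() stands in for the kernel's cond_resched(); none of this is the actual writeback code:

#include <sched.h>
#include <stdio.h>

/* Userspace analogue of the kernel pattern: when walking a long list of
 * items, give the scheduler a chance to run between items. */
int main(void)
{
    for (int i = 0; i < 1000; i++) {
        /* ... switch one inode: count its dirty/writeback pages ... */
        sched_yield();              /* avoid monopolising the CPU */
    }
    printf("processed all items without hogging the CPU\n");
    return 0;
}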
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 278033a225 ]
When CONFIG_TMPFS is enabled, the initial root filesystem is a tmpfs.
By default, a tmpfs mount is limited to using 50% of the available RAM
for its content. This can be problematic in memory-constrained
environments, particularly during a kdump capture.
In a kdump scenario, the capture kernel boots with a limited amount of
memory specified by the 'crashkernel' parameter. If the initramfs is
large, it may fail to unpack into the tmpfs rootfs due to insufficient
space. This is because to get X MB of usable space in tmpfs, 2*X MB of
memory must be available for the mount. This leads to an OOM failure
during the early boot process, preventing a successful crash dump.
This patch introduces a new kernel command-line parameter,
initramfs_options, which allows passing specific mount options directly
to the rootfs when it is first mounted. This gives users control over
the rootfs behavior.
For example, a user can now specify initramfs_options=size=75% to allow
the tmpfs to use up to 75% of the available memory. This can
significantly reduce the memory pressure for kdump.
Consider a practical example:
To unpack a 48MB initramfs, the tmpfs needs 48MB of usable space. With
the default 50% limit, this requires a memory pool of 96MB to be
available for the tmpfs mount. The total memory requirement is therefore
approximately: 16MB (vmlinuz) + 48MB (loaded initramfs) + 48MB (unpacked
kernel) + 96MB (for tmpfs) + 12MB (runtime overhead) ≈ 220MB.
By using initramfs_options=size=75%, the memory pool required for the
48MB tmpfs is reduced to 48MB / 0.75 = 64MB. This reduces the total
memory requirement by 32MB (96MB - 64MB), allowing the kdump to succeed
with a smaller crashkernel size, such as 192MB.
An alternative approach of reusing the existing rootflags parameter was
considered. However, a new, dedicated initramfs_options parameter was
chosen to avoid altering the current behavior of rootflags (which
applies to the final root filesystem) and to prevent any potential
regressions.
Also add documentation for the new kernel parameter "initramfs_options"
This approach is inspired by prior discussions and patches on the topic.
Ref: https://www.lightofdawn.org/blog/?viewDetailed=00128
Ref: https://landley.net/notes-2015.html#01-01-2015
Ref: https://lkml.org/lkml/2021/6/29/783
Ref: https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html#what-is-rootfs
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Link: https://lore.kernel.org/20250815121459.3391223-1-lichliu@redhat.com
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f75e07bf52 ]
According to the PLIC specification[1], global interrupt sources are
assigned small unsigned integer identifiers beginning at the value 1.
An interrupt ID of 0 is reserved to mean "no interrupt".
The current plic_irq_resume() and plic_irq_suspend() functions incorrectly
start the loop from index 0, which accesses the register space for the
reserved interrupt ID 0.
Change the loop to start from index 1, skipping the reserved
interrupt ID 0 as per the PLIC specification.
This prevents potential undefined behavior when accessing the reserved
register space during suspend/resume cycles.
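For illustration, a standalone C sketch of save/restore loops that skip the reserved ID 0. The array sizes and register layout are made up for the example; this is not the actual PLIC driver code:

#include <stdio.h>

#define NR_SOURCES 8                /* illustrative number of interrupt sources */

static unsigned int priority_reg[NR_SOURCES];
static unsigned int saved_priority[NR_SOURCES];

/* Interrupt ID 0 is reserved ("no interrupt"), so save/restore loops start
 * at index 1 and never touch the reserved register space. */
static void plic_save(void)
{
    for (unsigned int i = 1; i < NR_SOURCES; i++)
        saved_priority[i] = priority_reg[i];
}

static void plic_restore(void)
{
    for (unsigned int i = 1; i < NR_SOURCES; i++)
        priority_reg[i] = saved_priority[i];
}

int main(void)
{
    plic_save();
    plic_restore();
    printf("handled sources 1..%d, skipped reserved ID 0\n", NR_SOURCES - 1);
    return 0;
}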
Fixes: e80f0b6a2c ("irqchip/irq-sifive-plic: Add syscore callbacks for hibernation")
Co-developed-by: Jia Wang <wangjia@ultrarisc.com>
Signed-off-by: Jia Wang <wangjia@ultrarisc.com>
Co-developed-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://github.com/riscv/riscv-plic-spec/releases/tag/1.0.0
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit bc3905a71f upstream.
The tailcall_bpf2bpf_hierarchy_fentry test hangs on s390. Its call
graph is as follows:
entry()
subprog_tail()
trampoline()
fentry()
the rest of subprog_tail() # via BPF_TRAMP_F_CALL_ORIG
return to entry()
The problem is that the rest of subprog_tail() increments the tail call
counter, but the trampoline discards the incremented value. This
results in an astronomically large number of tail calls.
Fix by making the trampoline write the incremented tail call counter
back.
Fixes: 528eb2cb87 ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-4-iii@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c861a6b147 upstream.
The tailcall_bpf2bpf_hierarchy_1 test hangs on s390. Its call graph is
as follows:
entry()
subprog_tail()
bpf_tail_call_static(0) -> entry + tail_call_start
subprog_tail()
bpf_tail_call_static(0) -> entry + tail_call_start
entry() copies its tail call counter to the subprog_tail()'s frame,
which then increments it. However, the incremented result is discarded,
leading to an astronomically large number of tail calls.
Fix by writing the incremented counter back to the entry()'s frame.
Fixes: dd691e847d ("s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-3-iii@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e26d523edf upstream.
Currently the caller-allocated portion of the stack frame is described
using constants, hardcoded values, and an ASCII drawing, making it
harder than necessary to ensure that everything is in sync.
Declare a struct and use offsetof() and offsetofend() macros to refer
to various values stored within the frame.
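For illustration, a standalone C sketch of describing a frame as a struct and deriving offsets with offsetof()/offsetofend(). The struct layout is made up; offsetofend() is re-defined here since it is a kernel helper:

#include <stddef.h>
#include <stdio.h>

/* offsetofend() mirrors the kernel helper of the same name. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Illustrative caller-allocated frame described as a struct rather than a
 * set of hand-maintained constants. */
struct frame_example {
    unsigned long back_chain;
    unsigned long saved_regs[4];
    unsigned long tail_call_cnt;
};

int main(void)
{
    printf("tail_call_cnt at offset %zu, frame used up to offset %zu\n",
           offsetof(struct frame_example, tail_call_cnt),
           offsetofend(struct frame_example, tail_call_cnt));
    return 0;
}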
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250624121501.50536-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2c69490dd upstream
Prior to commit b52da4054e ("ipmi: Rework user message limit handling"),
i_ipmi_request() used to increase the user reference counter if the receive
message is provided by the caller of IPMI API functions. This is no longer
the case. However, ipmi_free_recv_msg() is still called and decreases the
reference counter. This results in the reference counter reaching zero,
the user data pointer being released, and all kinds of interesting crashes
being seen.
Fix the problem by increasing user reference counter if the receive message
has been provided by the caller.
Fixes: b52da4054e ("ipmi: Rework user message limit handling")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-ID: <20251006201857.3433837-1-linux@roeck-us.net>
Signed-off-by: Corey Minyard <corey@minyard.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b52da4054e upstream
This patch required quite a bit of work to backport due to a number
of unrelated changes that do not make sense to backport. This has
been run against my test suite and passes all tests.
The limit on the number of user messages had a number of issues,
improper counting in some cases and a use after free.
Restructure how this is all done to handle more in the receive message
allocation routine, so all refcounting and user message limit counts
are done in that routine. It's a lot cleaner and safer.
Reported-by: Gilles BULOZ <gilles.buloz@kontron.com>
Closes: https://lore.kernel.org/lkml/aLsw6G0GyqfpKs2S@mail.minyard.net/
Fixes: 8e76741c3d ("ipmi: Add a limit on the number of users that may use IPMI")
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Corey Minyard <corey@minyard.net>
Tested-by: Gilles BULOZ <gilles.buloz@kontron.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b1ff850e0 upstream.
When servers set the C-flag in their MP_CAPABLE to tell clients not to
create subflows to the initial address and port, clients will likely not
use their other endpoints. That's because the in-kernel path-manager
uses the 'subflow' endpoints to create subflows only to the initial
address and port.
If the limits have not been modified to accept ADD_ADDR, the client
doesn't try to establish new subflows. If the limits accept ADD_ADDR,
the routing routes will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy
Layer 4 load balancer, or using an anycast IP address. Clients having their
different 'subflow' endpoints set up don't end up creating multiple
subflows as expected, causing some deployment issues.
A special case is then added here: when servers set the C-flag in the
MPC and directly send an ADD_ADDR, this single ADD_ADDR is accepted.
The 'subflows' endpoints will then be used with this new remote IP and
port. This exception is only allowed when the ADD_ADDR is sent
immediately after the 3WHS, and makes the client switch to the 'fully
established' mode. After that, 'select_local_address()' will not be able
to find any subflows, because 'id_avail_bitmap' will be filled in
mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
established' mode.
Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflict in pm.c, because commit 498d7d8b75 ("mptcp: pm: remove
'_nl' from mptcp_pm_nl_is_init_remote_addr") renamed a helper in the
context, and it is not in this version. The same new code can be
applied at the same place.
Conflict in pm_kernel.c, because the modified code has been moved from
pm_netlink.c to pm_kernel.c in commit 8617e85e04 ("mptcp: pm: split
in-kernel PM specific code"), which is not in this version. The
resolution is easy: simply apply the patch with 'pm_kernel.c'
replaced by 'pm_netlink.c'. 'patch --merge' managed to apply
this modified patch without creating any conflicts. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit baf60d5cb8 ]
In certain circumstances, the ACPI handle of a data-only node may be
NULL, in which case it does not make sense to attempt to attach that
node to an ACPI namespace object, so update the code to avoid attempts
to do so.
This prevents confusing and unhelpful error messages from being printed.
Also document the fact that the ACPI handle of a data-only node may be
NULL and when that happens in a code comment. In addition, make
acpi_add_nondev_subnodes() print a diagnostic message for each data-only
node with an unknown ACPI namespace scope.
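For illustration, a standalone C sketch of the guard this describes: skip the attach for a node whose handle is NULL instead of passing the NULL handle further down. The structure and names are made up, not the ACPI core code:

#include <stdio.h>

/* Illustrative data-only node: the handle may legitimately be NULL when the
 * node's ACPI namespace scope is unknown. */
struct data_node_example {
    void *handle;
    const char *name;
};

/* Skip the attach instead of passing a NULL handle further down. */
static void attach_node(const struct data_node_example *dn)
{
    if (!dn->handle) {
        printf("skipping %s: unknown ACPI namespace scope\n", dn->name);
        return;
    }
    /* ... attach dn to the namespace object behind dn->handle ... */
}

int main(void)
{
    struct data_node_example dn = { .handle = NULL, .name = "SUB0" };

    attach_node(&dn);
    return 0;
}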
Fixes: 1d52f10917 ("ACPI: property: Tie data nodes to acpi handles")
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 737c3a09dc ]
In some places in the ACPI device properties handling code, it is
unclear why the code is what it is. Some assumptions are not documented
and some pieces of code are based on knowledge that is not mentioned
anywhere.
Add code comments explaining these things.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Stable-dep-of: baf60d5cb8 ("ACPI: property: Do not pass NULL handles to acpi_attach_data()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d06118fe9b ]
Data-only subnode links following the ACPI data subnode GUID in a _DSD
package are expected to point to named objects returning _DSD-equivalent
packages. If a reference to such an object is used in the target field
of any of those links, that object will be evaluated in place (as a
named object) and its return data will be embedded in the outer _DSD
package.
For this reason, it is not expected to see a subnode link with the
target field containing a local reference (that would mean pointing
to a device or another object that cannot be evaluated in place and
therefore cannot return a _DSD-equivalent package).
Accordingly, simplify the code parsing data-only subnode links to
simply print a message when it encounters a local reference in the
target field of one of those links.
Moreover, since acpi_nondev_subnode_data_ok() would only have one
caller after the change above, fold it into that caller.
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0jVeSrDO6hrZhKgRZrH=FpGD4vNUjFD8hV9WwN9TLHjzQ@mail.gmail.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Stable-dep-of: baf60d5cb8 ("ACPI: property: Do not pass NULL handles to acpi_attach_data()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 399dbcadc0 ]
There is no synchronization between different code paths in the ACPI
battery driver that update its sysfs interface or its power supply
class device interface. In some cases this results in functional
failures due to race conditions.
One example of this is when two ACPI notifications:
- ACPI_BATTERY_NOTIFY_STATUS (0x80)
- ACPI_BATTERY_NOTIFY_INFO (0x81)
are triggered (by the platform firmware) in a row with a little delay
in between after removing and reinserting a laptop battery. Both
notifications cause acpi_battery_update() to be called and if the delay
between them is sufficiently small, sysfs_add_battery() can be re-entered
before battery->bat is set which leads to a duplicate sysfs entry error:
sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT1'
CPU: 1 UID: 0 PID: 185 Comm: kworker/1:4 Kdump: loaded Not tainted 6.12.38+deb13-amd64 #1 Debian 6.12.38-1
Hardware name: Gateway NV44 /SJV40-MV , BIOS V1.3121 04/08/2009
Workqueue: kacpi_notify acpi_os_execute_deferred
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
sysfs_warn_dup.cold+0x17/0x23
sysfs_create_dir_ns+0xce/0xe0
kobject_add_internal+0xba/0x250
kobject_add+0x96/0xc0
? get_device_parent+0xde/0x1e0
device_add+0xe2/0x870
__power_supply_register.part.0+0x20f/0x3f0
? wake_up_q+0x4e/0x90
sysfs_add_battery+0xa4/0x1d0 [battery]
acpi_battery_update+0x19e/0x290 [battery]
acpi_battery_notify+0x50/0x120 [battery]
acpi_ev_notify_dispatch+0x49/0x70
acpi_os_execute_deferred+0x1a/0x30
process_one_work+0x177/0x330
worker_thread+0x251/0x390
? __pfx_worker_thread+0x10/0x10
kthread+0xd2/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
kobject: kobject_add_internal failed for BAT1 with -EEXIST, don't try to register things with the same name in the same directory.
There are also other scenarios in which analogous issues may occur.
Address this by using a common lock in all of the code paths leading
to updates of driver interfaces: ACPI Notify () handler, system resume
callback and post-resume notification, device addition and removal.
This new lock replaces sysfs_lock that has been used only in
sysfs_remove_battery() which now is going to be always called under
the new lock, so it doesn't need any internal locking any more.
Fixes: 1066625155 ("ACPI: battery: Install Notify() handler directly")
Closes: https://lore.kernel.org/linux-acpi/20250910142653.313360-1-luogf2025@163.com/
Reported-by: GuangFei Luo <luogf2025@163.com>
Tested-by: GuangFei Luo <luogf2025@163.com>
Cc: 6.6+ <stable@vger.kernel.org> # 6.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 898374fdd7 ]
When a listener is added, a part of creation of transport also registers
program/port with rpcbind. However, when the listener is removed,
while transport goes away, rpcbind still has the entry for that
port/type.
When deleting the transport, unregister with rpcbind when appropriate.
--- v2: created a new xpt_flag, XPT_RPCB_UNREG, to mark TCP and UDP
transports, and at xprt destroy send an rpcbind unregister if the flag is set.
Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: d093c90892 ("nfsd: fix management of listener transports")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eccbbc7c00 ]
The heuristic for limiting the number of incoming connections to nfsd
currently uses sv_nrthreads - allowing more connections if more threads
were configured.
A future patch will allow number of threads to grow dynamically so that
there will be no need to configure sv_nrthreads. So we need a different
solution for limiting connections.
It isn't clear what problem is solved by limiting connections (as
mentioned in a code comment) but the most likely problem is a connection
storm - many connections that are not doing productive work. These will
be closed after about 6 minutes already but it might help to slow down a
storm.
This patch adds a per-connection flag XPT_PEER_VALID which indicates
that the peer has presented a filehandle for which it has some sort of
access, i.e. the peer is known to be trusted in some way. We now only
count connections which have NOT been determined to be valid. There
should be relatively few of these at any given time.
If the number of non-validated peers exceeds a limit - currently 64 - we
close the oldest non-validated peer to avoid having too many of these
useless connections.
Note that this patch significantly changes the meaning of the various
configuration parameters for "max connections". The next patch will
remove all of these.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: 898374fdd7 ("nfsd: unregister with rpcbind when deleting a transport")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4cc9b9f2bf ]
NFSD_MAY_LOCK means a few different things.
- it means that GSS is not required.
- it means that with NFSEXP_NOAUTHNLM, authentication is not required
- it means that OWNER_OVERRIDE is allowed.
None of these are specific to locking, they are specific to the NLM
protocol.
So:
- rename to NFSD_MAY_NLM
- set NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_BYPASS_GSS in nlm_fopen()
so that NFSD_MAY_NLM doesn't need to imply these.
- move the test on NFSEXP_NOAUTHNLM out of nfsd_permission() and
into fh_verify where other special-case tests on the MAY flags
happen. nfsd_permission() can be called from other places than
fh_verify(), but none of these will have NFSD_MAY_NLM.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: 898374fdd7 ("nfsd: unregister with rpcbind when deleting a transport")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6640556b0c ]
NFSv4 LOCK operations should not avoid the set of authorization
checks that apply to all other NFSv4 operations. Also, the
"no_auth_nlm" export option should apply only to NLM LOCK requests.
It's not necessary or sensible to apply it to NFSv4 LOCK operations.
Instead, set no permission bits when calling fh_verify(). Subsequent
stateid processing handles authorization checks.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: 898374fdd7 ("nfsd: unregister with rpcbind when deleting a transport")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bb4f07f240 ]
Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass
only GSS, but bypass any method. This is a problem specially for NFS3
AUTH_NULL-only exports.
The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623,
section 2.3.2, to allow mounting NFS2/3 GSS-only export without
authentication. So a few procedures which do not expose a security risk and
are used during mount time can also be called with AUTH_NONE or AUTH_SYS, to
allow the client mount operation to finish successfully.
The problem with the current implementation is that for AUTH_NULL-only
exports, NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount
attempts, which confuses NFS3 clients and makes them think that AUTH_UNIX is
enabled and is working. The Linux NFS3 client never switches from AUTH_UNIX to
AUTH_NONE on active mount, which makes the mount inaccessible.
Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation
so that the bypass applies only to exports which have enabled some real
authentication (GSS, TLS, or any other).
The result would be: for an AUTH_NULL-only export, if the client attempts to
mount with the AUTH_UNIX flavor, then it will receive access errors, which
tell the client that the AUTH_UNIX flavor is not usable, and it will either
try another auth flavor (AUTH_NULL if enabled) or fail the mount procedure.
Similarly, if the client attempts to mount with the AUTH_NULL flavor and only
the AUTH_UNIX flavor is enabled, then the client will receive an access error.
This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if a
client attempts to mount them with another auth flavor (e.g. with AUTH_NULL
for an AUTH_UNIX-only export, or with AUTH_UNIX for an AUTH_NULL-only export).
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: 898374fdd7 ("nfsd: unregister with rpcbind when deleting a transport")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0dccbc75e1 ]
When running as an SNP or TDX guest under KVM, force the legacy PCI hole,
i.e. memory between Top of Lower Usable DRAM and 4GiB, to be mapped as UC
via a forced variable MTRR range.
In most KVM-based setups, legacy devices such as the HPET and TPM are
enumerated via ACPI. ACPI enumeration includes a Memory32Fixed entry, and
optionally a SystemMemory descriptor for an OperationRegion, e.g. if the
device needs to be accessed via a Control Method.
If a SystemMemory entry is present, then the kernel's ACPI driver will
auto-ioremap the region so that it can be accessed at will. However, the
ACPI spec doesn't provide a way to enumerate the memory type of
SystemMemory regions, i.e. there's no way to tell software that a region
must be mapped as UC vs. WB, etc. As a result, Linux's ACPI driver always
maps SystemMemory regions using ioremap_cache(), i.e. as WB on x86.
The dedicated device drivers however, e.g. the HPET driver and TPM driver,
want to map their associated memory as UC or WC, as accessing PCI devices
using WB is unsupported.
On bare metal and non-CoCO, the conflicting requirements "work" as firmware
configures the PCI hole (and other device memory) to be UC in the MTRRs.
So even though the ACPI mappings request WB, they are forced to UC- in the
kernel's tracking due to the kernel properly handling the MTRR overrides,
and thus are compatible with the drivers' requested WC/UC-.
With force WB MTRRs on SNP and TDX guests, the ACPI mappings get their
requested WB if the ACPI mappings are established before the dedicated
driver code attempts to initialize the device. E.g. if acpi_init()
runs before the corresponding device driver is probed, ACPI's WB mapping
will "win", and result in the driver's ioremap() failing because the
existing WB mapping isn't compatible with the requested WC/UC-.
E.g. when a TPM is emulated by the hypervisor (ignoring the security
implications of relying on what is allegedly an untrusted entity to store
measurements), the TPM driver will request UC and fail:
[ 1.730459] ioremap error for 0xfed40000-0xfed45000, requested 0x2, got 0x0
[ 1.732780] tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -12
Note, the '0x2' and '0x0' values refer to "enum page_cache_mode", not x86's
memtypes (which frustratingly are an almost pure inversion; 2 == WB, 0 == UC).
E.g. tracing mapping requests for TPM TIS yields:
Mapping TPM TIS with req_type = 0
WARNING: CPU: 22 PID: 1 at arch/x86/mm/pat/memtype.c:530 memtype_reserve+0x2ab/0x460
Modules linked in:
CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.16.0-rc7+ #2 VOLUNTARY
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/29/2025
RIP: 0010:memtype_reserve+0x2ab/0x460
__ioremap_caller+0x16d/0x3d0
ioremap_cache+0x17/0x30
x86_acpi_os_ioremap+0xe/0x20
acpi_os_map_iomem+0x1f3/0x240
acpi_os_map_memory+0xe/0x20
acpi_ex_system_memory_space_handler+0x273/0x440
acpi_ev_address_space_dispatch+0x176/0x4c0
acpi_ex_access_region+0x2ad/0x530
acpi_ex_field_datum_io+0xa2/0x4f0
acpi_ex_extract_from_field+0x296/0x3e0
acpi_ex_read_data_from_field+0xd1/0x460
acpi_ex_resolve_node_to_value+0x2ee/0x530
acpi_ex_resolve_to_value+0x1f2/0x540
acpi_ds_evaluate_name_path+0x11b/0x190
acpi_ds_exec_end_op+0x456/0x960
acpi_ps_parse_loop+0x27a/0xa50
acpi_ps_parse_aml+0x226/0x600
acpi_ps_execute_method+0x172/0x3e0
acpi_ns_evaluate+0x175/0x5f0
acpi_evaluate_object+0x213/0x490
acpi_evaluate_integer+0x6d/0x140
acpi_bus_get_status+0x93/0x150
acpi_add_single_object+0x43a/0x7c0
acpi_bus_check_add+0x149/0x3a0
acpi_bus_check_add_1+0x16/0x30
acpi_ns_walk_namespace+0x22c/0x360
acpi_walk_namespace+0x15c/0x170
acpi_bus_scan+0x1dd/0x200
acpi_scan_init+0xe5/0x2b0
acpi_init+0x264/0x5b0
do_one_initcall+0x5a/0x310
kernel_init_freeable+0x34f/0x4f0
kernel_init+0x1b/0x200
ret_from_fork+0x186/0x1b0
ret_from_fork_asm+0x1a/0x30
</TASK>
The above traces are from a Google-VMM based VM, but the same behavior
happens with a QEMU based VM that is modified to add a SystemMemory range
for the TPM TIS address space.
The only reason this doesn't cause problems for HPET, which appears to
require a SystemMemory region, is because HPET gets special treatment via
x86_init.timers.timer_init(), and so gets a chance to create its UC-
mapping before acpi_init() clobbers things. Disabling the early call to
hpet_time_init() yields the same behavior for HPET:
[ 0.318264] ioremap error for 0xfed00000-0xfed01000, requested 0x2, got 0x0
Hack around the ACPI gap by forcing the legacy PCI hole to UC when
overriding the (virtual) MTRRs for CoCo guest, so that ioremap handling
of MTRRs naturally kicks in and forces the ACPI mappings to be UC.
Note, the requested/mapped memtype doesn't actually matter in terms of
accessing the device. In practically every setup, legacy PCI devices are
emulated by the hypervisor, and accesses are intercepted and handled as
emulated MMIO, i.e. never access physical memory and thus don't have an
effective memtype.
Even in a theoretical setup where such devices are passed through by the
host, i.e. point at real MMIO memory, it is KVM's (as the hypervisor)
responsibility to force the memory to be WC/UC, e.g. via EPT memtype
under TDX or real hardware MTRRs under SNP. Not doing so cannot work,
and the hypervisor is highly motivated to do the right thing as letting
the guest access hardware MMIO with WB would likely result in a variety
of fatal #MCs.
In other words, forcing the range to be UC is all about coercing the
kernel's tracking into thinking that it has established UC mappings, so
that the ioremap code doesn't reject mappings from e.g. the TPM driver and
thus prevent the driver from loading and the device from functioning.
Note #2, relying on guest firmware to handle this scenario, e.g. by setting
virtual MTRRs and then consuming them in Linux, is not a viable option, as
the virtual MTRR state is managed by the untrusted hypervisor, and because
OVMF at least has stopped programming virtual MTRRs when running as a TDX
guest.
Link: https://lore.kernel.org/all/8137d98e-8825-415b-9282-1d2a115bb51a@linux.intel.com
Fixes: 8e690b817e ("x86/kvm: Override default caching mode for SEV-SNP and TDX")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jürgen Groß <jgross@suse.com>
Cc: Korakit Seemakhupt <korakit@google.com>
Cc: Jianxiong Gao <jxgao@google.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Suggested-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Korakit Seemakhupt <korakit@google.com>
Link: https://lore.kernel.org/r/20250828005249.39339-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f620d66af3 ]
Commit 68d54ceeec ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the
zero page") attempted to fix ptrace() reading of tags from the zero page
by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit
also changed the ptrace() tag access permission check to the VM_MTE vma
flag while turning the page flag test into a WARN_ON_ONCE().
Attempting to set the PG_mte_tagged flag early with
CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit
d77e59a8fc "arm64: mte: Lock a page for MTE tag initialisation") or
have the flags cleared later during page_alloc_init_late(). In addition,
pages_identical() -> memcmp_pages() will reject any comparison with the
zero page as it is marked as tagged.
Partially revert the above commit to avoid setting PG_mte_tagged on the
zero page. Update the __access_remote_tags() warning on untagged pages
to ignore the zero page since it is known to have the tags initialised.
Note that all user mappings of the zero page are marked as pte_special().
The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so
PG_mte_tagged will remain cleared.
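A minimal sketch of the adjusted warning in __access_remote_tags() (helper
names such as page_mte_tagged() and is_zero_page() are assumed from mainline,
not quoted from this patch):

	/* The zero page has its tags initialised but is intentionally
	 * never marked PG_mte_tagged, so do not warn about it. */
	WARN_ON_ONCE(!page_mte_tagged(page) && !is_zero_page(page));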
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 68d54ceeec ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page")
Reported-by: Gergely Kovacs <Gergely.Kovacs2@arm.com>
Cc: stable@vger.kernel.org # 5.10.x
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <lance.yang@linux.dev>
Acked-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Will Deacon <will@kernel.org>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f97aef092e ]
Commit a755d0e2d4 ("cpufreq: Honour transition_latency over
transition_delay_us") caused platforms where cpuinfo.transition_latency
is CPUFREQ_ETERNAL to get a very large transition latency whereas
previously it had been capped at 10 ms (and later at 2 ms).
This led to a user-observable regression between 6.6 and 6.12 as
described by Shawn:
"The dbs sampling_rate was 10000 us on 6.6 and suddently becomes
6442450 us (4294967295 / 1000 * 1.5) on 6.12 for these platforms
because the default transition delay was dropped [...].
It slows down dbs governor's reacting to CPU loading change
dramatically. Also, as transition_delay_us is used by schedutil
governor as rate_limit_us, it shows a negative impact on device
idle power consumption, because the device gets slightly less time
in the lowest OPP."
Evidently, the expectation of the drivers using CPUFREQ_ETERNAL as
cpuinfo.transition_latency was that it would be capped by the core,
but they may as well return a default transition latency value instead
of CPUFREQ_ETERNAL and the core need not do anything with it.
Accordingly, introduce CPUFREQ_DEFAULT_TRANSITION_LATENCY_NS and make
all of the drivers in question use it instead of CPUFREQ_ETERNAL. Also
update the related Rust binding.
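As a rough illustration (the exact value of the new constant is an assumption
here, not taken from this changelog):

	/* include/linux/cpufreq.h: a sane default instead of "forever" */
	#define CPUFREQ_DEFAULT_TRANSITION_LATENCY_NS	NSEC_PER_MSEC

	/* in an affected driver's ->init() callback: */
	policy->cpuinfo.transition_latency = CPUFREQ_DEFAULT_TRANSITION_LATENCY_NS;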
Fixes: a755d0e2d4 ("cpufreq: Honour transition_latency over transition_delay_us")
Closes: https://lore.kernel.org/linux-pm/20250922125929.453444-1-shawnguo2@yeah.net/
Reported-by: Shawn Guo <shawnguo@kernel.org>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 6.6+ <stable@vger.kernel.org> # 6.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2264949.irdbgypaU6@rafael.j.wysocki
[ rjw: Fix typo in new symbol name, drop redundant type cast from Rust binding ]
Tested-by: Shawn Guo <shawnguo@kernel.org> # with cpufreq-dt driver
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ omitted Rust changes ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b26da4074 ]
[BUG]
With my local branch to enable bs > ps support for btrfs, sometimes I
hit the following ASSERT() inside submit_one_sector():
ASSERT(block_start != EXTENT_MAP_HOLE);
Please note that it's not yet possible to hit this ASSERT() in the wild,
as it requires btrfs bs > ps support, which is not even in the
development branch.
But on the other hand, there is also a very low chance to hit the above
ASSERT() with bs < ps cases, so this is an existing bug affecting not only
the incoming bs > ps support but also the existing bs < ps support.
[CAUSE]
Firstly, that ASSERT() means we're trying to submit a dirty block but
without a real extent map or ordered extent backing it.
Furthermore with extra debugging, the folio triggering such ASSERT() is
always larger than the fs block size in my bs > ps case.
(8K block size, 4K page size)
After some more debugging, the ASSERT() is triggered by the following
sequence:
extent_writepage()
| We got a 32K folio (4 fs blocks) at file offset 0, and the fs block
| size is 8K, page size is 4K.
| And there is another 8K folio at file offset 32K, which is also
| dirty.
| So the filemap layout looks like the following:
|
| "||" is the filio boundary in the filemap.
| "//| is the dirty range.
|
| 0 8K 16K 24K 32K 40K
| |////////| |//////////////////////||////////|
|
|- writepage_delalloc()
| |- find_lock_delalloc_range() for [0, 8K)
| | Now range [0, 8K) is properly locked.
| |
| |- find_lock_delalloc_range() for [16K, 40K)
| | |- btrfs_find_delalloc_range() returned range [16K, 40K)
| | |- lock_delalloc_folios() locked folio 0 successfully
| | |
| | | The filemap range [32K, 40K) got dropped from filemap.
| | |
| | |- lock_delalloc_folios() failed with -EAGAIN on folio 32K
| | | As the folio at 32K is dropped.
| | |
| | |- loops = 1;
| | |- max_bytes = PAGE_SIZE;
| | |- goto again;
| | | This will re-do the lookup for dirty delalloc ranges.
| | |
| | |- btrfs_find_delalloc_range() called with @max_bytes == 4K
| | | This is smaller than block size, so
| | | btrfs_find_delalloc_range() is unable to return any range.
| | \- return false;
| |
| \- Now only range [0, 8K) has an OE for it, but for dirty range
| [16K, 32K) it's dirty without an OE.
| This breaks the assumption that writepage_delalloc() will find
| and lock all dirty ranges inside the folio.
|
|- extent_writepage_io()
|- submit_one_sector() for [0, 8K)
| Succeeded
|
|- submit_one_sector() for [16K, 24K)
Triggering the ASSERT(), as there is no OE, and the original
extent map is a hole.
Please note that this also exposed the same problem for bs < ps
support. E.g. with 64K page size and 4K block size.
If we fail to lock a folio and fall back into the "loops = 1;"
branch, we will re-do the search using 64K as max_bytes, which may
fail again to lock the next folio and exit early without handling
all dirty blocks inside the folio.
[FIX]
Instead of using the fixed size PAGE_SIZE as @max_bytes, use
@sectorsize, so that we are ensured to find and lock any remaining
blocks inside the folio.
And since we're here, add an extra ASSERT() before calling
btrfs_find_delalloc_range() to make sure @max_bytes is at least as
large as a block, to avoid false negatives.
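A rough sketch of the two changes in the retry path (access to the block size
via fs_info is illustrative, not quoted from the patch):

	/* Retry with a single block rather than a single page, so a
	 * delalloc range can still be found when blocksize > PAGE_SIZE. */
	max_bytes = fs_info->sectorsize;

	/* And before every btrfs_find_delalloc_range() call: */
	ASSERT(max_bytes >= fs_info->sectorsize);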
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 64e0d839c5 ]
Testing has shown that reading multiple registers at once (for 10-bit
ADC values) does not work. Set the use_single_read regmap_config flag
to make regmap split these for us.
This should fix temperature opregion accesses done by
drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for
the upcoming drivers for the ADC and battery MFD cells.
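A sketch of the relevant regmap_config change (only use_single_read is the
point here; the other field values are illustrative):

	static const struct regmap_config chtdc_ti_regmap_config = {
		.reg_bits = 8,
		.val_bits = 8,
		.max_register = 0xff,
		/* Multi-register bursts do not work on this PMIC, so let
		 * regmap split them into single-register accesses. */
		.use_single_read = true,
	};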
Fixes: 6bac0606fd ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC")
Cc: stable@vger.kernel.org
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Link: https://lore.kernel.org/r/20250804133240.312383-1-hansg@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 85afa9ea12 ]
The fields dma_chan_tx and dma_chan_rx of the struct pci_epf_test can be
NULL even after EPF initialization. It is therefore prudent to check that
they have non-NULL values before releasing the channels. Add the checks
in pci_epf_test_clean_dma_chan().
Without the checks, NULL pointer dereferences happen and they can lead
to a kernel panic in some cases:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
Call trace:
dma_release_channel+0x2c/0x120 (P)
pci_epf_test_epc_deinit+0x94/0xc0 [pci_epf_test]
pci_epc_deinit_notify+0x74/0xc0
tegra_pcie_ep_pex_rst_irq+0x250/0x5d8
irq_thread_fn+0x34/0xb8
irq_thread+0x18c/0x2e8
kthread+0x14c/0x210
ret_from_fork+0x10/0x20
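A simplified sketch of the added guards (ignoring the case where the TX and
RX channels are the same):

	if (epf_test->dma_chan_tx) {
		dma_release_channel(epf_test->dma_chan_tx);
		epf_test->dma_chan_tx = NULL;
	}

	if (epf_test->dma_chan_rx) {
		dma_release_channel(epf_test->dma_chan_rx);
		epf_test->dma_chan_rx = NULL;
	}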
Fixes: 8353813c88 ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities")
Fixes: 5ebf3fc59b ("PCI: endpoint: functions/pci-epf-test: Add DMA support to transfer data")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
[mani: trimmed the stack trace]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250916025756.34807-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9b80bdb10a ]
Remove a surplus return statement from the void function that was
added in commit 8353813c88 ("PCI: endpoint: Enable DMA
tests for endpoints with DMA capabilities").
An empty return statement at the end of a void function serves
little purpose.
This fixes the following checkpatch.pl script warning:
WARNING: void function return statements are not generally useful
#296: FILE: drivers/pci/endpoint/functions/pci-epf-test.c:296:
+ return;
+}
Link: https://lore.kernel.org/r/tencent_F250BEE2A65745A524E2EFE70CF615CA8F06@qq.com
Signed-off-by: Wang Jiang <jiangwang@kylinos.cn>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Stable-dep-of: 85afa9ea12 ("PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d6fc29f36 ]
Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.
The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.
The following patch adds a selftest to verify that the ksm_merging_pages
counter is updated correctly during fork.
Test Results
============
Without the first patch
-----------------------
# [RUN] test_fork_ksm_merging_page_count
not ok 10 ksm_merging_page in child: 32
With the first patch
--------------------
# [RUN] test_fork_ksm_merging_page_count
ok 10 ksm_merging_pages is not inherited after fork
This patch (of 2):
Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.
When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.
A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.
To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.
Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.1758648700.git.donettom@linux.ibm.com
Fixes: 7609385337 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e0 ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: <stable@vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ replaced mm_flags_test() calls with test_bit() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f13daee2a upstream.
After commit 6f110a5e4f ("Disable SLUB_TINY for build testing"), which
causes CONFIG_KASAN to be enabled in allmodconfig again, arm64
allmodconfig builds with clang-17 and older show an instance of
-Wframe-larger-than (which breaks the build with CONFIG_WERROR=y):
lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (2336) exceeds limit (2048) in 'curve25519_generic' [-Werror,-Wframe-larger-than]
757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE],
| ^
When KASAN is disabled, the stack usage is roughly quartered:
lib/crypto/curve25519-hacl64.c:757:6: error: stack frame size (608) exceeds limit (128) in 'curve25519_generic' [-Werror,-Wframe-larger-than]
757 | void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE],
| ^
Using '-Rpass-analysis=stack-frame-layout' shows the following variables
and many, many 8-byte spills when KASAN is enabled:
Offset: [SP-144], Type: Variable, Align: 8, Size: 40
Offset: [SP-464], Type: Variable, Align: 8, Size: 320
Offset: [SP-784], Type: Variable, Align: 8, Size: 320
Offset: [SP-864], Type: Variable, Align: 32, Size: 80
Offset: [SP-896], Type: Variable, Align: 32, Size: 32
Offset: [SP-1016], Type: Variable, Align: 8, Size: 120
When KASAN is disabled, there are still spills but not as many and the
variables list is smaller:
Offset: [SP-192], Type: Variable, Align: 32, Size: 80
Offset: [SP-224], Type: Variable, Align: 32, Size: 32
Offset: [SP-344], Type: Variable, Align: 8, Size: 120
Disable KASAN for this file when using clang-17 or older to avoid
blowing out the stack, clearing up the warning.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250609-curve25519-hacl64-disable-kasan-clang-v1-1-08ea0ac5ccff@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit feb8ae81b2 upstream.
Introduce acpi_gbl_use_global_lock, which allows skipping the Global Lock
initialization. This is useful for systems without a Global Lock (such as
loong_arch), so as to avoid error messages during the boot phase:
ACPI Error: Could not enable global_lock event (20240827/evxfevnt-182)
ACPI Error: No response from Global Lock hardware, disabling lock (20240827/evglock-59)
Link: https://github.com/acpica/acpica/commit/463cb0fe
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57295e8354 upstream.
syzkaller found a path where ext4_xattr_inode_update_ref() reads an EA
inode refcount that is already <= 0 and then applies ref_change (often
-1). That lets the refcount underflow and we proceed with a bogus value,
triggering errors like:
EXT4-fs error: EA inode <n> ref underflow: ref_count=-1 ref_change=-1
EXT4-fs warning: ea_inode dec ref err=-117
Make the invariant explicit: if the current refcount is non-positive,
treat this as on-disk corruption, emit ext4_error_inode(), and fail the
operation with -EFSCORRUPTED instead of updating the refcount. Delete the
WARN_ONCE() as negative refcounts are now impossible; keep error reporting
in ext4_error_inode().
This prevents the underflow and the follow-on orphan/cleanup churn.
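Roughly, the new invariant in ext4_xattr_inode_update_ref() could look like
the sketch below (error-path details are illustrative, not the exact patch):

	s64 ref_count = ext4_xattr_inode_get_ref(ea_inode);

	if (ref_count <= 0) {
		ext4_error_inode(ea_inode, __func__, __LINE__, 0,
				 "EA inode ref underflow: ref_count=%lld ref_change=%d",
				 (long long)ref_count, ref_change);
		return -EFSCORRUPTED;
	}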
Reported-by: syzbot+0be4f339a8218d2a5bb1@syzkaller.appspotmail.com
Fixes: https://syzbot.org/bug?extid=0be4f339a8218d2a5bb1
Cc: stable@kernel.org
Co-developed-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com>
Signed-off-by: Albin Babu Varghese <albinbabuvarghese20@gmail.com>
Signed-off-by: Ahmet Eray Karadag <eraykrdg1@gmail.com>
Message-ID: <20250920021342.45575-1-eraykrdg1@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12e803c882 upstream.
During the movement of a written extent, mext_page_mkuptodate() is
called to read data in the range [from, to) into the page cache and to
update the corresponding buffers. Therefore, we should not wait on any
buffer whose start offset is >= 'to'. Otherwise, it will return -EIO and
fail the extents movement.
$ for i in `seq 3 -1 0`; \
do xfs_io -fs -c "pwrite -b 1024 $((i * 1024)) 1024" /mnt/foo; \
done
$ umount /mnt && mount /dev/pmem1s /mnt # drop cache
$ e4defrag /mnt/foo
e4defrag 1.47.0 (5-Feb-2023)
ext4 defragmentation for /mnt/foo
[1/1]/mnt/foo: 0% [ NG ]
Success: [0/1]
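A rough sketch of the adjusted wait loop in mext_page_mkuptodate() (local
variable names are assumed):

	/* Stop before buffers that start at or beyond 'to': they were
	 * never read in and would otherwise fail with -EIO. */
	if (block_start >= to)
		break;
	wait_on_buffer(bh);
	if (!buffer_uptodate(bh))
		return -EIO;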
Cc: stable@kernel.org
Fixes: a40759fb16 ("ext4: remove array of buffer_heads from mext_page_mkuptodate()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20250912105841.1886799-1-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 46c22a8bb4 upstream.
Currently, our handling of metadata is _ambiguous_ in some scenarios,
that is, we end up returning unknown if the range only covers the
mapping partially.
For example, in the following case:
$ xfs_io -c fsmap -d
0: 254:16 [0..7]: static fs metadata 8
1: 254:16 [8..15]: special 102:1 8
2: 254:16 [16..5127]: special 102:2 5112
3: 254:16 [5128..5255]: special 102:3 128
4: 254:16 [5256..5383]: special 102:4 128
5: 254:16 [5384..70919]: inodes 65536
6: 254:16 [70920..70967]: unknown 48
...
$ xfs_io -c fsmap -d 24 33
0: 254:16 [24..39]: unknown 16 <--- incomplete reporting
$ xfs_io -c fsmap -d 24 33 (With patch)
0: 254:16 [16..5127]: special 102:2 5112
This is because earlier in ext4_getfsmap_meta_helper, we end up ignoring
any extent that starts before our queried range, but overlaps it. While
the man page [1] is a bit ambiguous on this, this fix makes the output
more sensible since we are anyway returning an "unknown" extent. This
is also consistent with how XFS does it:
$ xfs_io -c fsmap -d
...
6: 254:16 [104..127]: free space 24
7: 254:16 [128..191]: inodes 64
...
$ xfs_io -c fsmap -d 137 150
0: 254:16 [128..191]: inodes 64 <-- full extent returned
[1] https://man7.org/linux/man-pages/man2/ioctl_getfsmap.2.html
Reported-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <023f37e35ee280cd9baac0296cbadcbe10995cab.1757058211.git.ojaswin@linux.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d80eaa1a1 upstream.
After running a stress test combined with fault injection,
we performed fsck -a followed by fsck -fn on the filesystem
image. During the second pass, fsck -fn reported:
Inode 131512, end of extent exceeds allowed value
(logical block 405, physical block 1180540, len 2)
This inode was not in the orphan list. Analysis revealed the
following call chain that leads to the inconsistency:
ext4_da_write_end()
//does not update i_disksize
ext4_punch_hole()
//truncate folio, keep size
ext4_page_mkwrite()
ext4_block_page_mkwrite()
ext4_block_write_begin()
ext4_get_block()
//insert written extent without update i_disksize
journal commit
echo 1 > /sys/block/xxx/device/delete
da-write path updates i_size but does not update i_disksize. Then
ext4_punch_hole truncates the da-folio yet still leaves i_disksize
unchanged (in the ext4_update_disksize_before_punch function, the
condition offset + len < size is met). Then ext4_page_mkwrite sees
ext4_nonda_switch return 1 and takes the nodioread_nolock path; the
folio about to be written has just been punched out, and its offset
sits beyond the current i_disksize. This may result in a written
extent being inserted, but again does not update i_disksize. If the
journal gets committed and then the block device is yanked, we might
run into this. It should be noted that replacing ext4_punch_hole with
ext4_zero_range in the call sequence may also trigger this issue, as
neither will update i_disksize under these circumstances.
To fix this, we can modify ext4_update_disksize_before_punch to
increase i_disksize to min(i_size, offset + len) when both i_size and
(offset + len) are greater than i_disksize.
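In rough terms (locking and journalling of the i_disksize update are omitted
from this sketch):

	loff_t size = i_size_read(inode);

	if (size <= EXT4_I(inode)->i_disksize ||
	    offset + len <= EXT4_I(inode)->i_disksize)
		return 0;

	/* Cover the punched/zeroed range so a crash after the journal
	 * commit cannot leave written extents beyond i_disksize. */
	ext4_update_i_disksize(inode, min_t(loff_t, size, offset + len));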
Cc: stable@kernel.org
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20250911133024.1841027-1-sunyongjian@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a6ce20c15 upstream.
In principle orphan file can be arbitrarily large. However orphan replay
needs to traverse it all and we also pin all its buffers in memory. Thus
filesystems with absurdly large orphan files can lead to big amounts of
memory consumed. Limit orphan file size to a sane value and also use
kvmalloc() for allocating array of block descriptor structures to avoid
large order allocations for sane but large orphan files.
Reported-by: syzbot+0b92850d68d9b12934f5@syzkaller.appspotmail.com
Fixes: 02f310fcf4 ("ext4: Speedup ext4 orphan inode handling")
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Message-ID: <20250909112206.10459-2-jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8b90e6387 upstream.
The implicit __GFP_NOFAIL flag in ext4_sb_bread() was removed in commit
8a83ac5494 ("ext4: call bdev_getblk() from sb_getblk_gfp()"), meaning
the function can now fail under memory pressure.
Most callers of ext4_sb_bread() propagate the error to userspace and do not
remount the filesystem read-only. However, ext4_free_branches() handles
ext4_sb_bread() failure by remounting the filesystem read-only.
This implies that an ext3 filesystem (mounted via the ext4 driver) could be
forcibly remounted read-only due to a transient page allocation failure,
which is unacceptable.
To mitigate this, introduce a new helper function, ext4_sb_bread_nofail(),
which explicitly uses __GFP_NOFAIL, and use it in ext4_free_branches().
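Assuming the existing __ext4_sb_bread_gfp() helper, the new wrapper would be
on the order of:

	struct buffer_head *ext4_sb_bread_nofail(struct super_block *sb,
						 sector_t block)
	{
		/* Retry the allocation forever instead of failing under
		 * transient memory pressure. */
		return __ext4_sb_bread_gfp(sb, block, 0, __GFP_NOFAIL);
	}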
Fixes: 8a83ac5494 ("ext4: call bdev_getblk() from sb_getblk_gfp()")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a082e4b4d0 upstream.
When a v3 NLM request finds a conflicting delegation, it triggers
a delegation recall and nfsd_open fails with EAGAIN. nfsd_open
then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead
of returning nlm_failed when there is a conflicting delegation,
drop this NLM request so that the client retries. Once the delegation
is recalled and if a local lock is claimed, a retry would lead to
nfsd returning a nlm_lck_blocked error or a successful nlm lock.
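A sketch of the status mapping in nlm_fopen() after the change (the
surrounding cases are reproduced from memory and are illustrative):

	switch (nfserr) {
	case nfs_ok:
		return 0;
	case nfserr_dropit:
		return nlm_drop_reply;
	case nfserr_jukebox:
		/* A conflicting delegation is being recalled; drop the
		 * request so the NLM client retries instead of failing. */
		return nlm_drop_reply;
	default:
		return nlm_failed;
	}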
Fixes: d343fce148 ("[PATCH] knfsd: Allow lockd to drop replies as appropriate")
Cc: stable@vger.kernel.org # v6.6
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab1c282c01 upstream.
Commit 5304877936 ("NFSD: Fix strncpy() fortify warning") replaced
strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy()
already guaranteed NUL-termination of the destination buffer and
subtracting one byte potentially truncated the source string.
The incorrect size was then carried over in commit 72f78ae00a ("NFSD:
move from strlcpy with unused retval to strscpy") when switching from
strlcpy() to strscpy().
Fix this off-by-one error by using the full size of the destination
buffer again.
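In other words, the copy should follow the pattern below (dst and src are
placeholders, not the actual field names from the patch):

	/* strscpy() always NUL-terminates, so pass the full buffer size
	 * rather than sizeof(dst) - 1, which silently truncated src. */
	strscpy(dst, src, sizeof(dst));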
Cc: stable@vger.kernel.org
Fixes: 5304877936 ("NFSD: Fix strncpy() fortify warning")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e18190b7e9 upstream.
damon_lru_sort_apply_parameters() allocates a new DAMON context, stages
user-specified DAMON parameters on it, and commits to running DAMON
context at once, using damon_commit_ctx(). The code is, however, directly
updating the monitoring attributes of the running context. And the
attributes are over-written by later damon_commit_ctx() call. This means
that the monitoring attributes parameters are not really working. Fix the
wrong use of the parameter context.
Link: https://lkml.kernel.org/r/20250916031549.115326-1-sj@kernel.org
Fixes: a309694364 ("mm/damon/lru_sort: use damon_commit_ctx()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: <stable@vger.kernel.org> [6.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b93af2cc8e upstream.
DAMON's virtual address space operation set implementation (vaddr) calls
pte_offset_map_lock() inside the page table walk callback function. This
is for reading and writing page table accessed bits. If
pte_offset_map_lock() fails, it retries by returning the page table walk
callback function with ACTION_AGAIN.
pte_offset_map_lock() can continuously fail if the target is a pmd
migration entry, though. Hence it could cause an infinite page table walk
if the migration cannot be done until the page table walk is finished.
This indeed caused a soft lockup when CPU hotplugging and DAMON were
running in parallel.
Avoid the infinite loop by simply not retrying the page table walk. DAMON
is promising only a best-effort accuracy, so missing access to such pages
is no problem.
Link: https://lkml.kernel.org/r/20250930004410.55228-1-sj@kernel.org
Fixes: 7780d04046 ("mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Xinyu Zheng <zhengxinyu6@huawei.com>
Closes: https://lore.kernel.org/20250918030029.2652607-1-zhengxinyu6@huawei.com
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a204d4b14 upstream.
Commit 524c48072e ("mm/page_alloc: rename ALLOC_HIGH to
ALLOC_MIN_RESERVE") is the start of a series that explains how __GFP_HIGH,
which implies ALLOC_MIN_RESERVE, is going to be used instead of
__GFP_ATOMIC for high atomic reserves.
Commit eb2e2b425c ("mm/page_alloc: explicitly record high-order atomic
allocations in alloc_flags") introduced ALLOC_HIGHATOMIC for such
allocations of order higher than 0. It still used __GFP_ATOMIC, though.
Then, commit 1ebbb21811 ("mm/page_alloc: explicitly define how
__GFP_HIGH non-blocking allocations accesses reserves") just turned that
check for !__GFP_DIRECT_RECLAIM, ignoring that high atomic reserves were
expected to test for __GFP_HIGH.
This leads to high atomic reserves being added for high-order GFP_NOWAIT
allocations and others that clear __GFP_DIRECT_RECLAIM, which is
unexpected. Later, those reserves lead to 0-order allocations going to
the slow path and starting reclaim.
From /proc/pagetypeinfo, without the patch:
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 1 8 10 9 7 3 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 64 20 12 5 0 0 0 0 0 0 0
With the patch:
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Link: https://lkml.kernel.org/r/20250814172245.1259625-1-cascardo@igalia.com
Fixes: 1ebbb21811 ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Tested-by: Helen Koike <koike@igalia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4038016397 upstream.
When object extension vector allocation fails, we set slab->obj_exts to
OBJEXTS_ALLOC_FAIL to indicate the failure. Later, once the vector is
successfully allocated, we will use this flag to mark codetag references
stored in that vector as empty to avoid codetag warnings.
slab_obj_exts(), used to retrieve the slab->obj_exts vector pointer, checks
slab->obj_exts for being either NULL or a pointer with the MEMCG_DATA_OBJEXTS
bit set. However, it does not handle the case when slab->obj_exts equals
OBJEXTS_ALLOC_FAIL. Add the missing condition to avoid a spurious warning.
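A sketch of the extra condition in slab_obj_exts() (the exact assertion macro
used upstream may differ):

	unsigned long obj_exts = READ_ONCE(slab->obj_exts);

#ifdef CONFIG_MEMCG
	/* OBJEXTS_ALLOC_FAIL marks a failed vector allocation; it is
	 * neither NULL nor a MEMCG_DATA_OBJEXTS-tagged pointer. */
	VM_BUG_ON_PAGE(obj_exts && obj_exts != OBJEXTS_ALLOC_FAIL &&
		       !(obj_exts & MEMCG_DATA_OBJEXTS), slab_page(slab));
#endif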
Fixes: 09c46563ff ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Reported-by: Shakeel Butt <shakeel.butt@linux.dev>
Closes: https://lore.kernel.org/all/jftidhymri2af5u3xtcqry3cfu6aqzte3uzlznhlaylgrdztsi@5vpjnzpsemf5/
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org # v6.10+
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa7a0a53ee upstream.
If the decompressor is compiled with clang this can lead to the following
warning:
In file included from arch/s390/boot/startup.c:4:
...
In file included from ./include/linux/pgtable.h:6:
./arch/s390/include/asm/pgtable.h:2065:48: warning: passing 'unsigned long *' to parameter of type
'long *' converts between pointers to integer types with different sign [-Wpointer-sign]
2065 | value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
Add -Wno-pointer-sign to the decompressor compile flags, like it is also
done for the kernel. This is similar to what was done for x86 to address
the same problem [1].
[1] commit dca5203e3f ("x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS")
Cc: stable@vger.kernel.org
Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f4ed0ce48 upstream.
Currently, if CCW request creation fails with -EINVAL, the DASD driver
returns BLK_STS_IOERR to the block layer.
This can happen, for example, when a user-space application such as QEMU
passes a misaligned buffer, but the original cause of the error is
masked as a generic I/O error.
This patch changes the behavior so that -EINVAL is returned as
BLK_STS_INVAL, allowing user space to properly detect alignment issues
instead of interpreting them as I/O errors.
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Cc: stable@vger.kernel.org #6.11+
Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 130e6de621 upstream.
The block layer validates buffer alignment using the device's
dma_alignment value. If dma_alignment is smaller than the
logical_block_size (bp_block) - 1, misaligned buffers incorrectly pass
validation and propagate to the lower-level driver.
This patch adjusts dma_alignment to be at least logical_block_size - 1,
ensuring that misaligned buffers are properly rejected at the block
layer and do not reach the DASD driver unnecessarily.
Fixes: 2a07bb64d8 ("s390/dasd: Remove DMA alignment")
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Cc: stable@vger.kernel.org #6.11+
Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 008385efd0 upstream.
The previous commit adds an exception for the C-flag case. The
'mptcp_join.sh' selftest is extended to validate this case.
In this subtest, there is a typical CDN deployment with a client where
MPTCP endpoints have been 'automatically' configured:
- the server set net.mptcp.allow_join_initial_addr_port=0
- the client has multiple 'subflow' endpoints, and the default limits:
not accepting ADD_ADDRs.
Without the parent patch, the client is not able to establish new
subflows using its 'subflow' endpoints. The parent commit fixes that.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-2-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27b1fd6201 upstream.
Filter out the register forms of 0F 01 when determining whether or not to
emulate in response to a potential UMIP violation #GP, as SGDT and SIDT only
accept memory operands. The register variants of 0F 01 are used to encode
instructions for things like VMX and SGX, i.e. not checking the Mod field
would cause the kernel to incorrectly emulate on #GP, e.g. due to a CPL
violation on VMLAUNCH.
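Conceptually, the added filter in the UMIP decoder amounts to the following
(using the ModRM helpers from <asm/insn.h>; the return convention is assumed
from the surrounding identification code):

	/* SGDT/SIDT only take memory operands; ModRM.mod == 3 encodes the
	 * register forms of 0F 01 (VMX, SGX, ...), which must not be
	 * emulated. */
	if (X86_MODRM_MOD(insn->modrm.value) == 3)
		return -EINVAL;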
Fixes: 1e5db22369 ("x86/umip: Add emulation code for UMIP instructions")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32278c6779 upstream.
When checking for a potential UMIP violation on #GP, verify the decoder found
at least two opcode bytes to avoid false positives when the kernel encounters
an unknown instruction that starts with 0f. Because the array of opcode.bytes
is zero-initialized by insn_init(), peeking at bytes[1] will misinterpret
garbage as a potential SLDT or STR instruction, and can incorrectly trigger
emulation.
E.g. if a VPALIGNR instruction
62 83 c5 05 0f 08 ff vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff
hits a #GP, the kernel emulates it as STR and squashes the #GP (and corrupts
the userspace code stream).
Arguably the check should look for exactly two bytes, but no three-byte
opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
impossible to get a false positive if the first two opcode bytes match '0f 00'
or '0f 01'. Go with a more conservative check with respect to the existing
code to minimize the chances of breaking userspace, e.g. due to decoder
weirdness.
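A sketch of the added sanity check (return convention assumed from the
surrounding identification code):

	/* Peeking at opcode bytes[1] is only meaningful once the decoder
	 * has actually consumed the two-byte 0F escape; the bytes[] array
	 * is zero-initialised otherwise. */
	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0x0f)
		return -EINVAL;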
Analyzed by Nick Bray <ncbray@google.com>.
Fixes: 1e5db22369 ("x86/umip: Add emulation code for UMIP instructions")
Reported-by: Dan Snyder <dansnyder@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3da01ffe1a upstream.
The FRED specification has been changed in v9.0 to state that there
is no need for FRED event handlers to begin with ENDBR64, because
in the presence of supervisor indirect branch tracking, FRED event
delivery does not enter the WAIT_FOR_ENDBRANCH state.
As a result, remove ENDBR64 from FRED entry points.
Then add ANNOTATE_NOENDBR to indicate that FRED entry points will
never be used for indirect calls to suppress an objtool warning.
This change implies that any indirect CALL/JMP to FRED entry points
causes #CP in the presence of supervisor indirect branch tracking.
Credit goes to Jennifer Miller <jmill@asu.edu> and other contributors
from Arizona State University whose research shows that placing ENDBR
at entry points has negative value thus led to this change.
Note: This is obviously an incompatible change to the FRED
architecture. But it's OK because there are no FRED systems out in the
wild today. All production hardware and late pre-production hardware
will follow the FRED v9 spec and be compatible with this approach.
[ dhansen: add note to changelog about incompatibility ]
Fixes: 14619d912b ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250716063320.1337818-1-xin%40zytor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ad55767e7 upstream.
cqspi_read_setup() and cqspi_write_setup() program the address width as
the last step in the setup. This is likely to be immediately followed by
a DAC region read/write. On TI K3 SoCs the DAC region is on a different
endpoint from the register region. This means that the order of the two
operations is not guaranteed, and they might be reordered at the
interconnect level. It is possible that the DAC read/write goes through
before the address width update goes through. In this situation if the
previous command used a different address width the OSPI command is sent
with the wrong number of address bytes, resulting in an invalid command
and undefined behavior.
Read back the size register to make sure the write gets flushed before
accessing the DAC region.
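The pattern is the usual posted-write flush; as a sketch (the register name
is taken from the existing driver, the surrounding code is illustrative):

	writel(reg, reg_base + CQSPI_REG_SIZE);

	/* Flush the posted write so the address-width update is visible
	 * to the controller before any DAC access on the other endpoint. */
	(void)readl(reg_base + CQSPI_REG_SIZE);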
Fixes: 1406234105 ("mtd: spi-nor: Add driver for Cadence Quad SPI Flash Controller")
CC: stable@vger.kernel.org
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Santhosh Kumar K <s-k6@ti.com>
Message-ID: <20250905185958.3575037-3-s-k6@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29e0b471cc upstream.
cqspi_indirect_read_execute() and cqspi_indirect_write_execute() first
set the enable bit on APB region and then start reading/writing to the
AHB region. On TI K3 SoCs these regions lie on different endpoints. This
means that the order of the two operations is not guaranteed, and they
might be reordered at the interconnect level.
It is possible for the AHB write to be executed before the APB write to
enable the indirect controller, causing the transaction to be invalid
and the write erroring out. Read back the APB region write before
accessing the AHB region to make sure the write got flushed and the race
condition is eliminated.
Fixes: 1406234105 ("mtd: spi-nor: Add driver for Cadence Quad SPI Flash Controller")
CC: stable@vger.kernel.org
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Santhosh Kumar K <s-k6@ti.com>
Message-ID: <20250905185958.3575037-2-s-k6@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42f9c66a6d upstream.
Tegra already defines all BARs except BAR0 as BAR_RESERVED. This is
sufficient for pci-epf-test to not allocate backing memory and to not call
set_bar() for those BARs. However, marking a BAR as BAR_RESERVED does not
mean that the BAR gets disabled.
The host side driver, pci_endpoint_test, simply does an ioremap for all
enabled BARs and will run tests against all enabled BARs, so it will run
tests against the BARs marked as BAR_RESERVED.
After running the BAR tests (which will write to all enabled BARs), the
inbound address translation is broken. This is because the Tegra controller
exposes the ATU Port Logic Structure in BAR4, so when BAR4 is written, the
inbound address translation settings get overwritten.
To avoid this, implement the dw_pcie_ep_ops .init() callback and start off
by disabling all BARs (pci-epf-test will later enable/configure BARs that
are not defined as BAR_RESERVED).
This matches the behavior of other PCIe endpoint drivers: dra7xx, imx6,
layerscape-ep, artpec6, dw-rockchip, qcom-ep, rcar-gen4, and uniphier-ep.
With this, the PCI endpoint kselftest test case CONSECUTIVE_BAR_TEST (which
was specifically made to detect address translation issues) passes.
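A sketch of such an .init() callback, modelled on the other DWC endpoint
drivers listed above (the function name is illustrative):

	static void tegra_pcie_ep_init(struct dw_pcie_ep *ep)
	{
		struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
		enum pci_barno bar;

		/* Disable everything up front; pci-epf-test re-enables
		 * only the BARs that are not marked BAR_RESERVED. */
		for (bar = BAR_0; bar <= BAR_5; bar++)
			dw_pcie_ep_reset_bar(pci, bar);
	}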
Fixes: c57247f940 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250922140822.519796-7-cassel@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8c9ad46b0 upstream.
The return value from tegra_bpmp_transfer() indicates the success or
failure of the IPC transaction with BPMP. If the transaction succeeded, we
also need to check the actual command's result code.
If we don't have error handling for tegra_bpmp_transfer(), we will set the
pcie->ep_state to EP_STATE_ENABLED even when the tegra_bpmp_transfer()
command fails. Thus, the pcie->ep_state will get out of sync with reality,
and any further PERST# assert + deassert will be a no-op and will not
trigger the hardware initialization sequence.
This is because pex_ep_event_pex_rst_deassert() checks the current
pcie->ep_state, and does nothing if the current state is already
EP_STATE_ENABLED.
Thus, it is important to have error handling for tegra_bpmp_transfer(),
such that the pcie->ep_state can not get out of sync with reality, so that
we will try to initialize the hardware not only during the first PERST#
assert + deassert, but also during any succeeding PERST# assert + deassert.
One example where this fix is needed is when using a rock5b as host.
During the initial PERST# assert + deassert (triggered by the bootloader on
the rock5b) pex_ep_event_pex_rst_deassert() will get called, but for some
unknown reason, the tegra_bpmp_transfer() call to initialize the PHY fails.
Once Linux has been loaded on the rock5b, the PCIe driver will once again
assert + deassert PERST#. However, without tegra_bpmp_transfer() error
handling, this second PERST# assert + deassert will not trigger the
hardware initialization sequence.
With tegra_bpmp_transfer() error handling, the second PERST# assert +
deassert will once again trigger the hardware to be initialized and this
time the tegra_bpmp_transfer() succeeds.
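Schematically, the BPMP calls gain error handling along these lines (variable
names are illustrative):

	ret = tegra_bpmp_transfer(pcie->bpmp, &msg);
	if (ret) {
		dev_err(pcie->dev, "BPMP transfer failed: %d\n", ret);
		/* Bail out before ep_state is set to EP_STATE_ENABLED, so
		 * the next PERST# deassert retries the initialization. */
		return;
	}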
Fixes: c57247f940 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[cassel: improve commit log]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250922140822.519796-8-cassel@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b640d42a6a upstream.
pci_epc_raise_irq() supplies an MSI or MSI-X interrupt number in the range
(1-N), as per the pci_epc_raise_irq() kdoc, where N is 32 for MSI.
But tegra_pcie_ep_raise_msi_irq() incorrectly uses the interrupt number as
the MSI vector. This causes the wrong MSI vector to be triggered, leading to
the failure of the PCI endpoint Kselftest MSI_TEST test case.
To fix this issue, convert the interrupt number to MSI vector.
Fixes: c57247f940 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250922140822.519796-6-cassel@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a8f173d9d upstream.
The pmsr_lock spinlock used to be necessary to synchronize access to the
PMSR register, because that access could have been triggered from either
config space access in rcar_pcie_config_access() or an exception handler
rcar_pcie_aarch32_abort_handler().
The rcar_pcie_aarch32_abort_handler() case is no longer applicable since
commit 6e36203bc1 ("PCI: rcar: Use PCI_SET_ERROR_RESPONSE after read
which triggered an exception"), which performs more accurate, controlled
invocation of the exception, and a fixup.
This leaves rcar_pcie_config_access() as the only call site from which
rcar_pcie_wakeup() is called. The rcar_pcie_config_access() can only be
called from the controller struct pci_ops .read and .write callbacks,
and those are serialized in drivers/pci/access.c using raw spinlock
'pci_lock'. It should be noted that CONFIG_PCI_LOCKLESS_CONFIG is never
set on this platform.
Since the 'pci_lock' is a raw spinlock, and the 'pmsr_lock' is not a
raw spinlock, this constellation triggers 'BUG: Invalid wait context'
with CONFIG_PROVE_RAW_LOCK_NESTING=y.
Remove the pmsr_lock to fix the locking.
Fixes: a115b1bd3a ("PCI: rcar: Add L1 link state fix into data abort hook")
Reported-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
Reported-by: Thuan Nguyen <thuan.nguyen-hong@banvien.com.vn>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250909162707.13927-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f842d3313b upstream.
The Cadence PCIe Controller integrated in the TI K3 SoCs supports both
Root-Complex and Endpoint modes of operation. The Glue Layer allows
"strapping" the Mode of operation of the Controller, the Link Speed
and the Link Width. This is enabled by programming the "PCIEn_CTRL"
register (n corresponds to the PCIe instance) within the CTRL_MMR
memory-mapped register space. The "reset-values" of the registers are
also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately
after being powered on, if the Glue Layer configuration is not done while
the PCIe Controller is off, it will result in the PCIe Controller latching
onto the wrong "reset-values". In practice, this will show up as a wrong
representation of the PCIe Controller's capability structures in the PCIe
Configuration Space. Some such capabilities which are supported by the PCIe
Controller in the Root-Complex mode but are incorrectly latched onto as
being unsupported are:
- Link Bandwidth Notification
- Alternate Routing ID (ARI) Forwarding Support
- Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap"
settings and powering it on after that. The runtime PM APIs namely
pm_runtime_put_sync() and pm_runtime_get_sync() will decrement and
increment the usage counter respectively, causing GENPD to power off and
power on the PCIe Controller.
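In outline (the CTRL_MMR programming itself is unchanged and not shown):

	/* Power the controller off so it latches the new strap settings
	 * when it is powered back on. */
	ret = pm_runtime_put_sync(dev);
	if (ret < 0)
		dev_err(dev, "pm_runtime_put_sync failed\n");

	/* ... program mode, link speed and lane count in CTRL_MMR ... */

	ret = pm_runtime_get_sync(dev);
	if (ret < 0)
		dev_err(dev, "pm_runtime_get_sync failed\n");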
Fixes: f3e25911a4 ("PCI: j721e: Add TI J721E PCIe driver")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250908120828.1471776-1-s-vadapalli@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cbc5e25fb upstream.
Upon failure to recover from a PCIe error through AER, DPC or EDR, a
uevent is sent to inform user space about disconnection of the bridge
whose subordinate devices failed to recover.
However the bridge itself is not disconnected. Instead, a uevent should
be sent for each of the subordinate devices.
Only if the "bridge" happens to be a Root Complex Event Collector or
Integrated Endpoint does it make sense to send a uevent for it (because
there are no subordinate devices).
Right now if there is a mix of subordinate devices with and without
pci_error_handlers, a BEGIN_RECOVERY event is sent for those with
pci_error_handlers but no FAILED_RECOVERY event is ever sent for them
afterwards. Fix it.
Fixes: 856e1eb9bd ("PCI/AER: Add uevents in AER and EEH error/resume")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.16+
Link: https://patch.msgid.link/68fc527a380821b5d861dd554d2ce42cb739591c.1755008151.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05703271c3 upstream.
Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.
Since commit 9d16947b75 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d150
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.
On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:
PSW: 0704c00180000000 0000000c914e4b38 (klist_put+56)
GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
0000000000000001 0000000000000000 0000000000000000 0000000180692828
00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
#0 [3800313fb20] device_del at c9158ad5c
#1 [3800313fb88] pci_remove_bus_device at c915105ba
#2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
#3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
#4 [3800313fc60] zpci_bus_remove_device at c90fb6104
#5 [3800313fca0] __zpci_event_availability at c90fb3dca
#6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
#7 [3800313fd60] crw_collect_info at c91905822
#8 [3800313fe10] kthread at c90feb390
#9 [3800313fe68] __ret_from_fork at c90f6aa64
#10 [3800313fe98] ret_from_fork at c9194f3f2.
This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This is the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.
Other races may also be possible, of course, though given that this lack of
locking has persisted for so long, observable races seem very rare. Even on
s390 the list corruption was only observed with certain devices, since the
platform events are only triggered by config accesses after the removal, so
as long as the removal finished synchronously they would not race. Either
way the locking is missing, so fix this by adding it to the sriov_del_vfs()
helper.
Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.
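A minimal sketch of the locking added to the removal helper (the enable-side
locking follows the same pattern):

	static void sriov_del_vfs(struct pci_dev *dev)
	{
		struct pci_sriov *iov = dev->sriov;
		int i;

		/* Serialize VF removal against concurrent remove/rescan,
		 * just like any other PCI device removal path. */
		pci_lock_rescan_remove();
		for (i = 0; i < iov->num_VFs; i++)
			pci_iov_remove_virtfn(dev, i);
		pci_unlock_rescan_remove();
	}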
Fixes: 18f9e9d150 ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-1-2d0bc938f2a3@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48991e4935 upstream.
The "max_link_width", "current_link_speed", "current_link_width",
"secondary_bus_number", and "subordinate_bus_number" sysfs files all access
config registers, but they don't check the runtime PM state. If the device
is in D3cold or a parent bridge is suspended, we may see -EINVAL, bogus
values, or worse, depending on implementation details.
Wrap these accesses in pci_config_pm_runtime_{get,put}() like most of the
rest of the similar sysfs attributes.
Notably, "max_link_speed" does not access config registers; it returns a
cached value since d2bd39c045 ("PCI: Store all PCIe Supported Link
Speeds").
Fixes: 56c1af4606 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250924095711.v2.1.Ibb5b6ca1e2c059e04ec53140cd98a44f2684c668@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98a4f5b735 upstream.
When PCIe has been set up by the bootloader, the ecam_size field in the
E_ECAM_CONTROL register already contains a value.
The driver previously programmed it to 0xc (for 16 busses; 16 MB), but this
was bumped to 0x10 (for 256 busses; 256 MB) by commit 2fccd11518 ("PCI:
xilinx-nwl: Modify ECAM size to enable support for 256 buses").
Regardless of what the bootloader has programmed, the driver ORs in a
new maximal value without doing a proper RMW sequence. This can lead to
problems.
For example, if the bootloader programs in 0xc and the driver uses 0x10,
the ORed result is 0x1c, which is beyond the ecam_max_size limit of 0x10
(from E_ECAM_CAPABILITIES).
Avoid the problems by doing a proper RMW.
Fixes: 2fccd11518 ("PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses")
Signed-off-by: Jani Nurminen <jani.nurminen@windriver.com>
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/e83a2af2-af0b-4670-bcf5-ad408571c2b0@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a001cd248a upstream.
Add "extern" to the glibc-defined weak rseq symbols to convert the rseq
selftest's usage from weak symbol definitions to weak symbol _references_.
Effectively re-defining the glibc symbols wreaks havoc when building with
-fno-common, e.g. generates segfaults when running multi-threaded programs,
as dynamically linked applications end up with multiple versions of the
symbols.
Building with -fcommon, which until recently has been the default for
GCC and clang, papers over the bug by allowing the linker to resolve the
weak/tentative definition to glibc's "real" definition.
Note, the symbol itself (or rather its address), not the value of the
symbol, is set to 0/NULL for unresolved weak symbol references, as the
symbol doesn't exist and thus can't have a value. Check for a NULL rseq
size pointer to handle the scenario where the test is statically linked
against a libc that doesn't support rseq in any capacity.
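The key difference, sketched (glibc symbol names as documented; the
availability check is illustrative):

	/* Weak symbol *references*: if the linked libc does not provide
	 * them, the addresses themselves resolve to NULL instead of
	 * creating competing definitions. */
	extern unsigned int __rseq_size __attribute__((weak));
	extern ptrdiff_t __rseq_offset __attribute__((weak));
	extern unsigned int __rseq_flags __attribute__((weak));

The test can then treat a NULL &__rseq_size as "no rseq support in libc".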
Fixes: 3bcbc20942 ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/all/87frdoybk4.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 795cda8338 upstream.
As described in the old comment dating back to
commit 6610e0893b ("RTC: Rework RTC code to use timerqueue for events")
from 2010, we have been living with a race window when setting an alarm
with an expiry in the near future (i.e. the next second).
With 1 second resolution, it can happen that the second ticks after the
check for the timer having expired, but before the alarm is actually set.
When this happens, no alarm IRQ is generated, at least not with some RTC
chips (isl12022 is an example of this).
With the UIE RTC timer being implemented on top of the alarm irq and
re-armed every second, UIE will occasionally fail to work, as an alarm irq
lost due to this race stops the re-arming loop.
For now, I have limited the additional expiry check to only be done for
alarms set for the next second. I expect it should be good enough, although I
don't know if we can know for sure that systems under load could not end up
causing the same problems for alarms set 2 seconds or even longer in the
future.
I haven't been able to reproduce the problem with this check in place.
Cc: stable@vger.kernel.org
Signed-off-by: Esben Haabendal <esben@geanix.com>
Link: https://lore.kernel.org/r/20250516-rtc-uie-irq-fixes-v2-1-3de8e530a39e@geanix.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6744085079 upstream.
The of_platform_populate() call at the end of the function has a
possible failure path, causing a resource leak.
Replace of_iomap() with devm_platform_ioremap_resource() to ensure
automatic cleanup of srom->reg_base.
This issue was detected by smatch static analysis:
drivers/memory/samsung/exynos-srom.c:155 exynos_srom_probe() warn:
  'srom->reg_base' from of_iomap() not released on lines: 155.
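A minimal sketch of the replacement in the probe path (error handling
abbreviated):

	srom->reg_base = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(srom->reg_base))
		return PTR_ERR(srom->reg_base);	/* unmapped automatically on any
						   later probe failure or removal */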
Fixes: 8ac2266d88 ("memory: samsung: exynos-srom: Add support for bank configuration")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Link: https://lore.kernel.org/r/20250806025538.306593-1-zhen.ni@easystack.cn
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fef12d9f5b upstream.
For multiple block read, the current implementation's transfer packet
consists of cmd53 + cmd53 response + block nums * (1-byte token +
block-length bytes of payload + 2-byte CRC + 1-byte transfer); the last
1-byte transfer of every block is not needed, so remove it.
Why doesn't multiple block read need a CRC ack?
For a read operation, the host side gets the payload and CRC value, then
only checks the CRC value to confirm whether the data is correct or not;
it does not send a CRC ack to the card. If the data is correct it is saved,
otherwise it is discarded and retransmitted, so the last 1-byte transfer
of every block makes no sense.
What's the side effect of this 1-byte transfer?
As SPI is full duplex, if this redundant 1-byte transfer is added, the SDIO
card side takes it as the token of the next block, and the sequence of all
subsequent blocks gets distorted.
Signed-off-by: Rex Chen <rex.chen_1@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250728082230.1037917-3-rex.chen_1@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07ca98f906 upstream.
It turned out that certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead to
undefined behaviour or invalid frames being queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
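A simplified sketch of the overflow-safe validation (field names follow the
xsk pool/descriptor structures, but the bounds check is condensed here):

	u64 len = desc->len;			/* promote before any arithmetic   */
	u64 meta = pool->tx_metadata_len;
	u64 begin, end;

	if (check_sub_overflow(desc->addr, meta, &begin) ||	/* metadata precedes addr */
	    check_add_overflow(desc->addr, len, &end))		/* payload follows addr   */
		return false;					/* would wrap around      */
	if (end > pool->addrs_cnt)
		return false;					/* outside the umem area  */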
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function old new delta
xskq_cons_peek_desc 299 330 +31
xsk_tx_peek_release_desc_batch 973 1002 +29
xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much.
Fixes: 341ac980ea ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 302c04110f upstream.
If of_device_register() fails, we should call put_device() to decrement the
reference count for cleanup; otherwise it could cause a memory leak.
So fix this by calling put_device(), then the name can be freed in
kobject_cleanup().
Calling path: of_device_register() -> of_device_add() -> device_add().
As comment of device_add() says, 'if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count'.
Found by code review.
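The resulting error handling is essentially the following (simplified;
the caller's reaction to the error is illustrative):

	err = of_device_register(op);
	if (err) {
		/* device_add() did not succeed: only drop the reference;
		 * kobject_cleanup() will then free the name. */
		put_device(&op->dev);
		return err;
	}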
Cc: stable@vger.kernel.org
Fixes: cf44bbc26c ("[SPARC]: Beginnings of generic of_device framework.")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6fd44a481b upstream.
An attempt to exercise sparc hugetlb code in a sun4u-based guest
running under qemu results in the guest hanging due to being stuck
in a trap loop. This is due to invalid hugetlb TTEs being installed
that do not have the expected _PAGE_PMD_HUGE and page size bits set.
Although the breakage has apparently gone unnoticed for several years,
fix it now so there is the option to exercise sparc hugetlb code under
qemu. This can be useful because qemu's sun4v support does not currently
handle linux guests and sun4v-based hardware resources may not be
readily available.
Fix tested with 6.15.2 and 6.16-rc6 kernels by running libhugetlbfs
tests on a qemu guest running Debian 13.
Fixes: c7d9f77d33 ("sparc64: Multi-page size support")
Cc: stable@vger.kernel.org
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20250716012446.10357-1-anthony.yznaga@oracle.com
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5f717b31b upstream.
A build warning was triggered due to excessive stack usage in
sd_revalidate_disk():
drivers/scsi/sd.c: In function ‘sd_revalidate_disk.isra’:
drivers/scsi/sd.c:3824:1: warning: the frame size of 1160 bytes is larger than 1024 bytes [-Wframe-larger-than=]
This is caused by a large local struct queue_limits (~400B) allocated on
the stack. Replacing it with a heap allocation using kmalloc()
significantly reduces frame usage. Kernel stack is limited (~8 KB), and
allocating large structs on the stack is discouraged. As the function
already performs heap allocations (e.g. for buffer), this change fits
well.
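Schematically (error handling trimmed; names follow sd_revalidate_disk(),
but this is only a sketch of the approach):

	struct queue_limits *lim;
	int err;

	lim = kmalloc(sizeof(*lim), GFP_KERNEL);
	if (!lim)
		return;				/* illustrative bail-out */

	*lim = queue_limits_start_update(sdkp->disk->queue);
	/* ... probe the device and fill *lim instead of an on-stack copy ... */
	err = queue_limits_commit_update(sdkp->disk->queue, lim);
	kfree(lim);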
Fixes: 804e498e04 ("sd: convert to the atomic queue limits API")
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Abinash Singh <abinashsinghlalotra@gmail.com>
Link: https://lore.kernel.org/r/20250825183940.13211-2-abinashsinghlalotra@gmail.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b81296591c upstream.
Replace kmalloc() followed by copy_from_user() with memdup_user() to fix
a memory leak that occurs when copy_from_user(buff[sg_used],,) fails and
the 'cleanup1:' path does not free the memory for 'buff[sg_used]'. Using
memdup_user() avoids this by freeing the memory internally.
Since memdup_user() already allocates memory, use kzalloc() in the else
branch instead of manually zeroing 'buff[sg_used]' using memset(0).
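In outline (variable names shortened from the ioctl handler; a sketch, not
the exact diff):

	if (ioc->Request.Type.Direction & XFER_WRITE) {
		buff[sg_used] = memdup_user(data_ptr, sz);
		if (IS_ERR(buff[sg_used])) {
			status = PTR_ERR(buff[sg_used]);
			buff[sg_used] = NULL;	/* don't leave an ERR_PTR behind */
			goto cleanup1;
		}
	} else {
		buff[sg_used] = kzalloc(sz, GFP_KERNEL);	/* replaces kmalloc()+memset(0) */
		if (!buff[sg_used]) {
			status = -ENOMEM;
			goto cleanup1;
		}
	}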
Cc: stable@vger.kernel.org
Fixes: edd163687e ("[SCSI] hpsa: add driver for HP Smart Array controllers.")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8fd5485fb4 upstream.
When a CPU chooses to call push_dl_task and picks a task to push to
another CPU's runqueue then it will call find_lock_later_rq method
which would take a double lock on both CPUs' runqueues. If one of the
locks isn't readily available, it may lead to dropping the current
runqueue lock and reacquiring both the locks at once. During this window
it is possible that the task is already migrated and is running on some
other CPU. These cases are already handled. However, if the task is
migrated and has already been executed and another CPU is now trying to
wake it up (ttwu) such that it is queued again on the runqueue
(on_rq is 1) and also if the task was run by the same CPU, then the
current checks will pass even though the task was migrated out and is no
longer in the pushable tasks list.
Please go through the original rt change for more details on the issue.
To fix this, after the lock is obtained inside the find_lock_later_rq,
it ensures that the task is still at the head of pushable tasks list.
Also removed some checks that are no longer needed with the addition of
this new check.
However, the new check of pushable tasks list only applies when
find_lock_later_rq is called by push_dl_task. For the other caller i.e.
dl_task_offline_migration, existing checks are used.
Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250408045021.3283624-1-harshit@nutanix.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d09ee1bec upstream.
This reverts commit c608966f3f.
This patch has a subtle bug that can cause the IPMI driver to go into an
infinite loop if the BMC misbehaves in a certain way. Apparently
certain BMCs do misbehave this way because several reports have come in
recently about this.
Signed-off-by: Corey Minyard <corey@minyard.net>
Tested-by: Eric Hagberg <ehagberg@janestreet.com>
Cc: <stable@vger.kernel.org> # 6.2
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee6cd8f3e2 upstream.
CHARGE_CONTROL_LIMIT is a wrong property to report charge current limit,
because `CHARGE_*` attributes represent capacity, not current. The
correct attribute to report and set charge current limit is
CONSTANT_CHARGE_CURRENT.
Rename CHARGE_CONTROL_LIMIT to CONSTANT_CHARGE_CURRENT.
Cc: stable@vger.kernel.org
Fixes: 715ecbc10d ("power: supply: max77976: add Maxim MAX77976 charger driver")
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eeaed48980 upstream.
On the TUXEDO InfinityBook S Gen8, a Samsung 990 Evo NVMe leads to
a high power consumption in s2idle sleep (3.5 watts).
This patch applies 'Force No Simple Suspend' quirk to achieve a sleep with
a lower power consumption, typically around 1 watt.
Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ec5a066f8 upstream.
Similar in nature to ab10727660. glibc-2.42
drops the legacy termio struct, but the ioctls.h header still defines some
TC* constants in terms of termio (via sizeof). Hardcode the values instead.
This fixes building Python for example, which falls over like:
./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Co-authored-by: Stian Halseth <stian@itx.no>
Cc: stable@vger.kernel.org
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8de554774 upstream.
In of_unittest_pci_node_verify(), when the add parameter is false,
device_find_any_child() obtains a reference to a child device. This
function implicitly calls get_device() to increment the device's
reference count before returning the pointer. However, the caller
fails to properly release this reference by calling put_device(),
leading to a device reference count leak. Add put_device() in the else
branch immediately after child_dev is no longer needed.
As the comment of device_find_any_child states: "NOTE: you will need
to drop the reference with put_device() after use".
Found by code review.
Cc: stable@vger.kernel.org
Fixes: 26409dd045 ("of: unittest: Add pci_dt_testdrv pci driver")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eed0e3d305 upstream.
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
[For the Fixes commit I used the commit that introduced the memcmp().
It predates the introduction of crypto_memneq(), but it was still a bug
at the time even though a helper function didn't exist yet.]
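The change boils down to the following (buffer names are illustrative):

	/* Before: early-exit comparison, leaks how many bytes matched. */
	if (memcmp(calculated_hmac, received_hmac, SHA1_DIGEST_SIZE) != 0)
		return -EINVAL;

	/* After: constant-time comparison, no data-dependent timing. */
	if (crypto_memneq(calculated_hmac, received_hmac, SHA1_DIGEST_SIZE))
		return -EINVAL;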
Fixes: d00a1c72f7 ("keys: add new trusted key-type")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a15f37a401 upstream.
The usage of task_lock(tsk->group_leader) in the sys_prlimit64()->do_prlimit()
path is very broken.
sys_prlimit64() does get_task_struct(tsk) but this only protects task_struct
itself. If tsk != current and tsk is not a leader, this process can exit/exec
and task_lock(tsk->group_leader) may use the already freed task_struct.
Another problem is that sys_prlimit64() can race with mt-exec which changes
->group_leader. In this case do_prlimit() may take the wrong lock, or (worse)
->group_leader may change between task_lock() and task_unlock().
Change sys_prlimit64() to take tasklist_lock when necessary. This is not
nice, but I don't see a better fix for -stable.
Link: https://lkml.kernel.org/r/20250915120917.GA27702@redhat.com
Fixes: 18c91bb2d8 ("prlimit: do not grab the tasklist_lock")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ef7e24c74 upstream.
The specification, Section 7.10, "Software Steps to Drain Page Requests &
Responses," requires software to submit an Invalidation Wait Descriptor
(inv_wait_dsc) with the Page-request Drain (PD=1) flag set, along with
the Invalidation Wait Completion Status Write flag (SW=1). It then waits
for the Invalidation Wait Descriptor's completion.
However, the PD field in the Invalidation Wait Descriptor is optional, as
stated in Section 6.5.2.9, "Invalidation Wait Descriptor":
"Page-request Drain (PD): Remapping hardware implementations reporting
Page-request draining as not supported (PDS = 0 in ECAP_REG) treat this
field as reserved."
This implies that if the IOMMU doesn't support the PDS capability, software
can't drain page requests and group responses as expected.
Do not enable PCI/PRI if the IOMMU doesn't support PDS.
Reported-by: Joel Granados <joel.granados@kernel.org>
Closes: https://lore.kernel.org/r/20250909-jag-pds-v1-1-ad8cba0e494e@kernel.org
Fixes: 66ac4db36f ("iommu/vt-d: Add page request draining support")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250915062946.120196-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit feb500c7ae upstream.
To convert level-triggered alarms into edge-triggered IIO events, alarms
are masked when they are triggered. To ensure we catch subsequent
alarms, we then periodically poll to see if the alarm is still active.
If it isn't, we unmask it. Active but masked alarms are stored in
current_masked_alarm.
If an active alarm is disabled, it will remain set in
current_masked_alarm until ams_unmask_worker clears it. If the alarm is
re-enabled before ams_unmask_worker runs, then it will never be cleared
from current_masked_alarm. This will prevent the alarm event from being
pushed even if the alarm is still active.
Fix this by recalculating current_masked_alarm immediately when enabling
or disabling alarms.
Fixes: d5c70627a7 ("iio: adc: Add Xilinx AMS driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: O'Griofa, Conall <conall.ogriofa@amd.com>
Tested-by: Erim, Salih <Salih.Erim@amd.com>
Acked-by: Erim, Salih <Salih.Erim@amd.com>
Link: https://patch.msgid.link/20250715002847.2035228-1-sean.anderson@linux.dev
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33d7ecbf69 upstream.
The ADF4350/1 features a programmable dual-modulus prescaler of 4/5 or 8/9.
When set to 4/5, the maximum RF frequency allowed is 3 GHz.
Therefore, when operating the ADF4351 above 3 GHz, this must be set to 8/9.
In this context it is not the RF output frequency that is meant, but the
VCO frequency.
Therefore move the prescaler selection to after the VCO frequency has been
derived from the desired RF output frequency.
This bug may have caused PLL lock instabilities when operating the VCO at
the very high end of the range, close to 4.4 GHz.
Fixes: e31166f0fd ("iio: frequency: New driver for Analog Devices ADF4350/ADF4351 Wideband Synthesizers")
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://patch.msgid.link/20250829-adf4350-fix-v2-1-0bf543ba797d@analog.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c63ba1c43 upstream.
There are two problems with the chip configuration in this driver:
- First, writing 12 bytes (ARRAY_SIZE(regs)) would in any case
lead to a config overflow due to the HW auto-increment implementation
in the chip.
- Second, the i2c_smbus_write_block_data write ends up writing an
unexpected value to the channel_dis register, because the smbus
size, which is 0x03 in this case, gets written to the
register. The PAC1931/2/3/4 data sheet does not really specify
that block write is indeed supported.
This problem is probably not visible on the PAC1934 version where all
channels are used, as the chip is properly configured by luck,
but in our case, when using PAC1931, this leads to a nonfunctional device.
Fixes: 0fb528c825 ("iio: adc: adding support for PAC193x")
Suggested-by: Rene Straub <rene.straub@belden.com>
Signed-off-by: Aleksandar Gerasimovski <aleksandar.gerasimovski@belden.com>
Reviewed-by: Marius Cristea <marius.cristea@microchip.com>
Link: https://patch.msgid.link/20250811130904.2481790-1-aleksandar.gerasimovski@belden.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26e5c67deb upstream.
I observed a hang when running generic/323 against a fuseblk server.
This test opens a file, initiates a lot of AIO writes to that file
descriptor, and closes the file descriptor before the writes complete.
Unsurprisingly, the AIO exerciser threads are mostly stuck waiting for
responses from the fuseblk server:
# cat /proc/372265/task/372313/stack
[<0>] request_wait_answer+0x1fe/0x2a0 [fuse]
[<0>] __fuse_simple_request+0xd3/0x2b0 [fuse]
[<0>] fuse_do_getattr+0xfc/0x1f0 [fuse]
[<0>] fuse_file_read_iter+0xbe/0x1c0 [fuse]
[<0>] aio_read+0x130/0x1e0
[<0>] io_submit_one+0x542/0x860
[<0>] __x64_sys_io_submit+0x98/0x1a0
[<0>] do_syscall_64+0x37/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53
But the /weird/ part is that the fuseblk server threads are waiting for
responses from the server itself:
# cat /proc/372210/task/372232/stack
[<0>] request_wait_answer+0x1fe/0x2a0 [fuse]
[<0>] __fuse_simple_request+0xd3/0x2b0 [fuse]
[<0>] fuse_file_put+0x9a/0xd0 [fuse]
[<0>] fuse_release+0x36/0x50 [fuse]
[<0>] __fput+0xec/0x2b0
[<0>] task_work_run+0x55/0x90
[<0>] syscall_exit_to_user_mode+0xe9/0x100
[<0>] do_syscall_64+0x43/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53
The fuseblk server is fuse2fs so there's nothing all that exciting in
the server itself. So why is the fuse server calling fuse_file_put?
The commit message for the fstest sheds some light on that:
"By closing the file descriptor before calling io_destroy, you pretty
much guarantee that the last put on the ioctx will be done in interrupt
context (during I/O completion)."
Aha. AIO fget()s a new struct file from the fd when it queues the ioctx.
The completion of the FUSE_WRITE command from userspace causes the fuse
server to call the AIO completion function. The completion puts the
struct file, queuing a delayed fput to the fuse server task. When the
fuse server task returns to userspace, it has to run the delayed fput,
which in the case of a fuseblk server, it does synchronously.
Sending the FUSE_RELEASE command synchronously from fuse server threads
is a bad idea because a client program can initiate enough simultaneous
AIOs such that all the fuse server threads end up in delayed_fput, and
now there aren't any threads left to handle the queued fuse commands.
Fix this by only using asynchronous fputs when closing files, and leave
a comment explaining why.
Cc: stable@vger.kernel.org # v2.6.38
Fixes: 5a18ec176c ("fuse: fix hang of single threaded fuseblk filesystem")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b563aad1c upstream.
In case of FUSE_NOTIFY_RESEND and FUSE_NOTIFY_INC_EPOCH fuse_copy_finish()
isn't called.
Fix by always calling fuse_copy_finish() after fuse_notify(). It's a no-op
if called a second time.
Fixes: 760eac73f9 ("fuse: Introduce a new notification type for resend pending requests")
Fixes: 2396356a94 ("fuse: add more control over cache invalidation behaviour")
Cc: <stable@vger.kernel.org> # v6.9
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72b7ceca85 upstream.
There is a kernel panic due to WARN_ONCE when panic_on_warn is set.
This issue occurs when writeback is triggered due to a sync call for an
opened file (i.e., the writeback reason is WB_REASON_SYNC). When f2fs balance
is needed on the sync path, a flush of quota_release_work is triggered.
By default quota_release_work is queued to the "events_unbound" queue, which
does not have the WQ_MEM_RECLAIM flag. During f2fs balance the "writeback"
workqueue tries to flush quota_release_work, causing a kernel panic due to
MEM_RECLAIM flag mismatch errors.
This patch creates a dedicated workqueue with the WQ_MEM_RECLAIM flag
for the quota_release_work work item.
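A minimal sketch of the approach (the workqueue name and exact call sites
are illustrative):

	static struct workqueue_struct *quota_release_wq;

	/* set up once, e.g. at init time */
	quota_release_wq = alloc_workqueue("quota_release", WQ_MEM_RECLAIM, 0);
	if (!quota_release_wq)
		return -ENOMEM;

	/* queue the work there instead of on the default unbound queue */
	queue_delayed_work(quota_release_wq, &quota_release_work, 1);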
------------[ cut here ]------------
WARNING: CPU: 4 PID: 14867 at kernel/workqueue.c:3721 check_flush_dependency+0x13c/0x148
Call trace:
check_flush_dependency+0x13c/0x148
__flush_work+0xd0/0x398
flush_delayed_work+0x44/0x5c
dquot_writeback_dquots+0x54/0x318
f2fs_do_quota_sync+0xb8/0x1a8
f2fs_write_checkpoint+0x3cc/0x99c
f2fs_gc+0x190/0x750
f2fs_balance_fs+0x110/0x168
f2fs_write_single_data_page+0x474/0x7dc
f2fs_write_data_pages+0x7d0/0xd0c
do_writepages+0xe0/0x2f4
__writeback_single_inode+0x44/0x4ac
writeback_sb_inodes+0x30c/0x538
wb_writeback+0xf4/0x440
wb_workfn+0x128/0x5d4
process_scheduled_works+0x1c4/0x45c
worker_thread+0x32c/0x3e8
kthread+0x11c/0x1b0
ret_from_fork+0x10/0x20
Kernel panic - not syncing: kernel: panic_on_warn set ...
Fixes: ac6f420291 ("quota: flush quota_release_work upon quota writeback")
CC: stable@vger.kernel.org
Signed-off-by: Shashank A P <shashank.ap@samsung.com>
Link: https://patch.msgid.link/20250901092905.2115-1-shashank.ap@samsung.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15df28699b upstream.
A regression was reported to me recently whereby /dev/fb0 had disappeared
from a PowerBook G3 Series "Wallstreet". The problem shows up when the
"video=ofonly" parameter is passed to the kernel, which is what the
bootloader does when "no video driver" is selected. The cause of the
problem is the "offb" string comparison, which got mangled when it got
refactored. Fix it.
Cc: stable@vger.kernel.org
Fixes: 93604a5ade ("fbdev: Handle video= parameter in video/cmdline.c")
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c43094f8c upstream.
The ready event list of an epoll object is protected by read-write
semaphore:
- The consumer (waiter) acquires the write lock and takes items.
- the producer (waker) takes the read lock and adds items.
The point of this design is enabling epoll to scale well with large number
of producers, as multiple producers can hold the read lock at the same
time.
Unfortunately, this implementation may cause scheduling priority inversion
problem. Suppose the consumer has higher scheduling priority than the
producer. The consumer needs to acquire the write lock, but may be blocked
by the producer holding the read lock. Since read-write semaphore does not
support priority-boosting for the readers (even with CONFIG_PREEMPT_RT=y),
we have a case of priority inversion: a higher priority consumer is blocked
by a lower priority producer. This problem was reported in [1].
Furthermore, this could also cause stall problem, as described in [2].
Fix this problem by replacing rwlock with spinlock.
This reduces the event bandwidth, as the producers now have to contend with
each other for the spinlock. According to the benchmark from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c:
On 12 x86 CPUs:
          Before     After      Diff
threads   events/ms  events/ms
      8        7162       4956   -31%
     16        8733       5383   -38%
     32        7968       5572   -30%
     64       10652       5739   -46%
    128       11236       5931   -47%
On 4 riscv CPUs:
          Before     After      Diff
threads   events/ms  events/ms
      8        2958       2833    -4%
     16        3323       3097    -7%
     32        3451       3240    -6%
     64        3554       3178   -11%
    128        3601       3235   -10%
Although the numbers look bad, it should be noted that this benchmark
creates multiple threads who do nothing except constantly generating new
epoll events, thus contention on the spinlock is high. For real workload,
the event rate is likely much lower, and the performance drop is not as
bad.
Using another benchmark (perf bench epoll wait) where spinlock contention
is lower, improvement is even observed on x86:
On 12 x86 CPUs:
Before: Averaged 110279 operations/sec (+- 1.09%), total secs = 8
After: Averaged 114577 operations/sec (+- 2.25%), total secs = 8
On 4 riscv CPUs:
Before: Averaged 175767 operations/sec (+- 0.62%), total secs = 8
After: Averaged 167396 operations/sec (+- 0.23%), total secs = 8
In conclusion, no one is likely to be upset over this change. After all,
spinlock was used originally for years, and the commit which converted to
rwlock didn't mention a real workload, just that the benchmark numbers are
nice.
This patch is not exactly the revert of commit a218cc4914 ("epoll: use
rwlock in order to reduce ep_poll_callback() contention"), because git
revert conflicts in some places which are not obvious on the resolution.
This patch is intended to be backported, therefore go with the obvious
approach:
- Replace rwlock_t with spinlock_t one to one
- Delete list_add_tail_lockless() and chain_epi_lockless(). These were
introduced to allow producers to concurrently add items to the list.
But now that spinlock no longer allows producers to touch the event
list concurrently, these two functions are not necessary anymore.
Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/ec92458ea357ec503c737ead0f10b2c6e4c37d47.1752581388.git.namcao@linutronix.de
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: stable@vger.kernel.org
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/linux-rt-users/20210825132754.GA895675@lothringen/ [1]
Reported-by: Valentin Schneider <vschneid@redhat.com>
Closes: https://lore.kernel.org/linux-rt-users/xhsmhttqvnall.mognet@vschneid.remote.csb/ [2]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69e5d50fcf upstream.
The cpufreq_cpu_put() call in update_qos_request() takes place too early
because the latter subsequently calls freq_qos_update_request() that
indirectly accesses the policy object in question through the QoS request
object passed to it.
Fortunately, update_qos_request() is called under intel_pstate_driver_lock,
so this issue does not matter for changing the intel_pstate operation
mode, but it theoretically can cause a crash to occur on CPU device hot
removal (which currently can only happen in virt, but it is formally
supported nevertheless).
Address this issue by modifying update_qos_request() to drop the
reference to the policy later.
Fixes: da5c504c7a ("cpufreq: intel_pstate: Implement QoS supported freq constraints")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Link: https://patch.msgid.link/2255671.irdbgypaU6@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04ff48239f upstream.
With the introduction of clone3 in commit 7f192e3cd3 ("fork: add
clone3") the effective bit width of clone_flags on all architectures was
increased from 32-bit to 64-bit. However, the signature of the copy_*
helper functions (e.g., copy_sighand) used by copy_process was not
adapted.
As such, they truncate the flags on any 32-bit architectures that
support clone3 (arc, arm, csky, m68k, microblaze, mips32, openrisc,
parisc32, powerpc32, riscv32, x86-32 and xtensa).
For copy_sighand with CLONE_CLEAR_SIGHAND being an actual u64
constant, this triggers an observable bug in kernel selftest
clone3_clear_sighand:
if (clone_flags & CLONE_CLEAR_SIGHAND)
in function copy_sighand within fork.c will always fail given:
unsigned long /* == uint32_t */ clone_flags
#define CLONE_CLEAR_SIGHAND 0x100000000ULL
This commit fixes the bug by always passing clone_flags to copy_sighand
via their declared u64 type, invariant of architecture-dependent integer
sizes.
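The essence of the change, using copy_sighand as the example:

	/* Before: flags arrive narrowed to the architecture's word size. */
	static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk);

	/* After: the full 64-bit flag word is passed through, so bit 32
	 * (CLONE_CLEAR_SIGHAND) survives on 32-bit architectures too. */
	static int copy_sighand(u64 clone_flags, struct task_struct *tsk);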
Fixes: b612e5df45 ("clone3: add CLONE_CLEAR_SIGHAND")
Cc: stable@vger.kernel.org # linux-5.5+
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-1-53fcf5577d57@siemens-energy.com
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5225a34bd upstream.
The mhi_ep_read_channel function incorrectly assumes the End of Transfer
(EOT) bit is present for each packet in a chained transaction, causing
it to advance mhi_chan->rd_offset beyond wr_offset during host-to-device
transfers when EOT has not yet arrived. This leads to access of unmapped
host memory, causing IOMMU faults and processing of stale TREs.
Modify the loop condition to ensure mhi_queue is not empty, allowing the
function to process only valid TREs up to the current write pointer to
prevent premature reads and ensure safe traversal of chained TREs.
Due to this change, buf_left needs to be removed from the while loop
condition to avoid exiting prematurely before reading the ring completely,
and also remove write_offset since it will always be zero because the new
cache buffer is allocated every time.
Fixes: 5301258899 ("bus: mhi: ep: Add support for reading from the host")
Co-developed-by: Akhil Vinod <akhil.vinod@oss.qualcomm.com>
Signed-off-by: Akhil Vinod <akhil.vinod@oss.qualcomm.com>
Signed-off-by: Sumit Kumar <sumit.kumar@oss.qualcomm.com>
[mani: reworded description slightly]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250910-final_chained-v3-1-ec77c9d88ace@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dff4f9ff5d upstream.
The function btrfs_encode_fh() does not properly account for the three
cases it handles.
Before writing to the file handle (fh), the function only returns to the
user BTRFS_FID_SIZE_NON_CONNECTABLE (5 dwords, 20 bytes) or
BTRFS_FID_SIZE_CONNECTABLE (8 dwords, 32 bytes).
However, when a parent exists and the root ID of the parent and the
inode are different, the function writes BTRFS_FID_SIZE_CONNECTABLE_ROOT
(10 dwords, 40 bytes).
If *max_len is not large enough, this write goes out of bounds because
BTRFS_FID_SIZE_CONNECTABLE_ROOT is greater than
BTRFS_FID_SIZE_CONNECTABLE originally returned.
This results in an 8-byte out-of-bounds write at
fid->parent_root_objectid = parent_root_id.
A previous attempt to fix this issue was made but was lost.
https://lore.kernel.org/all/4CADAEEC020000780001B32C@vpn.id2.novell.com/
Although this issue does not seem to be easily triggerable, it is a
potential memory corruption bug that should be fixed. This patch
resolves the issue by ensuring the function returns the appropriate size
for all three cases and validates that *max_len is large enough before
writing any data.
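Conceptually, the connectable-with-parent-root case now looks like this
(simplified sketch, not the exact diff):

	int fid_len = BTRFS_FID_SIZE_CONNECTABLE_ROOT;	/* 10 dwords */

	if (*max_len < fid_len) {
		*max_len = fid_len;		/* tell the caller how much is needed */
		return FILEID_INVALID;		/* and write nothing */
	}
	*max_len = fid_len;
	fid->parent_root_objectid = parent_root_id;	/* safe: buffer is large enough */
	return FILEID_BTRFS_WITH_PARENT_ROOT;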
Fixes: be6e8dc0ba ("NFS support for btrfs - v3")
CC: stable@vger.kernel.org # 3.0+
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d1684a077 upstream.
Currently this is hidden behind perfmon_capable() since this is
technically an info leak, given that this is a system wide metric.
However the granularity reported here is always PAGE_SIZE aligned, which
matches what the core kernel is already willing to expose to userspace
if querying how many free RAM pages there are on the system, and that
doesn't need any special privileges. In addition other drm drivers seem
happy to expose this.
The motivation here is oneAPI, where they want to use the system
wide 'used' reporting rather than the per-client fdinfo stats. This has
also come up with some perf overlay applications wanting this
information.
Fixes: 1105ac15d2 ("drm/xe/uapi: restrict system wide accounting")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com
(cherry picked from commit 4d0b035fd6dae8ee48e9c928b10f14877e595356)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4bea91958 upstream.
In `nouveau_bo_move_prep`, if `nouveau_mem_map` fails, an error code
should be returned. Currently, it returns zero even if vmm addr is not
correctly mapped.
Cc: stable@vger.kernel.org
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Fixes: 9ce523cc3b ("drm/nouveau: separate buffer object backing memory from nvkm structures")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d83f1d19c8 upstream.
Remove fixed PPI lane count setup. The R-Car DSI host is capable
of operating in 1..4 DSI lane mode. Remove the hard-coded 4-lane
configuration from PPI register settings and instead configure
the PPI lane count according to lane count information already
obtained by this driver instance.
Configure TXSETR register to match PPI lane count. The R-Car V4H
Reference Manual R19UH0186EJ0121 Rev.1.21 section 67.2.2.3 Tx Set
Register (TXSETR), field LANECNT description indicates that the
TXSETR register LANECNT bitfield lane count must be configured
such, that it matches lane count configuration in PPISETR register
DLEN bitfield. Make sure the LANECNT and DLEN bitfields are
configured to match.
Fixes: 155358310f ("drm: rcar-du: Add R-Car DSI driver")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20250813210840.97621-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f4098c57e upstream.
When cdev_device_add() fails, call put_device() to explicitly
release dev->lirc_dev. Otherwise, the reference count is left
unbalanced.
Found by code review.
Cc: stable@vger.kernel.org
Fixes: a6ddd4fecb ("media: lirc: remove last remnants of lirc kapi")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e743cd0a7 upstream.
We don't use OF ports and remote-endpoints to connect the CSI2RX bridge
and this device in the device tree, thus it is wrong to use
v4l2_create_fwnode_links_to_pad() to create the media graph link between
the two.
It works out by accident, as neither the source nor the sink implement
the .get_fwnode_pad() callback, and the framework helper falls back on
using the first source and sink pads to create the link between them.
Instead, manually create the media link from the first source pad of the
bridge to the first sink pad of the J721E CSI2RX.
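That is, something along these lines replaces the fwnode-based helper
(the entity names and pad indices shown are illustrative):

	ret = media_create_pad_link(&source->entity, 0,		/* bridge: first source pad */
				    &csi->subdev.entity, 0,	/* CSI2RX: first sink pad   */
				    MEDIA_LNK_FL_IMMUTABLE | MEDIA_LNK_FL_ENABLED);
	if (ret)
		return ret;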
Fixes: b4a3d877dc ("media: ti: Add CSI2RX support for J721E")
Cc: stable@vger.kernel.org
Reviewed-by: Devarsh Thakkar <devarsht@ti.com>
Tested-by: Yemike Abhilash Chandra <y-abhilashchandra@ti.com> (on SK-AM68)
Signed-off-by: Jai Luthra <jai.luthra@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bd8a61476 upstream.
The vivid driver supports the <Vendor Command With ID> message,
but if the Vendor ID of the received message didn't match the Vendor ID
of the CEC Adapter, then it ignores it (good) and returns 0 (bad).
It should return -ENOMSG to indicate that other followers should be
asked to handle it. Return code 0 means that the driver handled it,
which is wrong in this case.
As a result, userspace followers never get the chance to process such a
message.
Refactor the code a bit to have the function return -ENOMSG at the end,
drop the default case, and ensure that the message handlers return 0.
That way 0 is only returned if the message is actually handled in the
vivid_received() function.
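Schematically, the receive handler now has this shape (abridged; the
vendor-ID extraction is omitted):

	static int vivid_received(struct cec_adapter *adap, struct cec_msg *msg)
	{
		switch (msg->msg[1]) {
		case CEC_MSG_VENDOR_COMMAND_WITH_ID:
			if (vendor_id != adap->log_addrs.vendor_id)
				break;		/* not ours: report -ENOMSG below */
			/* ... handle the command ... */
			return 0;
		}
		return -ENOMSG;			/* let other followers see the message */
	}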
Fixes: 812765cd69 ("media: vivid: add <Vendor Command With ID> support")
Cc: stable@vger.kernel.org
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93f213b444 upstream.
When starting venus with the "no_tz" code path, IRIS2 needs the same
boot/reset sequence as IRIS2_1. This is because most of the registers were
moved to the "wrapper_tz_base", which is already defined for both IRIS2 and
IRIS2_1 inside core.c. Add IRIS2 to the checks inside firmware.c as well to
make sure that it uses the correct reset sequence.
Both IRIS2 and IRIS2_1 are HFI v6 variants, so the correct sequence was
used before commit c38610f898 ("media: venus: firmware: Sanitize
per-VPU-version").
Fixes: c38610f898 ("media: venus: firmware: Sanitize per-VPU-version")
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Vikash Garodia <quic_vgarodia@quicinc.com>
Reviewed-by: Dikshita Agarwal <quic_dikshita@quicinc.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
[bod: Fixed commit log IRIS -> IRIS2]
Signed-off-by: Bryan O'Donoghue <bod@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fa37ba25a upstream.
The s5p_mfc_cmd_args structure in the v6 driver is never used and never
initialized to anything other than zero, but as of clang-21 this
causes a warning:
drivers/media/platform/samsung/s5p-mfc/s5p_mfc_cmd_v6.c:45:7: error: variable 'h2r_args' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
45 | &h2r_args);
| ^~~~~~~~
Just remove this for simplicity. Since the function is also called
through a callback, this does require adding a trivial wrapper with
the correct prototype.
Fixes: f96f3cfa0b ("[media] s5p-mfc: Update MFC v4l2 driver to support MFC6.x")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1069a4fe63 upstream.
The DMA map functions can fail and should be tested for errors.
If the mapping fails, free blanking_ptr and set it to 0. As 0 is a
valid DMA address, use blanking_ptr to test if the DMA address
is set.
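The applied pattern is roughly the following (device pointer and field
names abbreviated):

	dma_addr_t addr = dma_map_single(dev, blanking_ptr, size, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, addr)) {
		kfree(blanking_ptr);
		blanking_ptr = NULL;	/* later code tests the virtual pointer,
					   since 0 is a valid DMA address */
	}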
Fixes: 1a0adaf37c ("V4L/DVB (5345): ivtv driver for Conexant cx23416/cx23415 MPEG encoder/decoder")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bacd713145 upstream.
Change "ret" from unsigned int to int type in mt9v111_calc_frame_rate()
to store negative error codes or zero returned by __mt9v111_hw_reset()
and other functions.
Storing the negative error codes in an unsigned type doesn't cause an issue
at runtime, but it's ugly as pants.
No effect on runtime.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Fixes: aab7ed1c39 ("media: i2c: Add driver for Aptina MT9V111")
Cc: stable@vger.kernel.org
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5d12cc03e upstream.
Delete the external-module style Makefile commands. They are not needed
for in-tree modules.
This is the only Makefile in the kernel tree (aside from tools/ and
samples/) that uses this Makefile style.
The same files are built with or without this patch.
Fixes: 056f2821b6 ("media: cec: extron-da-hd-4k-plus: add the Extron DA HD 4K Plus CEC driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fcc8e1469 upstream.
VIRQs come in 3 flavors, per-VCPU, per-domain, and global, and the VIRQs
are tracked in per-cpu virq_to_irq arrays.
Per-domain and global VIRQs must be bound on CPU 0, and
bind_virq_to_irq() sets the per_cpu virq_to_irq at registration time.
Later, the interrupt can migrate, and info->cpu is updated. When
calling __unbind_from_irq(), the per-cpu virq_to_irq is cleared for a
different cpu. If bind_virq_to_irq() is called again with CPU 0, the
stale irq is returned. There won't be any irq_info for the irq, so
things break.
Make xen_rebind_evtchn_to_cpu() update the per_cpu virq_to_irq mappings
to keep them up to date with the current cpu. This ensures the
correct virq_to_irq is cleared in __unbind_from_irq().
Fixes: e46cdb66c8 ("xen: event channels")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250828003604.8949-4-jason.andryuk@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07ce121d93 upstream.
Change find_virq() to return -EEXIST when a VIRQ is bound to a
different CPU than the one passed in. With that, remove the BUG_ON()
from bind_virq_to_irq() to propagate the error upwards.
Some VIRQs are per-cpu, but others are per-domain or global. Those must
be bound to CPU0 and can then migrate elsewhere. The lookup for
per-domain and global will probably fail when migrated off CPU 0,
especially when the current CPU is tracked. This now returns -EEXIST
instead of BUG_ON().
A second call to bind a per-domain or global VIRQ is not expected, but
make it non-fatal to avoid trying to look up the irq, since we don't
know which per_cpu(virq_to_irq) it will be in.
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250828003604.8949-3-jason.andryuk@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f770c3d858 upstream.
The device power management API has the following asymmetry:
* dpm_suspend_start() does not clean up on failure
(it requires a call to dpm_resume_end())
* dpm_suspend_end() does clean up on failure
(it does not require a call to dpm_resume_start())
The asymmetry was introduced by commit d8f3de0d24 ("Suspend-related
patches for 2.6.27") in June 2008: It removed a call to device_resume()
from device_suspend() (which was later renamed to dpm_suspend_start()).
When Xen began using the device power management API in May 2008 with
commit 0e91398f2a ("xen: implement save/restore"), the asymmetry did
not yet exist. But since it was introduced, a call to dpm_resume_end()
is missing in the error path of dpm_suspend_start(). Fix it.
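For the Xen suspend path this amounts to roughly the following (simplified):

	err = dpm_suspend_start(PMSG_FREEZE);
	if (err) {
		pr_err("%s: dpm_suspend_start %d\n", __func__, err);
		dpm_resume_end(PMSG_THAW);	/* was missing: undo dpm_suspend_start() */
		goto out_thaw;
	}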
Fixes: d8f3de0d24 ("Suspend-related patches for 2.6.27")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v2.6.27
Reviewed-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <22453676d1ddcebbe81641bb68ddf587fee7e21e.1756990799.git.lukas@wunner.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08df2d7dd4 upstream.
rc is overwritten by the evtchn_status hypercall in each iteration, so
the return value will be whatever the last iteration is. This could
incorrectly return success even if the event channel was not found.
Change to an explicit -ENOENT for an un-found virq and return 0 on a
successful match.
Fixes: 62cc5fc7b2 ("xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250828003604.8949-2-jason.andryuk@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3fe1c83a5 upstream.
CMN S3's DTM offset is different between r0px and r1p0, and it
turns out this was not an error in the earlier documentation, but
does actually exist in the design. Lovely.
Cc: stable@vger.kernel.org
Fixes: 0dc2f4963f ("perf/arm-cmn: Support CMN S3")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a6506e1ba upstream.
There is a possible issue where TI AM33xx SoCs do not boot properly after
a reset if EMU0/EMU1 pins were used as GPIO and have been driving low level
actively prior to reset [1].
"Advisory 1.0.36 EMU0 and EMU1: Terminals Must be Pulled High Before
ICEPick Samples
The state of the EMU[1:0] terminals are latched during reset to determine
ICEPick boot mode. For normal device operation, these terminals must be
pulled up to a valid high logic level ( > VIH min) before ICEPick samples
the state of these terminals, which occurs
[five CLK_M_OSC clock cycles - 10 ns] after the falling edge of WARMRSTn.
Many applications may not require the secondary GPIO function of the
EMU[1:0] terminals. In this case, they would only be connected to pull-up
resistors, which ensures they are always high when ICEPick samples.
However, some applications may need to use these terminals as GPIO where
they could be driven low before reset is asserted. This usage of the
EMU[1:0] terminals may require special attention to ensure the terminals
are allowed to return to a valid high-logic level before ICEPick samples
the state of these terminals.
When any device reset is asserted, the pin mux mode of EMU[1:0] terminals
configured to operate as GPIO (mode 7) will change back to EMU input
(mode 0) on the falling edge of WARMRSTn. This only provides a short period
of time for the terminals to return high if driven low before reset is
asserted...
If the EMU[1:0] terminals are configured to operate as GPIO, the product
should be designed such these terminals can be pulled to a valid high-logic
level within 190 ns after the falling edge of WARMRSTn."
We've noticed this problem with custom am335x hardware in combination with
the recently implemented cold reset method
(commit 6521f6a195 ("ARM: AM33xx: PRM: Implement REBOOT_COLD")).
It looks like the problem can affect other HW, for instance AM335x
Chiliboard, because the latter has LEDs on GPIO3_7/GPIO3_8 as well.
One option would be to check if the pins are in GPIO mode and either switch
to output active high, or switch to input and poll until the external
pull-ups have brought the pins to the desired high state. But fighting
with the GPIO driver for these pins is probably not the most straightforward
approach in a reboot handler.
Fortunately we can easily control pinmuxing here and rely on the external
pull-ups. TI recommends 4k7 external pull-up resistors [2], and even with a
quite conservative estimate of the pin capacitance (1 uF should never happen)
the required delay shall not exceed 5 ms.
[1] Link: https://www.ti.com/lit/pdf/sprz360
[2] Link: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/866346/am3352-emu-1-0-questions
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://lore.kernel.org/r/20250717152708.487891-1-alexander.sverdlin@siemens.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f73c82c855 upstream.
On most MSM8939 devices, the bootloader already initializes the display to
show the boot splash screen. In this situation, MDSS is already configured
and left running when starting Linux. To avoid side effects from the
bootloader configuration, the MDSS reset can be specified in the device
tree to start again with a clean hardware state.
The reset for MDSS is currently missing in msm8939.dtsi, which causes
errors when the MDSS driver tries to re-initialize the registers:
dsi_err_worker: status=6
dsi_err_worker: status=6
dsi_err_worker: status=6
...
It turns out that we have always indirectly worked around this by building
the MDSS driver as a module. Before v6.17, the power domain was temporarily
turned off until the module was loaded, long enough to clear the register
contents. In v6.17, power domains are not turned off during boot until
sync_state() happens, so this is no longer working. Even before v6.17 this
resulted in broken behavior, but notably only when the MDSS driver was
built-in instead of a module.
Cc: stable@vger.kernel.org
Fixes: 61550c6c15 ("arm64: dts: qcom: Add msm8939 SoC")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250915-msm8916-resets-v1-2-a5c705df0c45@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99b78773c2 upstream.
On most MSM8916 devices (aside from the DragonBoard 410c), the bootloader
already initializes the display to show the boot splash screen. In this
situation, MDSS is already configured and left running when starting Linux.
To avoid side effects from the bootloader configuration, the MDSS reset can
be specified in the device tree to start again with a clean hardware state.
The reset for MDSS is currently missing in msm8916.dtsi, which causes
errors when the MDSS driver tries to re-initialize the registers:
dsi_err_worker: status=6
dsi_err_worker: status=6
dsi_err_worker: status=6
...
It turns out that we have always indirectly worked around this by building
the MDSS driver as a module. Before v6.17, the power domain was temporarily
turned off until the module was loaded, long enough to clear the register
contents. In v6.17, power domains are not turned off during boot until
sync_state() happens, so this is no longer working. Even before v6.17 this
resulted in broken behavior, but notably only when the MDSS driver was
built-in instead of a module.
Cc: stable@vger.kernel.org
Fixes: 305410ffd1 ("arm64: dts: msm8916: Add display support")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250915-msm8916-resets-v1-1-a5c705df0c45@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 496f9372ea upstream.
In the ACPI debugger interface, the helper functions for read and write
operations use "int" as the length parameter data type. When a large
"size_t count" is passed from the file operations, this cast to "int"
results in truncation and a negative value due to signed integer
representation.
Logically, this negative number propagates to the min() calculation,
where it is selected over the positive buffer space value, leading to
unexpected behavior. Subsequently, when this negative value is used in
copy_to_user() or copy_from_user(), it is interpreted as a large positive
value due to the unsigned nature of the size parameter in these functions,
causing the copy operations to attempt handling sizes far beyond the
intended buffer limits.
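To illustrate the type pitfall in isolation (a kernel-style sketch, not the
driver code; the buffer-space value of 64 is made up):

	size_t count = 0x80000000;	/* a 2 GiB read request                  */
	int len = count;		/* wraps: becomes INT_MIN, i.e. negative */
	int n = min(len, 64);		/* the negative value wins the min()     */

	/* copy_to_user(buf, p, n) then converts n back to an unsigned size,
	 * i.e. an enormous length far beyond the intended buffer. */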
Address the issue by:
- Changing the length parameters in acpi_aml_read_user() and
acpi_aml_write_user() from "int" to "size_t", aligning with the
expected unsigned size semantics.
- Updating return types and local variables in acpi_aml_read() and
acpi_aml_write() to "ssize_t" for consistency with kernel file
operation conventions.
- Using "size_t" for the "n" variable to ensure calculations remain
unsigned.
- Using min_t() for circ_count_to_end() and circ_space_to_end() to
ensure type-safe comparisons and prevent integer overflow.
Signed-off-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com>
Link: https://patch.msgid.link/20250923013113.20615-1-a.jahangirzad@gmail.com
[ rjw: Changelog tweaks, local variable definitions ordering adjustments ]
Fixes: 8cfb0cdf07 ("ACPI / debugger: Add IO interface to access debugger functionalities")
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0759b1098 upstream.
The ACPI handle passed to acpi_extract_properties() as the first
argument represents the ACPI namespace scope in which to look for
objects returning buffers associated with buffer properties.
For _DSD objects located immediately under ACPI devices, this handle is
the same as the handle of the device object holding the _DSD, but for
data-only subnodes it is not so.
First of all, data-only subnodes are represented by objects that
cannot hold other objects in their scopes (like control methods).
Therefore a data-only subnode handle cannot be used for completing
relative pathname segments, so the current code in
acpi_nondev_subnode_extract() passing a data-only subnode handle
to acpi_extract_properties() is invalid.
Moreover, a data-only subnode of device A may be represented by an
object located in the scope of device B (which kind of makes sense,
for instance, if A is B's child). In that case, the scope in
question would be the one of device B. In other words, the scope
mentioned above is the same as the scope used for subnode object
lookup in acpi_nondev_subnode_extract().
Accordingly, rearrange that function to use the same scope for the
extraction of properties and subnode object lookup.
Fixes: 103e10c69c ("ACPI: property: Add support for parsing buffer property UUID")
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9338d660b7 ]
When building s390 defconfig with binutils older than 2.32, there are
several warnings during the final linking stage:
s390-linux-ld: .tmp_vmlinux1: warning: allocated section `.got.plt' not in segment
s390-linux-ld: .tmp_vmlinux2: warning: allocated section `.got.plt' not in segment
s390-linux-ld: vmlinux.unstripped: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: vmlinux: warning: allocated section `.got.plt' not in segment
s390-linux-objcopy: st7afZyb: warning: allocated section `.got.plt' not in segment
binutils commit afca762f598 ("S/390: Improve partial relro support for
64 bit") [1] in 2.32 changed where .got.plt is emitted, avoiding the
warning.
The :NONE in the .vmlinux.info output section description changes the
segment for subsequent allocated sections. Move .vmlinux.info right
above the discards section to place all other sections in the previously
defined segment, .data.
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=afca762f598d453c563f244cd3777715b1a0cb72 [1]
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Alexey Gladkov <legion@kernel.org>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20251008-kbuild-fix-modinfo-regressions-v1-3-9fc776c5887c@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f375ade6a ]
When unpinning a BPF hash table (htab or htab_lru) that contains internal
structures (timer, workqueue, or task_work) in its values, a BUG warning
is triggered:
BUG: sleeping function called from invalid context at kernel/bpf/hashtab.c:244
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
...
The issue arises from the interaction between BPF object unpinning and
RCU callback mechanisms:
1. BPF object unpinning uses ->free_inode() which schedules cleanup via
call_rcu(), deferring the actual freeing to an RCU callback that
executes within the RCU_SOFTIRQ context.
2. During cleanup of hash tables containing internal structures,
htab_map_free_internal_structs() is invoked, which includes
cond_resched() or cond_resched_rcu() calls to yield the CPU during
potentially long operations.
However, cond_resched() or cond_resched_rcu() cannot be safely called from
atomic RCU softirq context, leading to the BUG warning when attempting
to reschedule.
Fix this by changing from ->free_inode() to ->destroy_inode() and rename
bpf_free_inode() to bpf_destroy_inode() for BPF objects (prog, map, link).
This allows direct inode freeing without RCU callback scheduling,
avoiding the invalid context warning.
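A hedged sketch of the general direction, using an illustrative pseudo-filesystem
rather than the bpf inode code: cleanup moves from ->free_inode(), whose
call_rcu() callback runs in softirq context, to ->destroy_inode(), which is
called synchronously and may sleep.
```
#include <linux/fs.h>

static void example_destroy_inode(struct inode *inode)
{
	/* potentially long, sleeping cleanup of inode->i_private goes here */
	free_inode_nonrcu(inode);	/* free directly, no extra RCU deferral */
}

static const struct super_operations example_super_ops = {
	.statfs		= simple_statfs,
	.drop_inode	= generic_delete_inode,
	.destroy_inode	= example_destroy_inode,	/* instead of .free_inode */
};
```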
Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
Fixes: 68134668c1 ("bpf: Add map side support for bpf timers.")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b5f8aa8d4b ]
The slimbus regmap passed to the GPIO driver down from MFD does not use
fast_io. This means a mutex is used for locking and thus this GPIO chip
must not be used in atomic context. Change the can_sleep switch in
struct gpio_chip to true.
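A minimal sketch of what this one-line change amounts to, on an illustrative
chip rather than the wcd934x driver: a controller whose register access goes
through a mutex-protected (non-fast_io) regmap must mark itself as sleeping.
```
#include <linux/gpio/driver.h>

static struct gpio_chip example_chip = {
	.label		= "example-slow-gpio",
	.base		= -1,
	.ngpio		= 8,
	.can_sleep	= true,	/* register access may sleep, callers must not be atomic */
};
```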
Fixes: 59c3246834 ("gpio: wcd934x: Add support to wcd934x gpio controller")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a81236f2c ]
The tpm_tis_write8() call specifies arguments in the wrong order. It should be
(data, addr, value), not (data, value, addr). The initial correct order
was changed during the major refactoring when the code was split.
Fixes: 41a5e1cf1f ("tpm/tpm_tis: Split tpm_tis driver into a core and TCG TIS compliant phy")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 057ac50638 ]
EA $LXMOD is required for WSL non-symlink reparse points.
Fixes: ef86ab131d ("cifs: Fix querying of WSL CHR and BLK reparse points over SMB1")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b95cd1bdf5 ]
Don't reuse open handle when changing timestamps to prevent the server
from disabling automatic timestamp updates as per MS-FSA 2.1.4.17.
---8<---
import os
import time
filename = '/mnt/foo'
def print_stat(prefix):
    st = os.stat(filename)
    print(prefix, ': ', time.ctime(st.st_atime), time.ctime(st.st_ctime))
fd = os.open(filename, os.O_CREAT|os.O_TRUNC|os.O_WRONLY, 0o644)
print_stat('old')
os.utime(fd, None)
time.sleep(2)
os.write(fd, b'foo')
os.close(fd)
time.sleep(2)
print_stat('new')
---8<---
Before patch:
$ mount.cifs //srv/share /mnt -o ...
$ python3 run.py
old : Fri Oct 3 14:01:21 2025 Fri Oct 3 14:01:21 2025
new : Fri Oct 3 14:01:21 2025 Fri Oct 3 14:01:21 2025
After patch:
$ mount.cifs //srv/share /mnt -o ...
$ python3 run.py
old : Fri Oct 3 17:03:34 2025 Fri Oct 3 17:03:34 2025
new : Fri Oct 3 17:03:36 2025 Fri Oct 3 17:03:36 2025
Fixes: b6f2a0f89d ("cifs: for compound requests, use open handle if possible")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cc380d0e1 ]
The return value of the copy_to_iter() function will never be negative;
it is the number of bytes copied, or zero if nothing was copied.
Update the check to treat 0 as an error, and return -1 in that case.
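A hedged sketch of the corrected check (the helper is illustrative, not the
cifs code):
```
#include <linux/uio.h>

/* copy_to_iter() returns the number of bytes copied as a size_t, so zero -
 * not a negative value - is the failure indication to test for. */
static int example_copy_out(struct iov_iter *iter, const void *src, size_t len)
{
	if (copy_to_iter(src, len, iter) == 0)
		return -1;	/* nothing was copied */
	return 0;
}
```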
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Acked-by: Tom Talpey <tom@talpey.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6bb73db694 ]
Move the ssize check to the start of essiv_aead_crypt() so that
it's also checked for decryption and in-place encryption.
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Fixes: be1eb7f78a ("crypto: essiv - create wrapper template for ESSIV generation")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e84945bdc6 ]
Jakub reported this self test flaking occasionally (fails, but passes on
re-run) on debug kernels.
This is because the test checks for elapsed time to determine if both
connections were established in parallel.
Rework this to no longer depend on timing.
Use busywait helper to check that both sockets have moved to established
state and then query the conntrack engine for the two entries.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/20250926163318.40d1a502@kernel.org/
Fixes: 117e149e26 ("selftests: netfilter: test nat source port clash resolution interaction with tcp early demux")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f359b809d5 ]
Referencing a synproxy stateful object from OUTPUT hook causes kernel
crash due to infinite recursive calls:
BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
[...]
Call Trace:
__find_rr_leaf+0x99/0x230
fib6_table_lookup+0x13b/0x2d0
ip6_pol_route+0xa4/0x400
fib6_rule_lookup+0x156/0x240
ip6_route_output_flags+0xc6/0x150
__nf_ip6_route+0x23/0x50
synproxy_send_tcp_ipv6+0x106/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
nft_synproxy_do_eval+0x263/0x310
nft_do_chain+0x5a8/0x5f0 [nf_tables
nft_do_chain_inet+0x98/0x110
nf_hook_slow+0x43/0xc0
__ip6_local_out+0xf0/0x170
ip6_local_out+0x17/0x70
synproxy_send_tcp_ipv6+0x1a2/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
[...]
Implement objref and objrefmap expression validate functions.
Currently, only NFT_OBJECT_SYNPROXY object type requires validation.
This will also handle a jump to a chain using a synproxy object from the
OUTPUT hook.
Now when trying to reference a synproxy object in the OUTPUT hook, nft
will produce the following error:
synproxy_crash.nft: Error: Could not process rule: Operation not supported
synproxy name mysynproxy
^^^^^^^^^^^^^^^^^^^^^^^^
Fixes: ee394f96ad ("netfilter: nft_synproxy: add synproxy stateful object support")
Reported-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com>
Closes: https://bugzilla.suse.com/1250237
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7dc87f344 ]
SCL_SCALER_ENABLE can be used to enable/disable the scaler
on DCE6. Program it to 0 when scaling isn't used, 1 when used.
Additionally, clear some other registers when scaling is
disabled and program the SCL_UPDATE register as recommended.
This fixes visible glitches for users whose BIOS sets up a
mode with scaling at boot, which DC was unable to clean up.
Fixes: b70aaf5586 ("drm/amd/display: dce_transform: add DCE6 specific macros,functions")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f39f56520 ]
pm_runtime_get_sync() and pm_runtime_put_autosuspend() were previously
called in cmdq_mbox_send_data(), which is under a spinlock in msg_submit()
(mailbox.c). This caused lockdep warnings such as "sleeping function
called from invalid context" when running with lockdebug enabled.
The BUG report:
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1
Hardware name: Google Ciri sku0/unprovisioned board (DT)
Workqueue: imgsys_runner imgsys_runner_func
Call trace:
dump_backtrace+0x100/0x120
show_stack+0x20/0x2c
dump_stack_lvl+0x84/0xb4
dump_stack+0x18/0x48
__might_resched+0x354/0x4c0
__might_sleep+0x98/0xe4
__pm_runtime_resume+0x70/0x124
cmdq_mbox_send_data+0xe4/0xb1c
msg_submit+0x194/0x2dc
mbox_send_message+0x190/0x330
imgsys_cmdq_sendtask+0x1618/0x2224
imgsys_runner_func+0xac/0x11c
process_one_work+0x638/0xf84
worker_thread+0x808/0xcd0
kthread+0x24c/0x324
ret_from_fork+0x10/0x20
Additionally, pm_runtime_put_autosuspend() should be invoked from the
GCE IRQ handler to ensure the hardware has actually completed its work.
To resolve these issues, remove the pm_runtime calls from
cmdq_mbox_send_data() and delegate power management responsibilities
to the client driver.
Fixes: 8afe816b0c ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend")
Signed-off-by: Jason-JH Lin <jason-jh.lin@mediatek.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 472f8a3fcc ]
__pm_runtime_put_autosuspend() was meant to be used by callers that needed
to put the Runtime PM usage_count without marking the device's last busy
timestamp. It was however seen that the Runtime PM autosuspend related
functions should include that call. Thus switch the driver to
use pm_runtime_put_autosuspend().
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: 3f39f56520 ("mailbox: mtk-cmdq: Remove pm_runtime APIs from cmdq_mbox_send_data()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08fb6d8ff9 ]
pm_runtime_put_autosuspend() will soon be changed to include a call to
pm_runtime_mark_last_busy(). This patch switches the current users to
__pm_runtime_put_autosuspend() which will continue to have the
functionality of old pm_runtime_put_autosuspend().
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Stable-dep-of: 3f39f56520 ("mailbox: mtk-cmdq: Remove pm_runtime APIs from cmdq_mbox_send_data()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23f3770e1a ]
Cilium has a BPF egress gateway feature which forces outgoing K8s Pod
traffic to pass through dedicated egress gateways which then SNAT the
traffic in order to interact with stable IPs outside the cluster.
The traffic is directed to the gateway via vxlan tunnel in collect md
mode. A recent BPF change utilized the bpf_redirect_neigh() helper to
forward packets after their arrival and decap on vxlan; over time it turned
out that kmalloc-256 slab usage in the kernel was ever-increasing.
The issue was that vxlan allocates the metadata_dst object and attaches
it through a fake dst entry to the skb. The latter was never released
though given bpf_redirect_neigh() was merely setting the new dst entry
via skb_dst_set() without dropping an existing one first.
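A hedged sketch of the pattern implied above (an illustrative helper, not the
upstream diff): release whatever dst is already attached to the skb before
attaching the newly looked-up one, so the old reference is not leaked.
```
#include <linux/skbuff.h>
#include <net/dst.h>

static void example_attach_dst(struct sk_buff *skb, struct dst_entry *new_dst)
{
	skb_dst_drop(skb);		/* put the previously attached dst, if any */
	skb_dst_set(skb, new_dst);	/* the skb now owns the new reference */
}
```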
Fixes: b4ab314149 ("bpf: Add redirect_neigh helper as redirect drop-in")
Reported-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Reported-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jordan Rife <jrife@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251003073418.291171-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb160e791a ]
The driver incorrectly determines SGI vs SPI interrupts by checking IRQ
number < 16, which fails with dynamic IRQ allocation. During unbind,
this causes improper SGI cleanup, leading to a kernel crash.
Add explicit irq_type field to pdata for reliable identification of SGI
interrupts (type-2) and only clean up SGI resources when appropriate.
Fixes: 6ffb163534 ("mailbox: zynqmp: handle SGI for shared IPI")
Signed-off-by: Harini T <harini.t@amd.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0aead8197f ]
The cleanup loop was starting at the wrong array index, causing
out-of-bounds access.
Start the loop at the correct index for zero-indexed arrays to prevent
accessing memory beyond the allocated array bounds.
Fixes: 4981b82ba2 ("mailbox: ZynqMP IPI mailbox controller")
Signed-off-by: Harini T <harini.t@amd.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 019e3f4550 ]
The ipi_mbox->dev.parent check is an unreliable proxy for registration
status, as it fails to protect against probe failures that occur after
the parent is assigned but before device_register() completes.
device_is_registered() is the canonical and robust method to verify the
registration status.
Remove ipi_mbox->dev.parent check in zynqmp_ipi_free_mboxes().
Fixes: 4981b82ba2 ("mailbox: ZynqMP IPI mailbox controller")
Signed-off-by: Harini T <harini.t@amd.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 341867f730 ]
The controller is registered using the device-managed function
'devm_mbox_controller_register()'. As documented in mailbox.c, this
ensures the devres framework automatically calls
mbox_controller_unregister() when device_unregister() is invoked, making
the explicit call unnecessary.
Remove redundant mbox_controller_unregister() call as
device_unregister() handles controller cleanup.
Fixes: 4981b82ba2 ("mailbox: ZynqMP IPI mailbox controller")
Signed-off-by: Harini T <harini.t@amd.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21b29e74ff ]
Some applications (like selftests/net/tcp_mmap.c) call SO_RCVLOWAT
on their listener, before accept().
This has an unfortunate effect on wscale selection in
tcp_select_initial_window() during 3WHS.
For instance, tcp_mmap was negotiating wscale 4, regardless
of tcp_rmem[2] and sysctl_rmem_max.
Do not change tp->window_clamp if it is zero
or bigger than our computed value.
A zero value is special: it allows tcp_select_initial_window()
to enable autotuning.
Note that SO_RCVLOWAT use on listener is probably not wise,
because tp->scaling_ratio has a default value, possibly wrong.
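Restated as a hedged C sketch, the rule above would look roughly like this
(variable names are illustrative, not the upstream diff):
```
#include <linux/tcp.h>

/* Leave a zero clamp untouched so autotuning stays possible, and never
 * lower a clamp that is already larger than the computed value. */
static void example_maybe_raise_clamp(struct tcp_sock *tp, u32 computed_clamp)
{
	if (tp->window_clamp && tp->window_clamp < computed_clamp)
		tp->window_clamp = computed_clamp;
}
```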
Fixes: d1361840f8 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251003184119.2526655-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2db687f346 ]
When ice_adapter_new() fails, the reserved XArray entry created by
xa_insert() is not released. This causes subsequent insertions at
the same index to return -EBUSY, potentially leading to
NULL pointer dereferences.
Reorder the operations as suggested by Przemek Kitszel:
1. Check if adapter already exists (xa_load)
2. Reserve the XArray slot (xa_reserve)
3. Allocate the adapter (ice_adapter_new)
4. Store the adapter (xa_store)
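A hedged sketch of the reordered flow from the list above (names and locking
simplified, not the ice driver's code): the reservation is taken before the
allocation and released on failure, so a later insert at the same index can
succeed.
```
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/xarray.h>

struct example_adapter { int id; };	/* stand-in for the real adapter type */

static struct example_adapter *example_adapter_get(struct xarray *xa,
						   unsigned long index)
{
	struct example_adapter *adapter;
	int err;

	adapter = xa_load(xa, index);			/* 1. already present? */
	if (adapter)
		return adapter;

	err = xa_reserve(xa, index, GFP_KERNEL);	/* 2. reserve the slot */
	if (err)
		return ERR_PTR(err);

	adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);/* 3. allocate */
	if (!adapter) {
		xa_release(xa, index);			/* undo the reservation */
		return ERR_PTR(-ENOMEM);
	}

	xa_store(xa, index, adapter, GFP_KERNEL);	/* 4. publish */
	return adapter;
}
```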
Fixes: 0f0023c649 ("ice: do not init struct ice_adapter more times than needed")
Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20251001115336.1707-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc9ea78707 ]
The original code calls cancel_delayed_work() in ocelot_stats_deinit()
to cancel the cyclic delayed work item ocelot->stats_work. However,
cancel_delayed_work() may fail to cancel the work item if it is already
executing. While destroy_workqueue() does wait for all pending work items
in the work queue to complete before destroying the work queue, it cannot
prevent the delayed work item from being rescheduled within the
ocelot_check_stats_work() function. This limitation exists because the
delayed work item is only enqueued into the work queue after its timer
expires. Before the timer expiration, destroy_workqueue() has no visibility
of this pending work item. Once the work queue appears empty,
destroy_workqueue() proceeds with destruction. When the timer eventually
expires, the delayed work item gets queued again, leading to the following
warning:
workqueue: cannot queue ocelot_check_stats_work on wq ocelot-switch-stats
WARNING: CPU: 2 PID: 0 at kernel/workqueue.c:2255 __queue_work+0x875/0xaf0
...
RIP: 0010:__queue_work+0x875/0xaf0
...
RSP: 0018:ffff88806d108b10 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000027
RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88806d123e88
RBP: ffffffff813c3170 R08: 0000000000000000 R09: ffffed100da247d2
R10: ffffed100da247d1 R11: ffff88806d123e8b R12: ffff88800c00f000
R13: ffff88800d7285c0 R14: ffff88806d0a5580 R15: ffff88800d7285a0
FS: 0000000000000000(0000) GS:ffff8880e5725000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe18e45ea10 CR3: 0000000005e6c000 CR4: 00000000000006f0
Call Trace:
<IRQ>
? kasan_report+0xc6/0xf0
? __pfx_delayed_work_timer_fn+0x10/0x10
? __pfx_delayed_work_timer_fn+0x10/0x10
call_timer_fn+0x25/0x1c0
__run_timer_base.part.0+0x3be/0x8c0
? __pfx_delayed_work_timer_fn+0x10/0x10
? rcu_sched_clock_irq+0xb06/0x27d0
? __pfx___run_timer_base.part.0+0x10/0x10
? try_to_wake_up+0xb15/0x1960
? _raw_spin_lock_irq+0x80/0xe0
? __pfx__raw_spin_lock_irq+0x10/0x10
tmigr_handle_remote_up+0x603/0x7e0
? __pfx_tmigr_handle_remote_up+0x10/0x10
? sched_balance_trigger+0x1c0/0x9f0
? sched_tick+0x221/0x5a0
? _raw_spin_lock_irq+0x80/0xe0
? __pfx__raw_spin_lock_irq+0x10/0x10
? tick_nohz_handler+0x339/0x440
? __pfx_tmigr_handle_remote_up+0x10/0x10
__walk_groups.isra.0+0x42/0x150
tmigr_handle_remote+0x1f4/0x2e0
? __pfx_tmigr_handle_remote+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
? hrtimer_interrupt+0x322/0x780
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
The following diagram reveals the cause of the above warning:
CPU 0 (remove) | CPU 1 (delayed work callback)
mscc_ocelot_remove() |
ocelot_deinit() | ocelot_check_stats_work()
ocelot_stats_deinit() |
cancel_delayed_work()| ...
| queue_delayed_work()
destroy_workqueue() | (wait a time)
| __queue_work() //UAF
The above scenario actually constitutes a UAF vulnerability.
ocelot_stats_deinit() is only invoked on initialization
failure or during resource destruction, so we must ensure that the
delayed work item cannot be rescheduled.
Replace cancel_delayed_work() with disable_delayed_work_sync()
to guarantee proper cancellation of the delayed work item and
ensure completion of any currently executing work before the
workqueue is deallocated.
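A hedged sketch of the resulting teardown ordering, with illustrative names
rather than the ocelot code: the self-rearming delayed work is disabled so it
can neither keep running nor requeue itself, and only then is the dedicated
workqueue torn down.
```
#include <linux/workqueue.h>

static void example_stats_teardown(struct delayed_work *stats_work,
				   struct workqueue_struct *stats_wq)
{
	disable_delayed_work_sync(stats_work);	/* cancel, wait, and block future queueing */
	destroy_workqueue(stats_wq);		/* nothing can be queued here anymore */
}
```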
A deadlock concern was considered: ocelot_stats_deinit() is called
in a process context and is not holding any locks that the delayed
work item might also need. Therefore, the use of the _sync() variant
is safe here.
This bug was identified through static analysis. To reproduce the
issue and validate the fix, I simulated an ocelot-switch device by
writing a kernel module and prepared the necessary resources for
the virtual ocelot-switch device's probe process. Removing
the virtual device then triggers the mscc_ocelot_remove() function,
which in turn destroys the workqueue.
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://patch.msgid.link/20251001011149.55073-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ac2c02790 ]
Check that the resource which is converted to a surface exists before
trying to use the cursor snooper on it.
vmw_cmd_res_check allows explicitly invalid (SVGA3D_INVALID_ID) identifiers
because some SVGA commands accept SVGA3D_INVALID_ID to mean "no surface".
Unfortunately, functions that take the actual surfaces as objects may not
be able to handle null objects (and in the case of the cursor snooper,
cannot). Make sure that we validate not only the identifier (via
vmw_cmd_res_check) but also check that the actual resource exists before
trying to do something with it.
This fixes an unchecked null-ptr dereference in the snooping code.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: c0951b797e ("drm/vmwgfx: Refactor resource management")
Reported-by: Kuzey Arda Bulut <kuzeyardabulut@gmail.com>
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Ian Forbes <ian.forbes@broadcom.com>
Link: https://lore.kernel.org/r/20250917153655.1968583-1-zack.rusin@broadcom.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9daa5a8795 ]
Starting with 'commit 2297791c92 ("s390/cio: dont unregister
subchannel from child-drivers")', cio no longer unregisters
subchannels when the attached device is invalid or unavailable.
As an unintended side-effect, the cio_ignore purge function no longer
removes subchannels for devices on the cio_ignore list if no CCW device
is attached. This situation occurs when a CCW device is non-operational
or unavailable.
To ensure the same outcome of the purge function as when the
current cio_ignore list had been active during boot, update the purge
function to remove I/O subchannels without working CCW devices if the
associated device number is found on the cio_ignore list.
Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08fdfd260e ]
In xe_hw_engine_group_get_mode(), a write lock is acquired before
calling switch_mode(), which in turn invokes
xe_hw_engine_group_suspend_faulting_lr_jobs().
On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(),
the write lock is released there, and then again in
xe_hw_engine_group_get_mode(), leading to a double release.
Fix this by keeping both the acquire and release operations in
xe_hw_engine_group_get_mode().
Fixes: 770bd1d341 ("drm/xe/hw_engine_group: Ensure safe transition between execution modes")
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 662d98b8b373007fa1b08ba93fee11f6fd3e387c)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aaab61de1f ]
It is allowed to mix Link and Host DMA channels in a way that their indices
differ. In this case we would read the LLP from a channel which is
either not used at all or used for a different operation.
Such case can be reproduced on cAVS2.5 or ACE1 platforms with soundwire
configuration:
playback to SDW would take Host channel 0 (stream_tag 1) and no Link DMA
used
Second playback to HDMI (HDA) would use Host channel 1 (stream_tag 2) and
Link channel 0 (stream_tag 1).
In this case reading the LLP from channel 2 is incorrect as that is not the
Link channel used for the HDMI playback.
To correct this, we should look up the BE and get the channel used on the
Link side.
Fixes: 67b182bea0 ("ASoC: SOF: Intel: hda: Implement get_stream_position (Linear Link Position)")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://patch.msgid.link/20251002074719.2084-6-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 98662be7ef ]
Init acpi_gbl_use_global_lock to false, in order to avoid error messages
during the boot phase:
ACPI Error: Could not enable GlobalLock event (20240827/evxfevnt-182)
ACPI Error: No response from Global Lock hardware, disabling lock (20240827/evglock-59)
Fixes: 628c3bb40e ("LoongArch: Add boot and setup routines")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abb2a55722 ]
Currently, when compiling with GCC, there is no "break 7" instruction
for zero division due to using the option -mno-check-zero-division, but
the compiler still generates "break 0" instruction for zero division.
Here is a simple example:
$ cat test.c
int div(int a)
{
	return a / 0;
}
$ gcc -O2 -S test.c -o test.s
GCC generates "break 0" on LoongArch and "ud2" on x86, objtool decodes
"ud2" as INSN_BUG for x86, so decode "break 0" as INSN_BUG can fix the
objtool warnings for LoongArch, but this is not the intention.
When decoding "break 0" as INSN_TRAP in the previous commit, the aim is
to handle "break 0" as a trap. The generated "break 0" for zero division
by GCC is not proper, it should generate a break instruction with proper
bug type, so add the GCC option -fno-isolate-erroneous-paths-dereference
to avoid generating the unexpected "break 0" instruction for now.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202509200413.7uihAxJ5-lkp@intel.com/
Fixes: baad7830ee ("objtool/LoongArch: Mark types based on break immediate code")
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60cd16a3b7 ]
During the detaching of Marvell's SAS/SATA controller, the original code
calls cancel_delayed_work() in mvs_free() to cancel the delayed work
item mwq->work_q. However, if mwq->work_q is already running, the
cancel_delayed_work() may fail to cancel it. This can lead to
use-after-free scenarios where mvs_free() frees the mvs_info while
mvs_work_queue() is still executing and attempts to access the
already-freed mvs_info.
A typical race condition is illustrated below:
CPU 0 (remove) | CPU 1 (delayed work callback)
mvs_pci_remove() |
mvs_free() | mvs_work_queue()
cancel_delayed_work() |
kfree(mvi) |
| mvi-> // UAF
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled and any executing
delayed work item completes before the mvs_info is deallocated.
This bug was found by static analysis.
Fixes: 20b09c2992 ("[SCSI] mvsas: add support for 94xx; layout change; bug fixes")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b1bb980fd ]
The original commit set all cores in a cluster to a shared policy, but
did not update set_target to apply a frequency change to all cores for
the policy. This caused most cores to remain stuck at their boot
frequency.
Fixes: be4ae8c194 ("cpufreq: tegra186: Share policy per cluster")
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49ef649110 ]
struct tegra_bpmp::clocks is a pointer to a dynamically allocated array
of pointers to 'struct tegra_bpmp_clk'.
But the size of the allocated area is calculated as if it were an array
containing actual 'struct tegra_bpmp_clk' objects - that is not the case,
the array holds just pointers.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
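A hedged illustration of the sizing rule (an illustrative helper, not the
Tegra driver code): an array of pointers is sized by the pointer element,
not by the pointed-to structure, so the original over-allocation wastes
memory rather than corrupting it.
```
#include <linux/slab.h>

struct tegra_bpmp_clk;	/* only pointers are stored, so a forward declaration suffices */

static struct tegra_bpmp_clk **example_alloc_clk_table(unsigned int count)
{
	struct tegra_bpmp_clk **clocks;

	clocks = kcalloc(count, sizeof(*clocks), GFP_KERNEL);	/* pointer-sized elements */
	return clocks;
}
```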
Fixes: 2db12b15c6 ("clk: tegra: Register clocks from root to leaf")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1624dead9a ]
The conditional check for the PLL0 multiplier 'm' used a logical AND
instead of OR, making the range check ineffective. This patch replaces
&& with || to correctly reject invalid values of 'm' that are either
less than or equal to 0 or greater than LPC18XX_PLL0_MSEL_MAX.
This ensures proper bounds checking during clk rate setting and rounding.
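A hedged sketch of the corrected range check, with an illustrative parameter
standing in for LPC18XX_PLL0_MSEL_MAX: with '||' an out-of-range multiplier is
rejected, whereas the old '&&' could never be true for both conditions at
once; since 'm' is unsigned, the "<= 0" half reduces to "== 0".
```
#include <linux/errno.h>

static int example_pll0_check_msel(unsigned int m, unsigned int msel_max)
{
	if (m == 0 || m > msel_max)
		return -EINVAL;	/* reject multipliers outside the valid range */
	return 0;
}
```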
Fixes: b04e0b8fd5 ("clk: add lpc18xx cgu clk driver")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
[sboyd@kernel.org: 'm' is unsigned so remove < condition]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b46a3d323a ]
The round_rate() clk ops is deprecated, so migrate this driver from
round_rate() to determine_rate() using the Coccinelle semantic patch
on the cover letter of this series.
Signed-off-by: Brian Masney <bmasney@redhat.com>
Stable-dep-of: 1624dead9a ("clk: nxp: Fix pll0 rate check condition in LPC18xx CGU driver")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e121370a7 ]
The `flags` in |struct mtk_mux| are core clk flags, not mux clk flags.
Passing one to the other is wrong.
Since there aren't any actual users adding CLK_MUX_* flags, just drop it
for now.
Fixes: b05ea33143 ("clk: mediatek: clk-mux: Add .determine_rate() callback")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c4c26b624 ]
The infrastructure gate for the HDMI-specific crystal needs the
top_hdmi_xtal clock to be configured in order to ungate the 26m
clock to the HDMI IP, and it wouldn't work without it.
Reparent the infra_ao_hdmi_26m clock to top_hdmi_xtal to fix that.
Fixes: e2edf59dec ("clk: mediatek: Add MT8195 infrastructure clock support")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48918cacef ]
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this, cancel the workload when opening the events
fails.
Before:
```
$ perf test -vv 7
7: PERF_RECORD_* events & perf_sample fields:
--- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
sample_type IP|TID|TIME|CPU
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1189569 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
#0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
#1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
#2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
#3 0x7f29ce849cc2 in raise raise.c:27
#4 0x7f29ce8324ac in abort abort.c:81
#5 0x565358f662d4 in check_leaks builtin-test.c:226
#6 0x565358f6682e in run_test_child builtin-test.c:344
#7 0x565358ef7121 in start_command run-command.c:128
#8 0x565358f67273 in start_test builtin-test.c:545
#9 0x565358f6771d in __cmd_test builtin-test.c:647
#10 0x565358f682bd in cmd_test builtin-test.c:849
#11 0x565358ee5ded in run_builtin perf.c:349
#12 0x565358ee6085 in handle_internal_command perf.c:401
#13 0x565358ee61de in run_argv perf.c:448
#14 0x565358ee6527 in main perf.c:555
#15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
#16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#17 0x565358e391c1 in _start perf[851c1]
7: PERF_RECORD_* events & perf_sample fields : FAILED!
```
After:
```
$ perf test 7
7: PERF_RECORD_* events & perf_sample fields : Skip (permissions)
```
Fixes: 16d00fee70 ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 48314d20fe ]
When not running as root and with higher perf event paranoia values
the perf record LBR tests could fail rather than skipping the
problematic tests.
Add the sensitivity to the test and confirm it passes with paranoia
values from -1 to 2.
Committer testing:
Testing with '$ perf test -vv lbr', i.e. as non root, and then comparing
the output shows the mentioned errors before this patch:
acme@x1:~$ grep -m1 "model name" /proc/cpuinfo
model name : 13th Gen Intel(R) Core(TM) i7-1365U
acme@x1:~$
Before:
132: perf record LBR tests : Skip
After:
132: perf record LBR tests : Ok
Fixes: 32559b99e0 ("perf test: Add set of perf record LBR tests")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47b13635da ]
determine_rate() is expected to return an error code, or 0 on success.
clk_sam9x5_peripheral_determine_rate() has a branch that returns the
parent rate in a certain case. This is the behavior of round_rate(),
so let's go ahead and fix this by setting req->rate instead.
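A hedged sketch of the determine_rate() contract being applied here (an
illustrative callback, not the at91 driver): the chosen rate is reported
through req->rate, and the return value is 0 on success or a negative error
code - never the rate itself.
```
#include <linux/clk-provider.h>
#include <linux/errno.h>

static int example_determine_rate(struct clk_hw *hw,
				  struct clk_rate_request *req)
{
	struct clk_hw *parent = clk_hw_get_parent(hw);

	if (!parent)
		return -EINVAL;

	req->rate = clk_hw_get_rate(parent);	/* e.g. pass the parent rate through */
	return 0;				/* success, not the rate */
}
```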
Fixes: b4c115c761 ("clk: at91: clk-peripheral: add support for changeable parent rate")
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Brian Masney <bmasney@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8327bd4fcb upstream.
With `CONFIG_TRACE_MMIO_ACCESS=y`, the `{read,write}{b,w,l,q}{_relaxed}()`
mmio accessors unconditionally call `log_{post_}{read,write}_mmio()`
helpers, which in turn call the ftrace ops for `rwmmio` trace events.
This adds a performance penalty per mmio accessor call, even when
`rwmmio` events are disabled at runtime (~80% overhead on local
measurement).
Guard these with `tracepoint_enabled()`.
Signed-off-by: Varad Gautam <varadgautam@google.com>
Fixes: 210031971c ("asm-generic/io: Add logging support for MMIO accessors")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95920c2ed0 upstream.
Helge reported that the introduction of PP_MAGIC_MASK led to crashes on
boot on his 32-bit parisc machine. The cause of this is that the mask is
set too wide, so page_pool_page_is_pp() incurs false positives which
crash the machine.
Just disabling the check in page_pool_is_pp() will lead to the page_pool
code itself malfunctioning; so instead of doing this, this patch changes
the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
pointers for page_pool-tagged pages.
The fix relies on the kernel pointers that alias with the pp_magic field
always being above PAGE_OFFSET. With this assumption, we can use the
lowest bit of the value of PAGE_OFFSET as the upper bound of the
PP_DMA_INDEX_MASK, which should avoid the false positives.
Because we cannot rely on PAGE_OFFSET always being a compile-time
constant, nor on it always being >0, we fall back to disabling the
dma_index storage when there are not enough bits available. This leaves
us in the situation we were in before the patch in the Fixes tag, but
only on a subset of architecture configurations. This seems to be the
best we can do until the transition to page types is complete for
page_pool pages.
v2:
- Make sure there's at least 8 bits available and that the PAGE_OFFSET
bit calculation doesn't wrap
Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
Fixes: ee62ce7a1d ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
Cc: stable@vger.kernel.org # 6.15+
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Helge Deller <deller@gmx.de>
Link: https://patch.msgid.link/20250930114331.675412-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1f86d0ac3 upstream.
Massage listmount() and make sure we don't call path_put() under the
namespace semaphore. If we put the last reference we're fscked.
Fixes: b4c2bea8ce ("add listmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6eb350a223 upstream.
rseq_need_restart() reads and clears task::rseq_event_mask with preemption
disabled to guard against the scheduler.
But membarrier() uses an IPI and sets the PREEMPT bit in the event mask
from the IPI, which leaves that RMW operation unprotected.
Use guard(irq) if CONFIG_MEMBARRIER is enabled to fix that.
Fixes: 2a36ab717e ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5973a62efa upstream.
Since the referenced fixes commit, the kernel's .text section is only
mapped starting from _stext; the region [_text, _stext) is omitted. As a
result, other vmalloc/vmap allocations may use the virtual addresses
nominally in the range [_text, _stext). This address reuse confuses
multiple things:
1. crash_prepare_elf64_headers() sets up a segment in /proc/vmcore
mapping the entire range [_text, _end) to
[__pa_symbol(_text), __pa_symbol(_end)). Reading an address in
[_text, _stext) from /proc/vmcore therefore gives the incorrect
result.
2. Tools doing symbolization (either by reading /proc/kallsyms or based
on the vmlinux ELF file) will incorrectly identify vmalloc/vmap
allocations in [_text, _stext) as kernel symbols.
In practice, both of these issues affect the drgn debugger.
Specifically, there were cases where the vmap IRQ stacks for some CPUs
were allocated in [_text, _stext). As a result, drgn could not get the
stack trace for a crash in an IRQ handler because the core dump
contained invalid data for the IRQ stack address. The stack addresses
were also symbolized as being in the _text symbol.
Fix this by bringing back the mapping of [_text, _stext), but now make
it non-executable and read-only. This prevents other allocations from
using it while still achieving the original goal of not mapping
unpredictable data as executable. Other than the changed protection,
this is effectively a revert of the fixes commit.
Fixes: e2a073dde9 ("arm64: omit [_text, _stext) from permanent kernel mapping")
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72d271a7ba upstream.
Userspace generally expects APIs that return -EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the -EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead. While we're at it, refactor some fscontext_read() into a
separate helper to make the ring buffer logic a bit easier to read.
Fixes: 007ec26cdc ("vfs: Implement logging through fs_context")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250807-fscontext-log-cleanups-v3-1-8d91d6242dc3@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although the PIO throughput benefits from larger burst sizes, only the
first two DMA channels support a burst size of 8 - the others are capped
at 4. To avoid misconfiguring the PIO hardware, retrieve the actual
max_burst value from the DMA channel's capabilities.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Use the simplified channel property declaration scheme, where the DMA
driver finds the best match based on positive matches with no hard
exclusions.
N.B. The new scheme should give the same results with old and new DTBs,
with the old property name still being accepted for backwards compatibility.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Simplify and improve the channel selection scheme by removing the
concept of excluded channels. Requesting more channels than are
available is a configuration error, but otherwise it is better to get
a non-ideal channel than no channel at all.
Give each channel a score based on how well it matches the client's
requirements, then attempt to claim them in order of decreasing score.
A channel which possesses a required flag is scored more highly than one
that doesn't, but not having a trait that isn't wanted also contributes
to a channel's score, just less so.
By walking the list until the end, even channels that don't match at all
will be allocated if there is enough demand.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This patch adds support for external FSIN-triggered snapshot mode
to the OmniVision OV9282 sensor driver. It enables frame capture
synchronized with an external hardware trigger signal.
Signed-off-by: Omer Faruk Edemen <ofedemen@lectrontech.com>
commit e9c206324e upstream.
The cdnsp-pci driver uses pcim_enable_device() to enable a PCI device,
which means the device will be automatically disabled on driver detach
through the managed device framework. The manual pci_disable_device()
call in the error path is therefore redundant.
Found via static analysis; this is similar to commit 99ca0b57e4
("thermal: intel: int340x: processor: Fix warning during module unload").
Fixes: 3d82904559 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20250903141613.2535472-1-linmq006@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be5ae730ff upstream.
Right now the interrupt handler first reads all updated status registers
and only then clears the interrupts. It's possible that a duplicate
interrupt for a changed register or plug state comes in after the
interrupts have been processed but before they have been cleared:
* plug is inserted, TPS_REG_INT_PLUG_EVENT is set
* TPS_REG_INT_EVENT1 is read
* tps6598x_handle_plug_event() has run and registered the plug
* plug is removed again, TPS_REG_INT_PLUG_EVENT is set (again)
* TPS_REG_INT_CLEAR1 is written, TPS_REG_INT_PLUG_EVENT is cleared
We then have no plug connected and no pending interrupt but the tipd
core still thinks there is a plug. It's possible to trigger this with
e.g. a slightly broken Type-C to USB A converter.
Fix this by first clearing the interrupts and only then reading the
updated registers.
Fixes: 45188f27b3 ("usb: typec: tipd: Add support for Apple CD321X")
Fixes: 0a4c005bd1 ("usb: typec: driver for TI TPS6598x USB Power Delivery controllers")
Cc: stable@kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sven Peter <sven@kernel.org>
Link: https://lore.kernel.org/r/20250914-apple-usb3-tipd-v1-1-4e99c8649024@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d3c4cd5c6 upstream.
Prevent USB runtime PM (autosuspend) for AX88772* in bind.
usbnet enables runtime PM (autosuspend) by default, so disabling it via
the usb_driver flag is ineffective. On AX88772B, autosuspend shows no
measurable power saving with current driver (no link partner, admin
up/down). The ~0.453 W -> ~0.248 W drop on v6.1 comes from phylib powering
the PHY off on admin-down, not from USB autosuspend.
The real hazard is that with runtime PM enabled, ndo_open() (under RTNL)
may synchronously trigger autoresume (usb_autopm_get_interface()) into
asix_resume() while the USB PM lock is held. Resume paths then invoke
phylink/phylib and MDIO, which also expect RTNL, leading to possible
deadlocks or PM lock vs MDIO wake issues.
To avoid this, keep the device runtime-PM active by taking a usage
reference in ax88772_bind() and dropping it in unbind(). A non-zero PM
usage count blocks runtime suspend regardless of userspace policy
(.../power/control - pm_runtime_allow/forbid), making this approach
robust against sysfs overrides.
Holding a runtime-PM usage ref does not affect system-wide suspend;
system sleep/resume callbacks continue to run as before.
Fixes: 4a2c7217cd ("net: usb: asix: ax88772: manage PHY PM from MAC")
Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Closes: https://lore.kernel.org/all/DCGHG5UJT9G3.2K1GHFZ3H87T0@gmail.com
Tested-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251005081203.3067982-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c04db81cd0 upstream.
A buffer overflow vulnerability exists in the USB 9pfs transport layer
where inconsistent size validation between packet header parsing and
actual data copying allows a malicious USB host to overflow heap buffers.
The issue occurs because:
- usb9pfs_rx_header() validates only the declared size in packet header
- usb9pfs_rx_complete() uses req->actual (actual received bytes) for
memcpy
This allows an attacker to craft packets with small declared size
(bypassing validation) but large actual payload (triggering overflow
in memcpy).
Add validation in usb9pfs_rx_complete() to ensure req->actual does not
exceed the buffer capacity before copying data.
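A hedged sketch of the added bound check (the destination buffer name is
illustrative, not the 9pfs gadget code): the copy must be bounded by the
destination capacity, because req->actual reflects what the host really sent,
not what the header declared.
```
#include <linux/string.h>
#include <linux/usb/gadget.h>

static void example_rx_copy(void *dst, size_t dst_size, struct usb_request *req)
{
	if (req->actual > dst_size)
		return;		/* malformed transfer: more data than the buffer can hold */

	memcpy(dst, req->buf, req->actual);
}
```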
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Closes: https://lkml.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com
Fixes: a3be076dc1 ("net/9p/usbg: Add new usb gadget function transport")
Cc: stable@vger.kernel.org
Message-ID: <20250622-9p-usb_overflow-v3-1-ab172691b946@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4002ee98c0 upstream.
While the API contract in docs doesn't specify it explicitly, the
generic implementation of the get_function_name() callback from struct
pinmux_ops - pinmux_generic_get_function_name() - can fail and return
NULL. This is already checked in pinmux_check_ops() so add a similar
check in pinmux_func_name_to_selector() instead of passing the returned
pointer right down to strcmp() where the NULL can get dereferenced. This
is normal operation when adding new pinfunctions.
Cc: stable@vger.kernel.org
Tested-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6df164e29b upstream.
In xdr_stream_decode_opaque_auth(), a zero-length checksum.len causes
checksum.data to be set to NULL. This triggers a NULL pointer dereference
when accessing checksum.data in gss_krb5_verify_mic_v2(). This patch ensures that
the value of checksum.len is not less than XDR_UNIT.
Fixes: 0653028e8f ("SUNRPC: Convert gss_verify_header() to use xdr_stream")
Cc: stable@kernel.org
Signed-off-by: Lei Lu <llfamsec@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3366a0477 upstream.
Struct ff_effect_compat is embedded twice inside
uinput_ff_upload_compat and contains internal padding. In particular, there
is a hole after struct ff_replay to satisfy alignment requirements for
the following union member. Without clearing the structure,
copy_to_user() may leak stack data to userspace.
Initialize ff_up_compat to zero before filling valid fields.
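A small userspace illustration of why the zeroing matters (simplified
stand-in types, not the uinput structures): the padding after 'replay'
exists only to align the following 64-bit member, and unless the whole
object is zeroed those padding bytes keep stale stack contents.
```
#include <stdint.h>
#include <string.h>

struct compat_example {
	uint16_t type;			/* offset 0 */
	struct {
		uint16_t length;
		uint16_t delay;
	} replay;			/* offsets 2..5 */
	/* padding inserted here to align the next member on common ABIs */
	uint64_t payload;
};

void example_fill(struct compat_example *out)
{
	struct compat_example tmp;

	memset(&tmp, 0, sizeof(tmp));	/* clears the padding bytes too */
	tmp.type = 1;
	tmp.replay.length = 100;
	tmp.replay.delay = 10;
	tmp.payload = 42;
	memcpy(out, &tmp, sizeof(tmp));	/* raw copy carries no stale bytes */
}
```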
Fixes: 2d56f3a32c ("Input: refactor evdev 32bit compat to be shareable with uinput")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Link: https://lore.kernel.org/r/20250928063737.74590-1-zhen.ni@easystack.cn
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9e6aa9949 upstream.
devm_kcalloc() may fail. ndtest_probe() allocates three DMA address
arrays (dcr_dma, label_dma, dimm_dma) and later unconditionally uses
them in ndtest_nvdimm_init(), which can lead to a NULL pointer
dereference under low-memory conditions.
Check all three allocations and return -ENOMEM if any allocation fails,
jumping to the common error path. Do not emit an extra error message
since the allocator already warns on allocation failure.
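A sketch of the checks (the array-length macro, device pointer, and error label are assumed):
    p->dcr_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
                              sizeof(dma_addr_t), GFP_KERNEL);
    p->label_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
                                sizeof(dma_addr_t), GFP_KERNEL);
    p->dimm_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
                               sizeof(dma_addr_t), GFP_KERNEL);
    if (!p->dcr_dma || !p->label_dma || !p->dimm_dma) {
        rc = -ENOMEM;   /* no extra message; the allocator already warned */
        goto err;
    }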
Fixes: 9399ab61ad ("ndtest: Add dimms to the two buses")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0910dd7c9a upstream.
Skip the WRMSR and HLT fastpaths in SVM's VM-Exit handler if the next RIP
isn't valid, e.g. because KVM is running with nrips=false. SVM must
decode and emulate to skip the instruction if the CPU doesn't provide the
next RIP, and getting the instruction bytes to decode requires reading
guest memory. Reading guest memory through the emulator can fault, i.e.
can sleep, which is disallowed since the fastpath handlers run with IRQs
disabled.
BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:106
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 32611, name: qemu
preempt_count: 1, expected: 0
INFO: lockdep is turned off.
irq event stamp: 30580
hardirqs last enabled at (30579): [<ffffffffc08b2527>] vcpu_run+0x1787/0x1db0 [kvm]
hardirqs last disabled at (30580): [<ffffffffb4f62e32>] __schedule+0x1e2/0xed0
softirqs last enabled at (30570): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210
softirqs last disabled at (30568): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210
CPU: 298 UID: 0 PID: 32611 Comm: qemu Tainted: G U 6.16.0-smp--e6c618b51cfe-sleep #782 NONE
Tainted: [U]=USER
Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xb0
__might_resched+0x271/0x290
__might_fault+0x28/0x80
kvm_vcpu_read_guest_page+0x8d/0xc0 [kvm]
kvm_fetch_guest_virt+0x92/0xc0 [kvm]
__do_insn_fetch_bytes+0xf3/0x1e0 [kvm]
x86_decode_insn+0xd1/0x1010 [kvm]
x86_emulate_instruction+0x105/0x810 [kvm]
__svm_skip_emulated_instruction+0xc4/0x140 [kvm_amd]
handle_fastpath_invd+0xc4/0x1a0 [kvm]
vcpu_run+0x11a1/0x1db0 [kvm]
kvm_arch_vcpu_ioctl_run+0x5cc/0x730 [kvm]
kvm_vcpu_ioctl+0x578/0x6a0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0x8a/0x2c0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f479d57a94b
</TASK>
Note, this is essentially a reapply of commit 5c30e8101e ("KVM: SVM:
Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"), but with
different justification (KVM now grabs SRCU when skipping the instruction
for other reasons).
Fixes: b439eb8ab5 ("Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250805190526.1453366-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acf943e976 upstream.
When orphan file feature is enabled, inode can be tracked as orphan
either in the standard orphan list or in the orphan file. The first can
be tested by checking ei->i_orphan list head, the second is recorded by
EXT4_STATE_ORPHAN_FILE inode state flag. There are several places where
we want to check whether inode is tracked as orphan and only some of
them properly check for both possibilities. Luckily the consequences are
mostly minor; the worst that can happen is that we track an inode as
orphan although we don't need to and e2fsck then complains (resulting in
occasional ext4/307 xfstest failures). Fix the problem by introducing a
helper for checking whether an inode is tracked as orphan and use it in
appropriate places.
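The helper boils down to checking both tracking mechanisms (the name below is illustrative, not necessarily the one used by the actual patch):
    static inline bool ext4_inode_orphan_tracked(struct inode *inode)
    {
        return !list_empty(&EXT4_I(inode)->i_orphan) ||
               ext4_test_inode_state(inode, EXT4_STATE_ORPHAN_FILE);
    }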
Fixes: 4a79a98c7b ("ext4: Improve scalability of ext4 orphan file handling")
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Message-ID: <20250925123038.20264-2-jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88daf2f448 upstream.
If client doesn't negotiate with SMB3.1.1 POSIX Extensions,
then proper error code won't be returned due to overwriting.
Return error immediately.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Matvey Kovalev <matvey.kovalev@ispras.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 305853cce3 upstream.
The 'sess->rpc_handle_list' XArray manages RPC handles within a ksmbd
session. Access to this list is intended to be protected by
'sess->rpc_lock' (an rw_semaphore). However, the locking implementation was
flawed, leading to potential race conditions.
In ksmbd_session_rpc_open(), the code incorrectly acquired only a read lock
before calling xa_store() and xa_erase(). Since these operations modify
the XArray structure, a write lock is required to ensure exclusive access
and prevent data corruption from concurrent modifications.
Furthermore, ksmbd_session_rpc_method() accessed the list using xa_load()
without holding any lock at all. This could lead to reading inconsistent
data or a potential use-after-free if an entry is concurrently removed and
the pointer is dereferenced.
Fix these issues by:
1. Using down_write() and up_write() in ksmbd_session_rpc_open()
to ensure exclusive access during XArray modification, and ensuring
the lock is correctly released on error paths.
2. Adding down_read() and up_read() in ksmbd_session_rpc_method()
to safely protect the lookup.
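In code, the two steps above look roughly like this (a sketch; types are simplified and the method field name is assumed):
    /* 1. modification under the write lock, released on every path */
    down_write(&sess->rpc_lock);
    old = xa_store(&sess->rpc_handle_list, id, entry, GFP_KERNEL);
    up_write(&sess->rpc_lock);
    if (xa_is_err(old))
        return xa_err(old);

    /* 2. lookup under the read lock */
    down_read(&sess->rpc_lock);
    entry = xa_load(&sess->rpc_handle_list, id);
    method = entry ? entry->method : 0;    /* field name assumed */
    up_read(&sess->rpc_lock);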
Fixes: a1f46c99d9 ("ksmbd: fix use-after-free in ksmbd_session_rpc_open")
Fixes: b685757c7b ("ksmbd: Implements sess->rpc_handle_list as xarray")
Cc: stable@vger.kernel.org
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8168b4faf upstream.
Automatically disable kaslr when the kernel loads from kexec_file.
kexec_file loads the secondary kernel image to a non-linked address,
inherently providing KASLR-like randomization.
However, on LoongArch where System RAM may be non-contiguous, enabling
KASLR for the second kernel may relocate it to an invalid memory region
and cause a boot failure. Thus, we disable KASLR when "kexec_file" is
detected in the command line.
To ensure compatibility with older kernels loaded via kexec_file, this
patch should be backported to stable branches.
Cc: stable@vger.kernel.org
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d33a030c5 upstream.
There is a race condition between dm device suspend and table load that
can lead to null pointer dereference. The issue occurs when suspend is
invoked before table load completes:
BUG: kernel NULL pointer dereference, address: 0000000000000054
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 6798 Comm: dmsetup Not tainted 6.6.0-g7e52f5f0ca9b #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
RIP: 0010:blk_mq_wait_quiesce_done+0x0/0x50
Call Trace:
<TASK>
blk_mq_quiesce_queue+0x2c/0x50
dm_stop_queue+0xd/0x20
__dm_suspend+0x130/0x330
dm_suspend+0x11a/0x180
dev_suspend+0x27e/0x560
ctl_ioctl+0x4cf/0x850
dm_ctl_ioctl+0xd/0x20
vfs_ioctl+0x1d/0x50
__se_sys_ioctl+0x9b/0xc0
__x64_sys_ioctl+0x19/0x30
x64_sys_call+0x2c4a/0x4620
do_syscall_64+0x9e/0x1b0
The issue can be triggered as below:
T1 (dm_suspend)                        T2 (table_load)
dm_suspend                             table_load
                                         dm_setup_md_queue
                                           dm_mq_init_request_queue
                                             blk_mq_init_allocated_queue
                                               => q->mq_ops = set->ops;   (1)
  __dm_suspend
    dm_stop_queue /
    dm_wait_for_completion
      => q->tag_set NULL pointer!  (2)
                                               => q->tag_set = set;       (3)
Fix this by checking if a valid table (map) exists before performing
request-based suspend and waiting for target I/O. When map is NULL,
skip these table-dependent suspend steps.
Even when map is NULL, no I/O can reach any target because there is
no table loaded; I/O submitted in this state will fail early in the
DM layer. Skipping the table-dependent suspend logic in this case
is safe and avoids NULL pointer dereferences.
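In __dm_suspend() the change amounts to roughly the following (a sketch following the description above; details simplified):
    map = rcu_dereference_protected(md->map,
                                    lockdep_is_held(&md->suspend_lock));
    if (map) {
        if (dm_request_based(md))
            dm_stop_queue(md->queue);
        r = dm_wait_for_completion(md, task_state);
    }
    /* with no table loaded there is nothing to quiesce or wait for */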
Fixes: c4576aed8d ("dm: fix request-based dm's use of dm_wait_for_completion")
Cc: stable@vger.kernel.org
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f597c2cdb upstream.
When suspend and load run concurrently, __dm_suspend() skips dm_stop_queue()
if it runs before q->mq_ops is set in blk_mq_init_allocated_queue(). As a
result, the queue's quiesce depth is not incremented.
Later, once the table load has finished and __dm_resume() runs, this
triggers the q->quiesce_depth == 0 warning in blk_mq_unquiesce_queue():
Call Trace:
<TASK>
dm_start_queue+0x16/0x20 [dm_mod]
__dm_resume+0xac/0xb0 [dm_mod]
dm_resume+0x12d/0x150 [dm_mod]
do_resume+0x2c2/0x420 [dm_mod]
dev_suspend+0x30/0x130 [dm_mod]
ctl_ioctl+0x402/0x570 [dm_mod]
dm_ctl_ioctl+0x23/0x30 [dm_mod]
Fix this by explicitly tracking whether the request queue was
stopped in __dm_suspend() via a new DMF_QUEUE_STOPPED flag.
Only call dm_start_queue() in __dm_resume() if the queue was
actually stopped.
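Sketch of the flag usage (the flag name is from this patch; the surrounding context is simplified):
    /* __dm_suspend() */
    if (dm_request_based(md) && map) {
        dm_stop_queue(md->queue);
        set_bit(DMF_QUEUE_STOPPED, &md->flags);
    }

    /* __dm_resume() */
    if (test_and_clear_bit(DMF_QUEUE_STOPPED, &md->flags))
        dm_start_queue(md->queue);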
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Cc: stable@vger.kernel.org
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c328f5474 upstream.
Syzbot reported an uninitialized value bug in nci_init_req, which was
introduced by commit 5aca7966d2 ("Merge tag
'perf-tools-fixes-for-v6.17-2025-09-16' of
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools").
This bug arises from the very limited and poor input validation
done in nci_valid_size(). This validation only checks skb->len
(which directly reflects the size provided at the userspace
interface) against the length declared in the buffer itself
(interpreted as the NCI header). The packet contents are then
processed assuming the layout that the opcode requires, which
leads to accesses to parts of `skb->data` that were never
assigned anything yet.
Following the same silent drop of packets of invalid sizes at
`nci_valid_size()`, add validation of the data in the respective
handlers and return error values in case of failure. Release
the skb if error values are returned from handlers in
`nci_ntf_packet` and effectively do a silent drop.
Possible TODO: because we silently drop the packets, the
call to `nci_request` will keep waiting for completion of the
request and will face timeouts. These timeouts can get excessively
logged in dmesg. Handling them properly may require exporting
`nci_request_cancel` (or propagating error handling from the
notification packet handlers).
Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=740e04c2a93467a0f8c8
Fixes: 6a2968aaf5 ("NFC: basic NCI protocol implementation")
Tested-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Deepak Sharma <deepak.sharma.472935@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250925132846.213425-1-deepak.sharma.472935@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bd5e45c2c upstream.
When parsing Allocation Extent Descriptor, lengthAllocDescs comes from
on-disk data and must be validated against the block size. Crafted or
corrupted images may set lengthAllocDescs so that the total descriptor
length (sizeof(allocExtDesc) + lengthAllocDescs) exceeds the buffer,
leading udf_update_tag() to call crc_itu_t() on out-of-bounds memory and
trigger a KASAN use-after-free read.
BUG: KASAN: use-after-free in crc_itu_t+0x1d5/0x2b0 lib/crc-itu-t.c:60
Read of size 1 at addr ffff888041e7d000 by task syz-executor317/5309
CPU: 0 UID: 0 PID: 5309 Comm: syz-executor317 Not tainted 6.12.0-rc4-syzkaller-00261-g850925a8133c #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
crc_itu_t+0x1d5/0x2b0 lib/crc-itu-t.c:60
udf_update_tag+0x70/0x6a0 fs/udf/misc.c:261
udf_write_aext+0x4d8/0x7b0 fs/udf/inode.c:2179
extent_trunc+0x2f7/0x4a0 fs/udf/truncate.c:46
udf_truncate_tail_extent+0x527/0x7e0 fs/udf/truncate.c:106
udf_release_file+0xc1/0x120 fs/udf/file.c:185
__fput+0x23f/0x880 fs/file_table.c:431
task_work_run+0x24f/0x310 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:43 [inline]
do_exit+0xa2f/0x28e0 kernel/exit.c:939
do_group_exit+0x207/0x2c0 kernel/exit.c:1088
__do_sys_exit_group kernel/exit.c:1099 [inline]
__se_sys_exit_group kernel/exit.c:1097 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097
x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Validate the computed total length against epos->bh->b_size.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
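The check is essentially the following (a sketch; the error handling in the real code may differ):
    struct allocExtDesc *aed = (struct allocExtDesc *)epos->bh->b_data;
    u32 len = le32_to_cpu(aed->lengthAllocDescs);

    if (sizeof(struct allocExtDesc) + len > epos->bh->b_size)
        return -EIO;    /* corrupted or crafted image, refuse to CRC it */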
Reported-by: syzbot+8743fca924afed42f93e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8743fca924afed42f93e
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Larshin Sergey <Sergey.Larshin@kaspersky.com>
Link: https://patch.msgid.link/20250922131358.745579-1-Sergey.Larshin@kaspersky.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e65bda827 upstream.
wcd934x_codec_parse_data() contains a device reference count leak in
of_slim_get_device() where device_find_child() increases the reference
count of the device but this reference is not properly decreased in
the success path. Add put_device() in wcd934x_codec_parse_data() and
add devm_add_action_or_reset() in the probe function, which ensures
that the reference count of the device is correctly managed.
There is also a memory leak in regmap_init_slimbus(), as the allocated
regmap is not released when the device is removed. Use
devm_regmap_init_slimbus() instead of regmap_init_slimbus() to ensure
automatic regmap cleanup on device removal.
Calling path: of_slim_get_device() -> of_find_slim_device() ->
device_find_child(). As comment of device_find_child() says, 'NOTE:
you will need to drop the reference with put_device() after use.'.
Found by code review.
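A sketch of the reference-count side of the fix (the action helper name is illustrative):
    static void wcd934x_put_device_action(void *data)
    {
        put_device(data);    /* pairs with device_find_child() */
    }

    /* in probe, once of_slim_get_device() has succeeded: */
    ret = devm_add_action_or_reset(dev, wcd934x_put_device_action,
                                   &slim->dev);
    if (ret)
        return ret;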
Cc: stable@vger.kernel.org
Fixes: a61f3b4f47 ("ASoC: wcd934x: add support to wcd9340/wcd9341 codec")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20250923065212.26660-1-make24@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b15b7d2a1b upstream.
Remove the logic to set interrupt mask by default in uio_hv_generic
driver as the interrupt mask value is supposed to be controlled
completely by the user space. If the mask bit gets changed
by the driver, concurrently with user mode operating on the ring,
the mask bit may be set when it is supposed to be clear, and the
user-mode driver will miss an interrupt which will cause a hang.
For example, when the driver sets the inbound ring buffer interrupt mask
to 1, the host does not interrupt the guest on the UIO VMBus channel.
However, setting the mask does not prevent the host from putting a
message in the inbound ring buffer. So let’s assume that happens:
the host puts a message into the ring buffer but does not interrupt.
Subsequently, the user space code in the guest sets the inbound ring
buffer interrupt mask to 0, saying “Hey, I’m ready for interrupts”.
User space code then calls pread() to wait for an interrupt.
Then one of two things happens:
* The host never sends another message. So the pread() waits forever.
* The host does send another message. But because there’s already a
message in the ring buffer, it doesn’t generate an interrupt.
This is the correct behavior, because the host should only send an
interrupt when the inbound ring buffer transitions from empty to
not-empty. Adding an additional message to a ring buffer that is not
empty is not supposed to generate an interrupt on the guest.
Since the guest is waiting in pread() and not removing messages from
the ring buffer, the pread() waits forever.
This could be easily reproduced in hv_fcopy_uio_daemon if we delay
setting interrupt mask to 0.
Similarly if hv_uio_channel_cb() sets the interrupt_mask to 1,
there’s a race condition. Once user space empties the inbound ring
buffer, but before user space sets interrupt_mask to 0, the host could
put another message in the ring buffer but it wouldn’t interrupt.
Then the next pread() would hang.
Fix these by removing all instances where interrupt_mask is changed,
while keeping the one in set_event() unchanged to enable userspace to
control the interrupt mask by writing 0/1 to /dev/uioX.
Fixes: 95096f2fbd ("uio-hv-generic: new userspace i/o driver for VMBus")
Suggested-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Tested-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/r/20250828044200.492030-1-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74058c0a9f upstream.
Syzkaller reports a "KMSAN: uninit-value in squashfs_get_parent" bug.
This is caused by open_by_handle_at() being called with a file handle
containing an invalid parent inode number. In particular the inode number
is that of a symbolic link, rather than a directory.
Squashfs_get_parent() gets called with that symbolic link inode, and
accesses the parent member field.
unsigned int parent_ino = squashfs_i(inode)->parent;
Because non-directory inodes in Squashfs do not have a parent value, this
is uninitialised, and this causes an uninitialised value access.
The fix is to initialise parent with the invalid inode 0, which will cause
an EINVAL error to be returned.
Regular inodes used to share the parent field with the block_list_start
field. This is removed in this commit to enable the parent field to
contain the invalid inode number 0.
Link: https://lkml.kernel.org/r/20250918233308.293861-1-phillip@squashfs.org.uk
Fixes: 122601408d ("Squashfs: export operations")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+157bdef5cf596ad0da2c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68cc2431.050a0220.139b6.0001.GAE@google.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bddf4587c upstream.
After reading all the feedback, right now disabling the TPM2_TCG_HMAC
is the right call.
Other views discussed:
A. Having a kernel command-line parameter or refining the feature
otherwise. This goes to the area of improvements. E.g., one
example is my own idea where the null key specific code would be
replaced with a persistent handle parameter (which can be
*unambiguously* defined as part of the attestation process when
done correctly).
B. Removing the code. I don't buy this because that is the same as saying
that HMAC encryption cannot work at all (if really nitpicking) in
any form. Also I disagree with the view that the feature could not
be refined to something more reasonable.
Also, both A and B are the worst options in terms of backporting.
Thus, this is the best possible choice.
Cc: stable@vger.kernel.org # v6.10+
Fixes: d2add27cf2 ("tpm: Add NULL primary creation")
Suggested-by: Chris Fenner <cfenn@google.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55c0ced59f ]
When verifying BPF programs, the check_alu_op() function validates
instructions with ALU operations. The 'offset' field in these
instructions is a signed 16-bit integer.
The existing check 'insn->off > 1' was intended to ensure the offset is
either 0, or 1 for BPF_MOD/BPF_DIV. However, because 'insn->off' is
signed, this check incorrectly accepts all negative values (e.g., -1).
This commit tightens the validation by changing the condition to
'(insn->off != 0 && insn->off != 1)'. This ensures that any value
other than the explicitly permitted 0 and 1 is rejected, hardening the
verifier against malformed BPF programs.
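The tightened condition, roughly (a sketch of the check_alu_op() change; the verbose() message text is illustrative):
    if (insn->off != 0 && insn->off != 1) {
        verbose(env, "invalid insn->off %d for ALU op\n", insn->off);
        return -EINVAL;
    }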
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Fixes: ec0e2da95f ("bpf: Support new signed div/mod instructions.")
Link: https://lore.kernel.org/r/tencent_70D024BAE70A0A309A4781694C7B764B0608@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f5dacf88a ]
This reverts commit ceddedc969.
Commit in question breaks the mapping of PGs to pools for some SKUs.
Specifically multi-host NICs seem to be shipped with a custom buffer
configuration which maps the lossy PG to pool 4. But the bad commit
overrides this with pool 0 which does not have sufficient buffer space
reserved, resulting in ~40% packet loss. The commit also breaks BMC /
OOB connection completely (100% packet loss).
Revert, similarly to commit 3fbfe251cc ("Revert "net/mlx5e: Update and
set Xon/Xoff upon port speed set""). The breakage is exactly the same,
the only difference is that the quoted commit would break the NIC immediately
on boot, and the currently reverted commit only when MTU is changed.
Note: "good" kernels do not restore the configuration, so downgrade isn't
enough to recover machines. A NIC power cycle seems to be necessary to
return to a healthy state (or overriding the relevant registers using
a custom patch).
Fixes: ceddedc969 ("net/mlx5e: Update and set Xon/Xoff upon MTU set")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250929181529.1848157-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cfbe7ebfa ]
Add sync reset timeout to stop poll_sync_reset in case there was no
reset done or abort event within timeout. Otherwise poll sync reset will
just continue and in case of fw fatal error no health reporting will be
done.
Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79a0e32b32 ]
The reclaim_pages_cmd() function sends a command to the firmware to
reclaim pages if the command interface is active.
A race condition can occur if the command interface goes down (e.g., due
to a PCI error) while the mlx5_cmd_do() call is in flight. In this
case, mlx5_cmd_do() will return an error. The original code would
propagate this error immediately, bypassing the software-based page
reclamation logic that is supposed to run when the command interface is
down.
Fix this by checking whether mlx5_cmd_do() returns -ENXIO, which marks
that the command interface is down. If this is the case, fall through to
the software reclamation path. If the command failed for any other
reason, or finished successfully, return as before.
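The control flow of the fix, roughly (a sketch; the boolean and the surrounding code are illustrative):
    err = mlx5_cmd_do(dev, in, sizeof(in), out, outlen);
    if (err && err != -ENXIO)
        return err;            /* genuine failure, propagate as before */

    /* -ENXIO: command interface is down, reclaim the pages in software */
    if (err == -ENXIO)
        do_sw_reclaim = true;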
Fixes: b898ce7bcc ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1f0349bd6 ]
Stop polling on firmware response to command in polling mode if the
command interface got down. This situation can occur, for example, if a
firmware fatal error is detected during polling.
This change halts the polling process when the command interface goes
down, preventing unnecessary waits.
Fixes: b898ce7bcc ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8169a6011c ]
The driver did not handle failure of `netdev_alloc_skb_ip_align()`.
If the allocation failed, dereferencing `skb->protocol` could lead to
a NULL pointer dereference.
This patch tries to allocate `skb`. If the allocation fails, it falls
back to the normal path.
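Roughly (a sketch; variable names for the receive buffer are assumed):
    skb = netdev_alloc_skb_ip_align(dev, pkt_len);
    if (skb) {
        /* copybreak path: small packet copied into a fresh skb */
        skb_copy_to_linear_data(skb, rx_buf, pkt_len);
        skb_put(skb, pkt_len);
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);
    }
    /* else: keep the normal path that hands the original receive
     * buffer up the stack, instead of dereferencing a NULL skb */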
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Tested-on: D-Link DGE-550T Rev-A3
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250928190124.1156-1-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f017156aea ]
In EC2 instances where the RSS hash key is not configurable, ethtool
shows bogus RSS hash key since ena_get_rxfh_key_size() unconditionally
returns ENA_HASH_KEY_SIZE.
Commit 6a4f7dc82d ("net: ena: rss: do not allocate key when not
supported") added proper handling for devices that don't support RSS
hash key configuration, but ena_get_rxfh_key_size() has been unchanged.
When the RSS hash key is not configurable, return 0 instead of
ENA_HASH_KEY_SIZE to clarify getting the value is not supported.
Tested on m5 instance families.
Without patch:
# ethtool -x ens5 | grep -A 1 "RSS hash key"
RSS hash key:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
With patch:
# ethtool -x ens5 | grep -A 1 "RSS hash key"
RSS hash key:
Operation not supported
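The callback then becomes something like this (a sketch; ena_rss_key_configurable() is a hypothetical placeholder for however the driver records that the key is not configurable):
    static u32 ena_get_rxfh_key_size(struct net_device *netdev)
    {
        struct ena_adapter *adapter = netdev_priv(netdev);

        /* report 0 when no configurable RSS hash key exists */
        return ena_rss_key_configurable(adapter) ? ENA_HASH_KEY_SIZE : 0;
    }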
Fixes: 6a4f7dc82d ("net: ena: rss: do not allocate key when not supported")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Link: https://patch.msgid.link/20250929050247.51680-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8425161ac1 ]
The nfp_net_get_rxfh_key_size() function returns -EOPNOTSUPP when
devices don't support RSS, and callers treat the negative value as a
large positive value since the return type is u32.
Return 0 when devices don't support RSS, aligning with the ethtool
interface .get_rxfh_key_size() that requires returning 0 in such cases.
Fixes: 9ff304bfaf ("nfp: add support for reporting CRC32 hash function")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Link: https://patch.msgid.link/20250929054230.68120-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ed4728eb9 ]
In case of a jump to the err label due to atmel_nand_create() or
atmel_nand_controller_add_nand() failure, the reference to nand_np
needs to be released.
Use for_each_child_of_node_scoped() to fix the issue.
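With the scoped iterator, the error returns no longer leak the child node reference (a sketch; argument lists follow the existing driver only approximately):
    for_each_child_of_node_scoped(np, nand_np) {
        nand = atmel_nand_create(nc, nand_np, reg_cells);
        if (IS_ERR(nand))
            return PTR_ERR(nand);   /* nand_np put automatically */

        ret = atmel_nand_controller_add_nand(nc, nand);
        if (ret)
            return ret;             /* ditto */
    }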
Fixes: f88fc122cc ("mtd: nand: Cleanup/rework the atmel_nand driver")
Signed-off-by: Erick Karanja <karanja99erick@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 612b1dfeb4 ]
Fix division by zero in ks_sa_rng_init caused by missing clock
pointer initialization. The clk_get_rate() call is performed on
an uninitialized clk pointer, resulting in division by zero when
calculating delay values.
Add clock initialization code before using the clock.
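Roughly (a sketch; the struct field and clock connection id are assumed):
    ks_sa_rng->clk = devm_clk_get(dev, NULL);
    if (IS_ERR(ks_sa_rng->clk))
        return PTR_ERR(ks_sa_rng->clk);

    /* clk_get_rate() now operates on a real clock, so the delay
     * computation can no longer divide by zero */
    rate = clk_get_rate(ks_sa_rng->clk);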
Fixes: 6d01d8511d ("hwrng: ks-sa - Add minimum sleep time before ready-polling")
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5851afffe2 ]
Fix the X.509 Basic Constraints CA flag parsing to correctly handle
the ASN.1 DER encoded structure. The parser was incorrectly treating
the length field as the boolean value.
Per RFC 5280 section 4.1, X.509 certificates must use ASN.1 DER encoding.
According to ITU-T X.690, a DER-encoded BOOLEAN is represented as:
Tag (0x01), Length (0x01), Value (0x00 for FALSE, 0xFF for TRUE)
The basicConstraints extension with CA:TRUE is encoded as:
SEQUENCE (0x30) | Length | BOOLEAN (0x01) | Length (0x01) | Value (0xFF)
                             ^-- v[2]         ^-- v[3]        ^-- v[4]
The parser was checking v[3] (the length field, always 0x01) instead
of v[4] (the actual boolean value, 0xFF for TRUE in DER encoding).
Also handle the case where the extension is an empty SEQUENCE (30 00),
which is valid for CA:FALSE when the default value is omitted as
required by DER encoding rules (X.690 section 11.5).
Per ITU-T X.690-0207:
- Section 11.5: Default values must be omitted in DER
- Section 11.1: DER requires TRUE to be encoded as 0xFF
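In terms of the v[] bytes described above, the parse becomes roughly (a sketch; the CA-flag setter is a hypothetical helper):
    /* empty SEQUENCE (30 00): CA defaults to FALSE, nothing to set */
    if (vlen == 2 && v[0] == 0x30 && v[1] == 0x00)
        return 0;

    /* BOOLEAN tag, length 1, DER TRUE value */
    if (vlen >= 5 && v[2] == 0x01 && v[3] == 0x01 && v[4] == 0xff)
        mark_cert_as_ca(ctx);    /* hypothetical helper */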
Link: https://datatracker.ietf.org/doc/html/rfc5280
Link: https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
Fixes: 30eae2b037 ("KEYS: X.509: Parse Basic Constraints for CA")
Signed-off-by: Fan Wu <wufan@kernel.org>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03ddb4ac25 ]
When creating an advertisement for BIG the address shall not be
non-resolvable since in case of acting as BASS/Broadcast Assistant the
address must be the same as the connection in order to use the PAST
method and even when PAST/BASS are not in the picture a Periodic
Advertisement can still be synchronized thus the same argument as to
connectable advertisements still stand.
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5bf863f4c5 ]
For ISO_CONT RX, the data from skb is copied to conn->rx_skb, but the
skb is leaked.
Free skb after copying its data.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ba85da580 ]
If iso_conn is freed when RX is incomplete, free any leftover skb piece.
Fixes: dc26097bdb ("Bluetooth: ISO: Use kref to track lifetime of iso_conn")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9950f095d6 ]
This attempts to fix an issue similar to the one in sco_conn_free: if
conn->sk is not set to NULL, it may lead to a UAF in iso_conn_free.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79e562a52a ]
The debug UUID was only getting set if MGMT_OP_READ_EXP_FEATURES_INFO
was not called with a specific index, which breaks the likes of
bluetoothd since it only invokes MGMT_OP_READ_EXP_FEATURES_INFO when an
adapter is plugged. So, instead of depending on hdev not being set, just
enable the UUID on any index, like it was done with iso_sock_uuid.
Fixes: e625e50cee ("Bluetooth: Introduce debug feature when dynamic debug is disabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 958baf5eae ]
syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
This is the sequence of events that leads to the warning:
rtl8150_start_xmit() {
    netif_stop_queue();
    usb_submit_urb(dev->tx_urb);
}

rtl8150_set_multicast() {
    netif_stop_queue();
    netif_wake_queue();  <-- wakes up TX queue before URB is done
}

rtl8150_start_xmit() {
    netif_stop_queue();
    usb_submit_urb(dev->tx_urb);  <-- double submission
}
rtl8150_set_multicast, being the ndo_set_rx_mode callback, should not be
calling netif_stop_queue and netif_start_queue, as these handle
TX queue synchronization.
The net core function dev_set_rx_mode handles the synchronization
for rtl8150_set_multicast making it safe to remove these locks.
Reported-and-tested-by: syzbot+78cae3f37c62ad092caa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=78cae3f37c62ad092caa
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Link: https://patch.msgid.link/20250924134350.264597-1-viswanathiyyappan@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fdd0fe94d6 ]
In siw_post_send(), any immediate error encountered during processing of
the work request list must be reported to the caller, even if previous
work requests in that list were just accepted and added to the send queue.
Not reporting those errors confuses the caller, which would wait
indefinitely for the completion of the failing and potentially
subsequently aborted work requests.
This fixes a case where immediate errors were overwritten by subsequent
code in siw_post_send().
Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://patch.msgid.link/r/20250923144536.103825-1-bernard.metzler@linux.dev
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 57f55048e5 ]
Dirty page tracking relies on the IOMMU atomically updating the dirty bit
in the paging-structure entry. For this operation to succeed, the paging-
structure memory must be coherent between the IOMMU and the CPU. In
other words, if the IOMMU page walk is incoherent, dirty page tracking
doesn't work.
The Intel VT-d specification, Section 3.10 "Snoop Behavior" states:
"Remapping hardware encountering the need to atomically update A/EA/D bits
in a paging-structure entry that is not snooped will result in a non-
recoverable fault."
To prevent an IOMMU from being incorrectly configured for dirty page
tracking when it is operating in an incoherent mode, mark SSADS as
supported only when both ecap_slads and ecap_smpwc are supported.
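The gate is essentially (a sketch; the ecap accessors follow the names used in the text above):
    static bool ssads_supported(struct intel_iommu *iommu)
    {
        /* second-stage A/D bits AND coherent page walks are required */
        return ecap_slads(iommu->ecap) && ecap_smpwc(iommu->ecap);
    }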
Fixes: f35f22cc76 ("iommu/vt-d: Access/Dirty bit support for SS domains")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250924083447.123224-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2bdf1d428f ]
R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 4581
Figure 104.3b Initial Setting of PCIEC(example), third quarter of the
figure indicates that register 0xf8 should be polled until bit 18 becomes
set to 1.
Register 0xf8, bit 18 is 0 immediately after write to PCIERSTCTRL1 and is
set to 1 in less than 1 ms afterward. The current readl_poll_timeout()
break condition is inverted and returns when register 0xf8, bit 18 is set
to 0, which in most cases means immediately. In case
CONFIG_DEBUG_LOCK_ALLOC=y, the timing changes just enough for the first
readl_poll_timeout() poll to already read register 0xf8, bit 18 as 1 and
afterward never read register 0xf8, bit 18 as 0, which leads to timeout
and failure to start the PCIe controller.
Fix this by inverting the poll condition to match the reference manual
initialization sequence.
Fixes: faf5a975ee ("PCI: rcar-gen4: Add support for R-Car V4H")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250915235910.47768-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0056d29f8c ]
Assure the reset is latched and the core is ready for DBI access. On R-Car
V4H, the PCIe reset is asynchronous and does not take effect immediately,
but needs a short time to complete. In case DBI access happens in that
short time, that access generates an SError. Make sure that condition can
never happen, read back the state of the reset, which should turn the
asynchronous reset into a synchronous one, and wait a little over 1ms to
add additional safety margin.
Fixes: 0d0c551011 ("PCI: rcar-gen4: Add R-Car Gen4 PCIe controller support for host mode")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250924005610.96484-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8795b70581 ]
R-Car V4H Reference Manual R19UH0186EJ0130 Rev.1.30 Apr. 21, 2025 page 585
Figure 9.3.2 Software Reset flow (B) indicates that for peripherals in HSC
domain, after reset has been asserted by writing a matching reset bit into
register SRCR, it is mandatory to wait 1ms.
Because it is the controller driver which can determine whether or not the
controller is in HSC domain based on its compatible string, add the missing
delay in the controller driver.
This 1ms delay is documented on R-Car V4H and V4M; it is currently unclear
whether S4 is affected as well. This patch does apply the extra delay on
R-Car S4 as well.
Fixes: 0d0c551011 ("PCI: rcar-gen4: Add R-Car Gen4 PCIe controller support for host mode")
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
[mani: added the missing r-b tag from Krzysztof]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20250919134644.208098-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e40b984b6c ]
The VHCI platform driver aims to forbid entering system suspend when at
least one of the virtual USB ports are bound to an active USB/IP
connection.
However, in some cases, the detection logic doesn't work reliably, i.e.
when all devices attached to the virtual root hub have been already
suspended, leading to a broken suspend state, with unrecoverable resume.
Ensure the virtually attached devices do not enter suspend by setting
the syscore PM flag. Note this is currently limited to the client side
only, since the server side doesn't implement system suspend prevention.
Fixes: 04679b3489 ("Staging: USB/IP: add client driver")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250902-vhci-hcd-suspend-fix-v3-1-864e4e833559@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09efbac953 ]
During a batch replay, the nlh pointer is not reset until the parsing of
the commands. Since commit bf2ac490d2 ("netfilter: nfnetlink: Handle
ACK flags for batch messages") that is problematic as the condition to
add an ACK for batch begin will evaluate to true even if NLM_F_ACK
wasn't used for batch begin message.
If there is an error during the command processing, netlink is sending
an ACK despite that. This misleads userspace tools which think that the
return code was 0. Reset the nlh pointer to the original one when a
replay is triggered.
Fixes: bf2ac490d2 ("netfilter: nfnetlink: Handle ACK flags for batch messages")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 134121bfd9 ]
On the netns cleanup path, __ip_vs_ftp_exit() may unregister ip_vs_ftp
before connections with valid cp->app pointers are flushed, leading to a
use-after-free.
Fix this by introducing a global `exiting_module` flag, set to true in
ip_vs_ftp_exit() before unregistering the pernet subsystem. In
__ip_vs_ftp_exit(), skip ip_vs_ftp unregister if called during netns
cleanup (when exiting_module is false) and defer it to
__ip_vs_cleanup_batch(), which unregisters all apps after all connections
are flushed. If called during module exit, unregister ip_vs_ftp
immediately.
Fixes: 61b1ab4583 ("IPVS: netns, add basic init per netns.")
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Slavin Liu <slavin452@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 191512355e ]
When the client max_resp_sz is larger than what the server encodes in
its reply, the nfs4_verify_back_channel_attrs() check fails and this
causes nfs4_proc_create_session() to fail, in cases where the client
page size is larger than that of the server and the server does not want
to negotiate upwards.
While this is not a problem with the linux nfs server that will reflect
the proposed value in its reply irrespective of the local page size,
other nfs server implementations may insist on their own max_resp_sz
value, which could be smaller.
Fix this by accepting smaller max_resp_sz values from the server, as
this does not violate the protocol. The server is allowed to decrease
but not increase the proposed size, and as such values smaller than the
client-proposed ones are valid.
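The relevant comparison in nfs4_verify_back_channel_attrs() then only rejects an increase (a sketch; variable names assumed):
    if (rcvd->max_resp_sz > sent->max_resp_sz)
        return -EINVAL;    /* the server may shrink, but never grow, the size */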
Fixes: 43c2e885be ("nfs4: fix channel attribute sanity-checks")
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 43e0a92c04 ]
Some TPDM devices support both CMB and DSB datasets, requiring
the system to enable the port with both corresponding element sizes.
Currently, the logic treats tpdm_read_element_size as successful if
the CMB element size is retrieved correctly, regardless of whether
the DSB element size is obtained. This behavior causes issues
when parsing data from TPDM devices that depend on both element sizes.
To address this, the function should explicitly fail if the DSB
element size cannot be read correctly.
Fixes: e6d7f5252f ("coresight-tpda: Add support to configure CMB element")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250906-fix_element_size_issue-v2-1-dbb0ac2541a9@oss.qualcomm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40c0cdc9cb ]
The atclk is an optional clock for the CoreSight ETMv4, but the driver
fails to initialize it.
This change enables atclk in probe of the ETMv4 driver, and dynamically
control the clock during suspend and resume.
There is no need to check the driver data and clock pointer in the runtime
suspend and resume callbacks, so remove those checks, and add error
handling in the resume function.
Add a minor fix to the comment format when adding the atclk field.
Fixes: 2e1cdfe184 ("coresight-etm4x: Adding CoreSight ETM4x driver")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250731-arm_cs_fix_clock_v4-v6-3-1dfe10bb3f6f@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a79026926 ]
The atclk is an optional clock for the CoreSight TMC, but the driver
fails to initialize it. In most cases, TMC shares the atclk clock with
other CoreSight components. Since these components enable the clock
before the TMC device is initialized, the TMC continues properly,
which is why we don’t observe any lockup issues.
This change enables atclk in probe of the TMC driver. Given the clock
is optional, it is possible to return NULL if the clock does not exist.
IS_ERR() is tolerant for this case.
Dynamically disable and enable atclk during suspend and resume. The
clock pointers will never be error values if the driver has successfully
probed, and the case of a NULL pointer case will be handled by the clock
core layer. The driver data is always valid after probe. Therefore,
remove the related checks. Also, add error handling in the resume flow.
Fixes: bc4bf7fe98 ("coresight-tmc: add CoreSight TMC driver")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250731-arm_cs_fix_clock_v4-v6-1-1dfe10bb3f6f@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ddf6d3fcb ]
The return values of VDO_ASSERT calls that validate metadata are not acted
upon.
Return UDS_CORRUPT_DATA in case of an error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a4eb7e2555 ("dm vdo: implement the volume index")
Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru>
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 110be46f5a ]
enable_irq() and disable_irq() are reference counted, so we must make sure
that each enable_irq() is always paired with a single disable_irq(). If we
call disable_irq() twice followed by just a single enable_irq(), the IRQ
will remain disabled forever.
For the error handling path in qcom_q6v5_wait_for_start(), disable_irq()
will end up being called twice, because disable_irq() also happens in
qcom_q6v5_unprepare() when rolling back the call to qcom_q6v5_prepare().
Fix this by dropping disable_irq() in qcom_q6v5_wait_for_start(). Since
qcom_q6v5_prepare() is the function that calls enable_irq(), it makes more
sense to have the rollback handled always by qcom_q6v5_unprepare().
Fixes: 3b415c8fb2 ("remoteproc: q6v5: Extract common resource handling")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Link: https://lore.kernel.org/r/20250820-rproc-qcom-q6v5-fixes-v2-1-910b1a3aff71@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f152338e3 ]
During PERST# assertion tegra_pcie_bpmp_set_pll_state() is currently
called twice.
pex_ep_event_pex_rst_assert() should do the opposite of
pex_ep_event_pex_rst_deassert(), so it is obvious that the duplicate
tegra_pcie_bpmp_set_pll_state() is a mistake, and that the duplicate
tegra_pcie_bpmp_set_pll_state() call should instead be a call to
tegra_pcie_bpmp_set_ctrl_state().
With this, the uninitialization sequence also matches that of
tegra_pcie_unconfig_controller().
Fixes: a54e190737 ("PCI: tegra194: Add Tegra234 PCIe support")
Signed-off-by: Nagarjuna Kristam <nkristam@nvidia.com>
[cassel: improve commit log]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250911093021.1454385-2-cassel@kernel.org
[mani: added Fixes tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 570f945117 ]
Lockdep gives a splat [1] when ser_hdl_work item is executed. It is
scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
wiphy lock inside. However, this workqueue can be flushed when e.g.
closing the interface and wiphy lock is already taken in that case.
Choosing wiphy_work_queue() for SER is likely not suitable. Go back to
the global workqueue.
[1]:
WARNING: possible circular locking dependency detected
6.17.0-rc2 #17 Not tainted
------------------------------------------------------
kworker/u32:1/61 is trying to acquire lock:
ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]
but task is already holding lock:
ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}:
process_one_work+0x7c6/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
-> #1 ((wq_completion)phy0){+.+.}-{0:0}:
touch_wq_lockdep_map+0x8e/0x180
__flush_workqueue+0x129/0x10d0
ieee80211_stop_device+0xa8/0x110
ieee80211_do_stop+0x14ce/0x2880
ieee80211_stop+0x13a/0x2c0
__dev_close_many+0x18f/0x510
__dev_change_flags+0x25f/0x670
netif_change_flags+0x7b/0x160
do_setlink.isra.0+0x1640/0x35d0
rtnl_newlink+0xd8c/0x1d30
rtnetlink_rcv_msg+0x700/0xb80
netlink_rcv_skb+0x11d/0x350
netlink_unicast+0x49a/0x7a0
netlink_sendmsg+0x759/0xc20
____sys_sendmsg+0x812/0xa00
___sys_sendmsg+0xf7/0x180
__sys_sendmsg+0x11f/0x1b0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&ser->ser_hdl_work));
lock((wq_completion)phy0);
lock((work_completion)(&ser->ser_hdl_work));
lock(&rdev->wiphy.mtx);
*** DEADLOCK ***
2 locks held by kworker/u32:1/61:
#0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450
#1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
stack backtrace:
CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 #17 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core]
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
print_circular_bug.cold+0x178/0x1be
check_noncircular+0x14c/0x170
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
</TASK>
Found by Linux Verification Center (linuxtesting.org).
Fixes: ebfc9199df ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-5-pchelkin@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ca7eada62 ]
When do_task() exhausts its iteration budget (!ret), it sets the state
to TASK_STATE_IDLE to reschedule, without a secondary check on the
current task->state. This can overwrite the TASK_STATE_DRAINING state
set by a concurrent call to rxe_cleanup_task() or rxe_disable_task().
While state changes are protected by a spinlock, both rxe_cleanup_task()
and rxe_disable_task() release the lock while waiting for the task to
finish draining in the while(!is_done(task)) loop. The race occurs if
do_task() hits its iteration limit and acquires the lock in this window.
The cleanup logic may then proceed while the task incorrectly
reschedules itself, leading to a potential use-after-free.
This bug was introduced during the migration from tasklets to workqueues,
where the special handling for the draining case was lost.
Fix this by restoring the original pre-migration behavior. If the state is
TASK_STATE_DRAINING when iterations are exhausted, set cont to 1 to
force a new loop iteration. This allows the task to finish its work, so
that a subsequent iteration can reach the switch statement and correctly
transition the state to TASK_STATE_DRAINED, stopping the task as intended.
Fixes: 9b4b7c1f9f ("RDMA/rxe: Add workqueue support for rxe tasks")
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20250919025212.1682087-1-hanguidong02@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0cafb02de ]
When the initialization of qm->debug.acc_diff_reg fails,
the probe process does not exit. However, after qm->debug.qm_diff_regs is
freed, it is not set to NULL. This can lead to a double free when the
remove process attempts to free it again. Therefore, qm->debug.qm_diff_regs
should be set to NULL after it is freed.
Fixes: 8be0913389 ("crypto: hisilicon/debugfs - Fix debugfs uninit process issue")
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acb59a4bb8 ]
host_seq_bmp is allocated with vzalloc but is currently freed with
bitmap_free, which uses kfree internally. This mismatch prevents the
resource from being released properly and may result in memory leaks
or other issues.
Fix this by freeing host_seq_bmp with vfree to match the vzalloc
allocation.
Fixes: f232836a91 ("vfio/pds: Add support for dirty page tracking")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://lore.kernel.org/r/20250913153154.1028835-1-zilin@seu.edu.cn
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a746c1a2c ]
The referenced commit introduced exception handlers on user-space memory
references in copy_from_user and copy_to_user. These handlers return from
the respective function and calculate the remaining bytes left to copy
using the current register contents. This commit fixes a bad calculation.
This will fix the return value of copy_to_user in a specific faulting case.
The behaviour of memcpy stays unchanged.
Fixes: 9570770480 ("sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # on Oracle SPARC T4-1
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20250905-memcpy_series-v4-4-1ca72dda195b@mkarcher.dialup.fu-berlin.de
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b67c8fc10 ]
The referenced commit introduced exception handlers on user-space memory
references in copy_from_user and copy_to_user. These handlers return from
the respective function and calculate the remaining bytes left to copy
using the current register contents. This commit fixes a couple of bad
calculations and a broken epilogue in the exception handlers. This will
prevent crashes and ensure correct return values of copy_from_user and
copy_to_user in the faulting case. The behaviour of memcpy stays unchanged.
Fixes: 7ae3aaf53f ("sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # on SPARC T4 with modified kernel to use Niagara 1 code
Tested-by: Magnus Lindholm <linmag7@gmail.com> # on Sun Fire T2000
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Tested-by: Ethan Hawke <ehawk@ember.systems> # on Sun Fire T2000
Tested-by: Ken Link <iissmart@numberzero.org> # on Sun Fire T1000
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20250905-memcpy_series-v4-3-1ca72dda195b@mkarcher.dialup.fu-berlin.de
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 47b49c06eb ]
Anthony Yznaga tracked down that a BUG_ON in ext4 code with large folios
enabled resulted from copy_from_user() returning impossibly large values
greater than the size to be copied. This led to __copy_from_iter()
returning impossible values instead of the actual number of bytes it was
able to copy.
The BUG_ON has been reported in
https://lore.kernel.org/r/b14f55642207e63e907965e209f6323a0df6dcee.camel@physik.fu-berlin.de
The referenced commit introduced exception handlers on user-space memory
references in copy_from_user and copy_to_user. These handlers return from
the respective function and calculate the remaining bytes left to copy
using the current register contents. The exception handlers expect that
%o2 has already been masked during the bulk copy loop, but the masking was
performed after that loop. This will fix the return value of copy_from_user
and copy_to_user in the faulting case. The behaviour of memcpy stays
unchanged.
Fixes: ee841d0aff ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # on Sun Netra 240
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Tested-by: René Rebe <rene@exactcode.com> # on UltraSparc III+ and UltraSparc IIIi
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20250905-memcpy_series-v4-2-1ca72dda195b@mkarcher.dialup.fu-berlin.de
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fba171300 ]
The referenced commit introduced exception handlers on user-space memory
references in copy_from_user and copy_to_user. These handlers return from
the respective function and calculate the remaining bytes left to copy
using the current register contents. This commit fixes a couple of bad
calculations. This will fix the return value of copy_from_user and
copy_to_user in the faulting case. The behaviour of memcpy stays unchanged.
Fixes: cb736fdbb2 ("sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # on QEMU 10.0.3
Tested-by: René Rebe <rene@exactcode.com> # on Ultra 5 UltraSparc IIi
Tested-by: Jonathan 'theJPster' Pallant <kernel@thejpster.org.uk> # on Sun Netra T1
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20250905-memcpy_series-v4-1-1ca72dda195b@mkarcher.dialup.fu-berlin.de
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 87cab86925 ]
In create_sdw_dailink() check that sof_end->codec_info->add_sidecar
is not NULL before calling it.
The original code assumed that if include_sidecar is true, the codec
on that link has an add_sidecar callback. But there could be other
codecs on the same link that do not have an add_sidecar callback.
Fixes: da52441802 ("ASoC: Intel: sof_sdw: Add callbacks to register sidecar devices")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20250919140235.1071941-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32d340ae67 ]
In ieee80211_rx_handle_packet(), if the caller does not provide pubsta
information, an attempt is made to find the station using the address 2
(source address) field in the header. Since pubsta is missing, link
information such as link_valid and link_id is also unavailable. In that
situation, if a matching ML station entry is found based on the source
address, the packet is currently dropped due to the missing link ID in the
status field, which is not correct.
Hence, to fix this issue, if link_valid is not set and the station is an
ML station, make an attempt to find a link station entry using the source
address. If a valid link station is found, derive the link ID and proceed
with packet processing. Otherwise, drop the packet as per the existing
flow.
Fixes: ea9d807b56 ("wifi: mac80211: add link information in ieee80211_rx_status")
Suggested-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com>
Link: https://patch.msgid.link/20250917-fix_data_packet_rx_with_mlo_and_no_pubsta-v1-1-8cf971a958ac@oss.qualcomm.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 200651b9b8 ]
Currently, if the next-hop netdevice does not support ARP resolution,
the destination MAC address is silently set to zero without reporting
an error. This leads to incorrect behavior and may result in packet
transmission failures.
Fix this by deferring MAC resolution to the IP stack via neighbour
lookup, allowing proper resolution or error reporting as appropriate.
Fixes: 7025fcd36b ("IB: address translation to map IP toIB addresses (GIDs)")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-3-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 08fa726e66 ]
This reverts commit 28a76fcc4c.
No actual HW bugs are known where Endpoint Context shows Running state
but Stop Endpoint fails repeatedly with Context State Error and leaves
the endpoint state unchanged. Stop Endpoint retries on Running EPs have
been performed since early 2021 with no such issues reported so far.
Trying to handle this hypothetical case brings a more realistic danger:
if Stop Endpoint fails on an endpoint which hasn't yet started after a
doorbell ring and enough latency occurs before this completion event is
handled, the driver may time out and begin removing cancelled TDs from
a running endpoint, even though one more retry would stop it reliably.
Such high latency is rare but not impossible, and removing TDs from a
running endpoint can cause more damage than not giving back a cancelled
URB (which wasn't happening anyway). So err on the side of caution and
revert to the old policy of always retrying if the EP appears running.
[Remove stable tag as we are dealing with theoretical cases -Mathias]
Fixes: 28a76fcc4c ("usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running")
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250917210726.97100-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75d5546f60 ]
The handling for variable-length ioctl commands in hidraw_ioctl() is
rather complex and the check for the data direction is incomplete.
Simplify this code by factoring out the various ioctls grouped by dir
and size, and using a switch() statement with the size masked out, to
ensure the rest of the command is correctly matched.
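A sketch of the resulting dispatch shape (cases and handling abbreviated; the real handler covers more ioctls and performs the actual user copies):

#include <linux/ioctl.h>
#include <linux/hidraw.h>
#include <linux/errno.h>

/* Mask out the size bits so e.g. HIDIOCGRAWNAME(len) matches one case for
 * any len; direction and size encoded in cmd are then validated per group.
 */
static long hidraw_ioctl_dispatch(unsigned int cmd)
{
	switch (cmd & ~IOCSIZE_MASK) {
	case HIDIOCGRAWNAME(0):
	case HIDIOCGRAWPHYS(0):
		/* read-only string ioctls: copy out up to _IOC_SIZE(cmd) bytes */
		return 0;
	case HIDIOCSFEATURE(0):
	case HIDIOCGFEATURE(0):
		/* feature-report ioctls: check _IOC_DIR(cmd) before copying */
		return 0;
	default:
		return -ENOTTY;
	}
}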
Fixes: 9188e79ec3 ("HID: add phys and name ioctls to hidraw")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9877c004e9 ]
Change the error code EAGAIN to -EAGAIN in qla_nvme_xmt_ls_rsp() to
align with qla2x00_start_sp() returning negative error codes or
QLA_SUCCESS, preventing logical errors.
Fixes: 875386b988 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Message-ID: <20250905075446.381139-4-rongqianfeng@vivo.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f037e3acd ]
Change the error code EAGAIN to -EAGAIN in START_SP_W_RETRIES() to align
with qla2x00_start_sp() returning negative error codes or QLA_SUCCESS,
preventing logical errors. Additionally, the '_rval' variable should
store negative error codes to conform to Linux kernel error code
conventions.
Fixes: 9803fb5d27 ("scsi: qla2xxx: Fix task management cmd failure")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Message-ID: <20250905075446.381139-3-rongqianfeng@vivo.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 066b8f3fa8 ]
Change the error code EAGAIN to -EAGAIN in qla24xx_sadb_update() and
qla_edif_process_els() to align with qla2x00_start_sp() returning
negative error codes or QLA_SUCCESS, preventing logical errors.
Fixes: 0b3f3143d4 ("scsi: qla2xxx: edif: Add retry for ELS passthrough")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Message-ID: <20250905075446.381139-2-rongqianfeng@vivo.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1a599a813 ]
There appears to be a cut-n-paste error with the incorrect field
ndr_desc->numa_node being reported for the target node. Fix this by
using ndr_desc->target_node instead.
Fixes: f060db9937 ("ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fc4a3da9a ]
snd_pcm_group_lock_irq() acquires a spinlock_t and disables interrupts
via spin_lock_irq(). This also implicitly disables the handling of
softirqs such as TIMER_SOFTIRQ.
On PREEMPT_RT softirqs are preemptible and spin_lock_irq() does not
disable them. That means a timer can be invoked during spin_lock_irq()
on the same CPU. For synchronisation reasons, local_bh_disable() has
a per-CPU lock named softirq_ctrl.lock which synchronizes individual
softirqs against each other.
syz-bot managed to trigger a lockdep report where softirq_ctrl.lock is
acquired in hrtimer_cancel() in addition to hrtimer_run_softirq(). This
is a possible deadlock.
The softirq_ctrl.lock can not be made part of spin_lock_irq() as this
would lead to too much synchronisation against individual threads on the
system. To avoid the possible deadlock, softirqs must be manually
disabled before the lock is acquired.
Disable softirqs before the lock is acquired on PREEMPT_RT.
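A sketch of the resulting lock/unlock pattern (struct and helper names illustrative, simplified from the PCM group-lock helpers):

#include <linux/spinlock.h>
#include <linux/bottom_half.h>

struct my_group {
	spinlock_t lock;
};

/* On PREEMPT_RT, spin_lock_irq() does not keep softirqs off this CPU, so
 * disable BHs explicitly before taking a lock that a softirq-context
 * timer callback may also need.
 */
static void group_lock_irq(struct my_group *group)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		local_bh_disable();
	spin_lock_irq(&group->lock);
}

static void group_unlock_irq(struct my_group *group)
{
	spin_unlock_irq(&group->lock);
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		local_bh_enable();
}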
Reported-by: syzbot+10b4363fb0f46527f3f3@syzkaller.appspotmail.com
Fixes: d2d6422f8b ("x86: Allow to enable PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c2f7c32b25 ]
f2fs_zero_post_eof_page() may cause more overhead due to invalidate_lock
and page lookup, change as below to mitigate its overhead:
- check new_size before grabbing invalidate_lock
- lookup and invalidate pages only in range of [old_size, new_size]
Fixes: ba8dac350f ("f2fs: fix to zero post-eof page")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2b660ee10a ]
In vendor driver, size of group cal and dpd cal for mt7981 includes 6G
although the chip doesn't support it.
mt76 doesn't take this into account which results in reading from the
incorrect offset.
For devices with precal, this would lead to lower bitrate.
Fix this by aligning groupcal size with vendor driver and switch to
freq_list_v2 in mt7915_dpd_freq_idx in order to get the correct offset.
Below are iwinfo of the test device with two clients connected
(iPhone 16, Intel AX210).
Before :
Mode: Master Channel: 36 (5.180 GHz) HT Mode: HE80
Center Channel 1: 42 2: unknown
Tx-Power: 23 dBm Link Quality: 43/70
Signal: -67 dBm Noise: -92 dBm
Bit Rate: 612.4 MBit/s
Encryption: WPA3 SAE (CCMP)
Type: nl80211 HW Mode(s): 802.11ac/ax/n
Hardware: embedded [MediaTek MT7981]
After:
Mode: Master Channel: 36 (5.180 GHz) HT Mode: HE80
Center Channel 1: 42 2: unknown
Tx-Power: 23 dBm Link Quality: 43/70
Signal: -67 dBm Noise: -92 dBm
Bit Rate: 900.6 MBit/s
Encryption: WPA3 SAE (CCMP)
Type: nl80211 HW Mode(s): 802.11ac/ax/n
Hardware: embedded [MediaTek MT7981]
Tested-on: mt7981 20240823
Fixes: 19a954edec ("wifi: mt76: mt7915: add mt7986, mt7916 and mt7981 pre-calibration")
Signed-off-by: Zhi-Jun You <hujy652@gmail.com>
Link: https://patch.msgid.link/20250909064824.16847-1-hujy652@gmail.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7dfd80f70e ]
When the watchdog gets enabled with this driver, it leaves enough time
for the core watchdog subsystem to start pinging it. But when the
watchdog is already started by hardware or by the boot loader, little
time remains before it fires and it happens that the core watchdog
subsystem doesn't have time to start pinging it.
Until commit 19ce9490aa ("watchdog: mpc8xxx: use the core worker
function") pinging was managed by the driver itself and the watchdog
was immediately pinged by setting the timer expiry to 0.
So restore similar behaviour by pinging it when enabling it so that
if it was already enabled the watchdog timer counter is reloaded.
Fixes: 19ce9490aa ("watchdog: mpc8xxx: use the core worker function")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 944b6b216c ]
KCSAN reported a data-race on the `ipvs->enable` flag, which is
written in the control path and read concurrently from many other
contexts.
Following a suggestion by Julian, this patch fixes the race by
converting all accesses to use `WRITE_ONCE()/READ_ONCE()`.
This lightweight approach ensures atomic access and acts as a
compiler barrier, preventing unsafe optimizations where the flag
is checked in loops (e.g., in ip_vs_est.c).
Additionally, the `enable` checks in the fast-path hooks
(`ip_vs_in_hook`, `ip_vs_out_hook`, `ip_vs_forward_icmp`) are
removed. These are unnecessary since commit 857ca89711
("ipvs: register hooks only with services"). The `enable=0`
condition they check for can only occur in two rare and non-fatal
scenarios: 1) after hooks are registered but before the flag is set,
and 2) after hooks are unregistered on cleanup_net. In the worst
case, a single packet might be mishandled (e.g., dropped), which
does not lead to a system crash or data corruption. Adding a check
in the performance-critical fast-path to handle this harmless
condition is not a worthwhile trade-off.
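A minimal sketch of the converted accesses (structure simplified):

#include <linux/compiler.h>
#include <linux/types.h>

struct my_ipvs {
	int enable;
};

/* Control path: publish the flag with a single, non-torn store. */
static void ipvs_set_enable(struct my_ipvs *ipvs, int on)
{
	WRITE_ONCE(ipvs->enable, on);
}

/* Readers (e.g. the estimator loop): READ_ONCE() also acts as a compiler
 * barrier, so the load cannot be hoisted out of a polling loop.
 */
static bool ipvs_enabled(const struct my_ipvs *ipvs)
{
	return READ_ONCE(ipvs->enable);
}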
Fixes: 857ca89711 ("ipvs: register hooks only with services")
Reported-by: syzbot+1651b5234028c294c339@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1651b5234028c294c339
Suggested-by: Julian Anastasov <ja@ssi.bg>
Link: https://lore.kernel.org/lvs-devel/2189fc62-e51e-78c9-d1de-d35b8e3657e3@ssi.bg/
Signed-off-by: Zhang Tengfei <zhtfdev@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba941796d7 ]
Since the ahash_region() macro was redefined to calculate the region
index solely from HTABLE_REGION_BITS, the htable_bits parameter became
unused.
Remove the unused htable_bits argument and its call sites, simplifying
the code without changing semantics.
Fixes: 8478a729c0 ("netfilter: ipset: fix region locking in hash types")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f85406bf8 ]
There is an issue with the handling of negative channel scales
in iio_convert_raw_to_processed_unlocked() when the channel-scale
is of the IIO_VAL_INT_PLUS_[MICRO|NANO] type:
Things work for channel-scale values > -1.0 and < 0.0 because of
the use of signed values in:
*processed += div_s64(raw64 * (s64)scale_val2 * scale, 1000000LL);
Things will break however for scale values < -1.0. Let's for example say
that raw = 2, (caller-provided)scale = 10 and (channel)scale_val = -1.5.
The result should then be 2 * 10 * -1.5 = -30.
channel-scale = -1.5 means scale_val = -1 and scale_val2 = 500000,
now let's see what gets stored in processed:
1. *processed = raw64 * scale_val * scale;
2. *processed += raw64 * scale_val2 * scale / 1000000LL;
1. Sets processed to 2 * -1 * 10 = -20
2. Adds 2 * 500000 * 10 / 1000000 = 10 to processed
And the end result is processed = -20 + 10 = -10, which is not correct.
Fix this by always using the abs value of both scale_val and scale_val2
and if either is negative multiply the end-result by -1.
Note there seems to be an unwritten rule about negative
IIO_VAL_INT_PLUS_[MICRO|NANO] values that:
i. values > -1.0 and < 0.0 are written as val=0 val2=-xxx
ii. values <= -1.0 are written as val=-xxx val2=xxx
But iio_format_value() will also correctly display a third option:
iii. values <= -1.0 written as val=-xxx val2=-xxx
Since iio_format_value() uses abs(val) when val2 < 0.
This fix also makes iio_convert_raw_to_processed() properly handle
channel-scales using this third option.
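A sketch of the fixed IIO_VAL_INT_PLUS_MICRO arithmetic (simplified from iio_convert_raw_to_processed_unlocked(); the IIO_VAL_INT_PLUS_NANO case only differs in the divisor):

#include <linux/kernel.h>
#include <linux/math64.h>

/* Work on the absolute values of both parts of the channel scale and apply
 * the sign once at the end, so raw = 2, scale = 10 and a channel scale of
 * -1.5 (scale_val = -1, scale_val2 = 500000) yields -30 instead of -10.
 */
static void scale_int_plus_micro(s64 raw64, int scale_val, int scale_val2,
				 int scale, int *processed)
{
	s64 result;

	result = raw64 * abs(scale_val) * scale;
	result += div_s64(raw64 * (s64)abs(scale_val2) * scale, 1000000LL);
	if (scale_val < 0 || scale_val2 < 0)
		result = -result;
	*processed = result;
}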
Fixes: 48e44ce0f8 ("iio:inkern: Add function to read the processed value")
Cc: Matteo Martelli <matteomartelli3@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Link: https://patch.msgid.link/20250831104825.15097-2-hansg@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dc7117da8 ]
Index allocation requires at least one bit in the $BITMAP attribute to
track usage of index entries. If the bitmap is empty while index blocks
are already present, this reflects on-disk corruption.
syzbot triggered this condition using a malformed NTFS image. During a
rename() operation involving a long filename (which spans multiple
index entries), the empty bitmap allowed the name to be added without
valid tracking. Subsequent deletion of the original entry failed with
-ENOENT, due to unexpected index state.
Reject such cases by verifying that the bitmap is not empty when index
blocks exist.
Reported-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b0373017f711c06ada64
Fixes: d99208b919 ("fs/ntfs3: cancle set bad inode after removing name fails")
Tested-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 736fc7bf5f ]
The MFT record relative to the file being opened contains its runlist,
an array containing information about the file's location on the physical
disk. Analysis of all Call Stack paths showed that the values of the
runlist array, from which LCNs are calculated, are not validated before
run_unpack function.
The run_unpack function decodes the compressed runlist data format
from MFT attributes (for example, $DATA), converting them into a runs_tree
structure, which describes the mapping of virtual clusters (VCN) to
logical clusters (LCN). The NTFS3 subsystem also has a shortcut for
deleting files from MFT records - in this case, the RUN_DEALLOCATE
command is sent to the run_unpack input, and the function logic
provides that all data transferred to the runlist about file or
directory is deleted without creating a runs_tree structure.
Substituting the runlist in the $DATA attribute of the MFT record for an
arbitrary file can lead either to access to arbitrary data on the disk
bypassing access checks to them (since the inode access check
occurs above) or to destruction of arbitrary data on the disk.
Add overflow check for addition operation.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
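A small sketch of the kind of check added before a runlist entry is consumed (field names illustrative):

#include <linux/overflow.h>
#include <linux/types.h>

/* Reject runlist entries whose start + length additions overflow, instead
 * of letting a crafted $DATA runlist map clusters outside the volume.
 */
static bool run_entry_valid(u64 vcn, u64 lcn, u64 len)
{
	u64 next_vcn, next_lcn;

	if (check_add_overflow(vcn, len, &next_vcn))
		return false;
	if (check_add_overflow(lcn, len, &next_lcn))
		return false;
	return true;
}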
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Vitaly Grigoryev <Vitaly.Grigoryev@kaspersky.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4336efb59e ]
When an invalid value is passed via the quirk option, currently the
bytcr_rt5640 driver just ignores it and leaves it as is, which may lead to
unexpected results like OOB access.
This patch adds the sanity check and corrects the input mapping to the
certain default value if an invalid value is passed.
Fixes: 64484ccee7 ("ASoC: Intel: bytcr_rt5651: Set card long_name based on quirks")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20250902171826.27329-4-tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fba404e4b4 ]
When an invalid value is passed via the quirk option, currently the
bytcr_rt5640 driver only shows an error message but leaves it as is.
This may lead to unexpected results like OOB access.
This patch corrects the input mapping to the certain default value if
an invalid value is passed.
Fixes: 063422ca2a ("ASoC: Intel: bytcr_rt5640: Set card long_name based on quirks")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20250902171826.27329-3-tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b20eb0e8de ]
When an invalid value is passed via the quirk option, currently the
bytcht_es8316 driver just ignores it and leaves it as is, which may lead to
unexpected results like OOB access.
This patch adds the sanity check and corrects the input mapping to the
certain default value if an invalid value is passed.
Fixes: 249d2fc9e5 ("ASoC: Intel: bytcht_es8316: Set card long_name based on quirks")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-ID: <20250902171826.27329-2-tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c20edbacc0 ]
No idea what the current barrier position was meant for. At that point,
nothing is read from the descriptor, only the pointer to the actual one
is fetched.
The correct barrier usage here is after the generation check, so that
only the first qword is read if the descriptor is not yet ready and we
need to stop polling. Debatable on coherent DMA as the Rx descriptor
size is <= cacheline size, but anyway, the current barrier position
only makes the codegen worse.
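A sketch of the corrected read ordering (descriptor layout and generation-bit position are illustrative):

#include <linux/types.h>
#include <linux/bits.h>
#include <asm/barrier.h>

struct rx_desc {
	u64 qword0;		/* carries the generation bit */
	u64 qword1;		/* rest of the completion */
};

/* Read only the first qword to test the generation bit; if the descriptor
 * has not been written yet, stop polling without touching the rest. Only
 * after the check succeeds are the remaining reads ordered with dma_rmb().
 */
static bool rx_desc_ready(const struct rx_desc *desc, bool gen, u64 *meta)
{
	if (!!(desc->qword0 & BIT_ULL(63)) != gen)
		return false;

	dma_rmb();
	*meta = desc->qword1;
	return true;
}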
Fixes: 3a8845af66 ("idpf: add RX splitq napi poll support")
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Ramu R <ramu.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b0531cdba5 ]
Similar to previous commit 2a934fdb01 ("media: v4l2-dev: fix error
handling in __video_register_device()"), the release hook should be set
before device_register(). Otherwise, when device_register() returns an error
and put_device() tries to call back the release function, the below warning
may happen.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4760 at drivers/base/core.c:2567 device_release+0x1bd/0x240 drivers/base/core.c:2567
Modules linked in:
CPU: 1 UID: 0 PID: 4760 Comm: syz.4.914 Not tainted 6.17.0-rc3+ #1 NONE
RIP: 0010:device_release+0x1bd/0x240 drivers/base/core.c:2567
Call Trace:
<TASK>
kobject_cleanup+0x136/0x410 lib/kobject.c:689
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0xe9/0x130 lib/kobject.c:737
put_device+0x24/0x30 drivers/base/core.c:3797
pps_register_cdev+0x2da/0x370 drivers/pps/pps.c:402
pps_register_source+0x2f6/0x480 drivers/pps/kapi.c:108
pps_tty_open+0x190/0x310 drivers/pps/clients/pps-ldisc.c:57
tty_ldisc_open+0xa7/0x120 drivers/tty/tty_ldisc.c:432
tty_set_ldisc+0x333/0x780 drivers/tty/tty_ldisc.c:563
tiocsetd drivers/tty/tty_io.c:2429 [inline]
tty_ioctl+0x5d1/0x1700 drivers/tty/tty_io.c:2728
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:598 [inline]
__se_sys_ioctl fs/ioctl.c:584 [inline]
__x64_sys_ioctl+0x194/0x210 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x5f/0x2a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
Before commit c79a39dc8d ("pps: Fix a use-after-free"),
pps_register_cdev() called device_create() to create pps->dev, which would
init dev->release to device_create_release(). Now the comment is outdated,
just remove it.
Thanks to a reminder from Calvin Owens, 'kfree_pps' should be removed
in pps_register_source() to avoid a double free in the failure case.
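A sketch of the ordering requirement (simplified; real code embeds struct device in a larger object and releases that):

#include <linux/device.h>
#include <linux/slab.h>

static void my_dev_release(struct device *dev)
{
	kfree(dev);
}

/* dev->release must be valid before device_register(): even when
 * registration fails, the error path calls put_device(), and the final
 * kobject cleanup invokes the release callback.
 */
static int my_register(struct device *dev)
{
	int err;

	dev->release = my_dev_release;	/* set before device_register() */
	err = device_register(dev);
	if (err) {
		put_device(dev);	/* ends up in my_dev_release() */
		return err;
	}
	return 0;
}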
Link: https://lore.kernel.org/all/20250827065010.3208525-1-wangliang74@huawei.com/
Fixes: c79a39dc8d ("pps: Fix a use-after-free")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-By: Calvin Owens <calvin@wbinvd.org>
Link: https://lore.kernel.org/r/20250830075023.3498174-1-wangliang74@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cf0b3c243 ]
Currently gsm_queue() processes incoming frames and when opening
a DLC channel it calls gsm_dlci_open() which calls gsm_modem_update().
If basic mode is used it calls gsm_modem_upd_via_msc(), and it
cannot block the input queue by waiting for the response to arrive
on that same input queue.
Instead allow sending Modem Status Command without waiting for remote
end to respond. Define a new function gsm_modem_send_initial_msc()
for this purpose. As MSC is only valid for basic encoding, it does
not do anything for advanced or when convergence layer type 2 is used.
Fixes: 4847380250 ("tty: n_gsm: fix missing update of modem controls after DLCI open")
Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Link: https://lore.kernel.org/r/20250827123221.1148666-1-seppo.takalo@nordicsemi.no
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e271cc0d25 ]
Once the use_os_string flag is set to true for some functions
(e.g. adb/mtp) that need to respond to the OS string request, and we then
re-bind the ConfigFS gadget to use other functions (e.g. hid) that should
not respond to it, the flag remains true and the USB gadget responds with
the OS string descriptor incorrectly. This can cause the USB device to be
unrecognizable on the Windows system.
An example of this is as follows:
echo 1 > os_desc/use
ln -s functions/ffs.adb configs/b.1/function0
start adbd
echo "<udc device>" > UDC #succeed
stop adbd
rm configs/b.1/function0
echo 0 > os_desc/use
ln -s functions/hid.gs0 configs/b.1/function0
echo "<udc device>" > UDC #fail to connect on Windows
This patch sets the use_os_string flag to false at bind if
the functions do not support OS Descriptors.
Signed-off-by: William Wu <william.wu@rock-chips.com>
Fixes: 87213d388e ("usb: gadget: configfs: OS String support")
Link: https://lore.kernel.org/r/1755833769-25434-1-git-send-email-william.wu@rock-chips.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cbda64f3f5 ]
Use negative error code -EINVAL instead of positive EINVAL in the default
case of svm_ioctl() to conform to Linux kernel error code conventions.
Fixes: 42de677f79 ("drm/amdkfd: register svm range")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e1a8805e5d ]
Fix incorrect argument order in devm_kcalloc() when allocating port->phys.
The original call used sizeof(phy) as the number of elements and
port->lanes as the element size, which is reversed. While this happens to
produce the correct total allocation size with current pointer size and
lane counts, the argument order is wrong.
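A sketch of the corrected call (struct shape illustrative):

#include <linux/device.h>
#include <linux/slab.h>

struct my_port {
	unsigned int lanes;
	struct phy **phys;
};

/* devm_kcalloc(dev, n, size, gfp) takes the element count first and the
 * element size second; the broken call passed sizeof(phy) as the count
 * and port->lanes as the size.
 */
static int port_alloc_phys(struct device *dev, struct my_port *port)
{
	port->phys = devm_kcalloc(dev, port->lanes, sizeof(*port->phys),
				  GFP_KERNEL);
	if (!port->phys)
		return -ENOMEM;
	return 0;
}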
Fixes: 6fe7c187e0 ("PCI: tegra: Support per-lane PHYs")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
[mani: added Fixes tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250819150436.3105973-1-alok.a.tiwari@oracle.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7009e3af04 ]
Port of commit 227545b9a0 ("drm/radeon/dpm: Disable sclk
switching on Oland when two 4K 60Hz monitors are connected")
This is an ad-hoc DPM fix, necessary because we don't have
proper bandwidth calculation for DCE 6.
We define "high pixelclock" for SI as higher than necessary
for 4K 30Hz. For example, 4K 60Hz and 1080p 144Hz fall into
this category.
When two high pixel clock displays are connected to Oland,
additionally disable shader clock switching, which results in
a higher voltage, thereby addressing some visible flickering.
v2:
Add more comments.
v3:
Split into two commits for easier review.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ed3803533c ]
According to pp_pm_compute_clocks the non-DC display code
has "issues with mclk switching with refresh rates over 120 hz".
The workaround is to disable MCLK switching in this case.
Do the same for legacy DPM.
Fixes: 6ddbd37f10 ("drm/amd/pm: optimize the amdgpu_pm_compute_clocks() implementations")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9003a07468 ]
Some parts of the code base expect that MCLK switching is turned
off when the vblank time is set to zero.
According to pp_pm_compute_clocks the non-DC code has issues
with MCLK switching with refresh rates over 120 Hz.
v3:
Add code comment to explain this better.
Add an if statement instead of changing the switch_limit.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce02513012 ]
Based on some comments in dm_pp_display_configuration
above the crtc_index and line_time fields, these values
are programmed to the SMC to work around an SMC hang
when it switches MCLK.
According to Alex, the Windows driver programs them to:
mclk_change_block_cp_min = 200 / line_time
mclk_change_block_cp_max = 100 / line_time
Let's use the same for the sake of consistency.
Previously we used the watermark values, but it seemed buggy
as the code was mixing up low/high and A/B watermarks, and
was not saving a low watermark value on DCE 6, so
mclk_change_block_cp_max would be always zero previously.
Split this change off from the previous si_upload_smc_data
to make it easier to bisect, in case it causes any issues.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a43b2cec04 ]
The si_upload_smc_data function uses si_write_smc_soft_register
to set some register values in the SMC, and expects the result
to be PPSMC_Result_OK which is 1.
The PPSMC_Result_OK / PPSMC_Result_Failed values are used for
checking the result of a command sent to the SMC.
However, the si_write_smc_soft_register actually doesn't send
any commands to the SMC and returns zero on success,
so this check was incorrect.
Fix that by not checking the return value, just like other
calls to si_write_smc_soft_register.
v3:
Additionally, when no display is plugged in, there is no need
to restrict MCLK switching, so program the registers to zero.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a0c3a4035 ]
Always send PPSMC_MSG_DisableULV to the SMC, even if ULV mode
is unsupported, to make sure it is properly turned off.
v3:
Simplify si_disable_ulv further.
Always check the return value of amdgpu_si_send_msg_to_smc.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c661219cd7 ]
Unlike later versions, UVD 3 has firmware validation.
For this to work, the UVD should be powered up correctly.
When DPM is enabled and the display clock is off,
the SMU may choose a power state which doesn't power
the UVD, which can result in failure to initialize UVD.
v2:
Add code comments to explain about the UVD power state
and how UVD clock is turned on/off.
Fixes: b38f3e80ec ("drm amdgpu: SI UVD v3_1 (v2)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12d9a9dd9d ]
Ensure that etm_perf_add_symlink_sink() is only called for devices
that implement the alloc_buffer operation. This prevents invalid
symlink creation for dummy sinks that do not implement alloc_buffer.
Without this check, perf may attempt to use a dummy sink that lacks
alloc_buffer operations to initialise perf's ring buffer, leading
to runtime failures.
Fixes: 9d3ba0b6c0 ("Coresight: Add coresight dummy driver")
Signed-off-by: Yuanfang Zhang <quic_yuanfang@quicinc.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250630-etm_perf_sink-v1-1-e4a7211f9ad7@quicinc.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a2c9164b5 ]
Function rate limiting is set through the physical function driver.
Users configure it by providing function information and rate limit values.
Before configuration, it is necessary to check whether the
provided function and PF belong to the same device.
Fixes: 22d7a6c39c ("crypto: hisilicon/qm - add pci bdf number check")
Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0dcd21443d ]
When the device resumes from a suspended state, it will revert to its
initial state and requires re-enabling. Currently, the address prefetch
function is not re-enabled after device resuming. Move the address prefetch
enable to the initialization process. In this way, the address prefetch
can be enabled when the device resumes by calling the initialization
process.
Fixes: 607c191b37 ("crypto: hisilicon - support runtime PM for accelerator device")
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4e0815104 ]
When configuring the high-performance mode register, there is no
need to verify whether the register has been successfully
enabled, as there is no possibility of a write failure for this
register.
Fixes: a9864bae18 ("crypto: hisilicon/zip - add zip comp high perf mode configuration")
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5954ad7d1a ]
Building with a reduced stack warning limit shows that delta_mjpeg_decode()
copies a giant structure to the stack each time but only uses three of
its members:
drivers/media/platform/st/sti/delta/delta-mjpeg-dec.c: In function 'delta_mjpeg_decode':
drivers/media/platform/st/sti/delta/delta-mjpeg-dec.c:427:1: error: the frame size of 1296 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Open-code the passing of the structure members that are actually used here.
Fixes: 433ff5b4a2 ("[media] st-delta: add mjpeg support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ef353d546 ]
Change the 'ret' variable from u16 to int to store negative error codes or
zero returned by lx_message_send_atomic().
Storing the negative error codes in an unsigned type doesn't cause an issue
at runtime but it's ugly as pants. Additionally, assigning negative error
codes to an unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
No effect on runtime.
Fixes: 02bec49045 ("ALSA: lx6464es - driver for the digigram lx6464es interface")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Link: https://patch.msgid.link/20250828081312.393148-1-rongqianfeng@vivo.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a7c18c485 ]
The mlx5 driver currently derives max_qp_wr directly from the
log_max_qp_sz HCA capability:
props->max_qp_wr = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
However, this value represents the number of WQEs in units of Basic
Blocks (see MLX5_SEND_WQE_BB), not actual number of WQEs. Since the size
of a WQE can vary depending on transport type and features (e.g., atomic
operations, UMR, LSO), the actual number of WQEs can be significantly
smaller than the WQEBB count suggests.
This patch introduces a conservative estimation of the worst-case WQE size
— considering largest segments possible with 1 SGE and no inline data or
special features. It uses this to derive a more accurate max_qp_wr value.
Fixes: 938fe83c8d ("net/mlx5_core: New device capabilities handling")
Link: https://patch.msgid.link/r/7d992c9831c997ed5c33d30973406dc2dcaf5e89.1755088725.git.leon@kernel.org
Reported-by: Chuck Lever <cel@kernel.org>
Closes: https://lore.kernel.org/all/20250506142202.GJ2260621@ziepe.ca/
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 361fa7f813 ]
In otx2_cpt_dl_custom_egrp_create(), strscpy() is called with the length
of the source string rather than the size of the destination buffer.
This is fine as long as the destination buffer is larger than the source
string, but we should still use the destination buffer size instead to
call strscpy() as intended. And since 'tmp_buf' is a fixed-size buffer,
we can safely omit the size argument and let strscpy() infer it using
sizeof().
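A sketch of the change (buffer size illustrative):

#include <linux/string.h>

/* Bound strscpy() by the destination, not the source; with a fixed-size
 * array the size argument can be dropped and inferred via sizeof().
 */
static void copy_engine_group_name(const char *src)
{
	char tmp_buf[64];

	strscpy(tmp_buf, src);	/* was: strscpy(tmp_buf, src, strlen(src)) */
}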
Fixes: d9d7749773 ("crypto: octeontx2 - add apis for custom engine groups")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 251be2f603 ]
Since commit f7b705c238 ("scsi: pm80xx: Set phy_attached to zero when
device is gone") UBSAN reports:
UBSAN: array-index-out-of-bounds in drivers/scsi/pm8001/pm8001_sas.c:786:17
index 28 is out of range for type 'pm8001_phy [16]'
on rmmod when using an expander.
For a direct attached device, attached_phy contains the local phy id.
For a device behind an expander, attached_phy contains the remote phy
id, not the local phy id.
I.e. while pm8001_ha will have pm8001_ha->chip->n_phy local phys, for a
device behind an expander, attached_phy can be much larger than
pm8001_ha->chip->n_phy (depending on the amount of phys of the
expander).
E.g. on my system pm8001_ha has 8 phys with phy ids 0-7. One of the
ports has an expander connected. The expander has 31 phys with phy ids
0-30.
The pm8001_ha->phy array only contains the phys of the HBA. It does not
contain the phys of the expander. Thus, it is wrong to use attached_phy
to index the pm8001_ha->phy array for a device behind an expander.
Thus, we can only clear phy_attached for devices that are directly
attached.
Fixes: f7b705c238 ("scsi: pm80xx: Set phy_attached to zero when device is gone")
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250814173215.1765055-14-cassel@kernel.org
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efaa2d815a ]
Compile-testing this driver is only possible when the AMBA bus driver is
available in the kernel:
x86_64-linux-ld: drivers/char/hw_random/nomadik-rng.o: in function `nmk_rng_remove':
nomadik-rng.c:(.text+0x67): undefined reference to `amba_release_regions'
x86_64-linux-ld: drivers/char/hw_random/nomadik-rng.o: in function `nmk_rng_probe':
nomadik-rng.c:(.text+0xee): undefined reference to `amba_request_regions'
x86_64-linux-ld: nomadik-rng.c:(.text+0x18d): undefined reference to `amba_release_regions'
This was previously implied by the 'depends on ARCH_NOMADIK', but needs to be
specified for the COMPILE_TEST case.
Fixes: d5e93b3374 ("hwrng: Kconfig - Add helper dependency on COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0485e864a ]
EUD_MODE_MANAGER2 register is mapped to a memory region that is marked
as read-only for operating system running at EL1, enforcing access
restrictions that prohibit direct memory-mapped writes via writel().
Attempts to write to this region from HLOS can result in silent failures
or memory access violations, particularly when toggling EUD (Embedded
USB Debugger) state. To ensure secure register access, modify the driver
to use qcom_scm_io_writel(), which routes the write operation to Qualcomm
Secure Channel Monitor (SCM). SCM has the necessary permissions to access
protected memory regions, enabling reliable control over EUD state.
SC7280, the only user of EUD, is also affected, indicating that this could
never have worked on a properly fused device.
Fixes: 9a1bf58ccd ("usb: misc: eud: Add driver support for Embedded USB Debugger(EUD)")
Signed-off-by: Melody Olvera <quic_molvera@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Souradeep Chowdhury <quic_schowdhu@quicinc.com>
Signed-off-by: Komal Bajaj <komal.bajaj@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250731-eud_mode_manager_secure_access-v8-1-4a5dcbb79f41@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7bb14b61b7 ]
The boot firmware may disable the U3 port early during boot and leave it
up to the controller or PHY driver to re-enable U3 when needed.
The Rockchip USBDP PHY driver currently does this for RK3576 and RK3588,
something the Rockchip Naneng Combo PHY driver never does for RK3568.
This may result in USB 3.0 ports being limited to only using USB 2.0 or
in special cases not working at all on RK3568.
Write to PIPE_GRF USB3OTGx_CON1 reg to ensure the U3 port is enabled
when a PHY with PHY_TYPE_USB3 mode is used.
Fixes: 7160820d74 ("phy: rockchip: add naneng combo phy for RK3568")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20250723072324.2246498-1-jonas@kwiboo.se
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc322d13cf ]
The zoran_fh structure is a wrapper around v4l2_fh. Its usage has been
mostly removed by commit 83f89a8bcb ("media: zoran: convert to vb2"),
but the structure stayed by mistake. It is now used in a single
location, assigned from a void pointer and then recast to a void
pointer, without being every accessed. Drop it.
Fixes: 83f89a8bcb ("media: zoran: convert to vb2")
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e75ce11790 ]
If the reserve_root mount option is not assigned, __allow_reserved_blocks()
will return false, which is not correct; fix it.
Fixes: 7e65be49ed ("f2fs: add reserved blocks for root user")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 260dcf5b06 ]
GCC 16 enables -Werror=unused-but-set-variable= which results in build
error with the following message.
drivers/gpu/drm/radeon/r600_cs.c: In function ‘r600_texture_size’:
drivers/gpu/drm/radeon/r600_cs.c:1411:29: error: variable ‘level’ set but not used [-Werror=unused-but-set-variable=]
1411 | unsigned offset, i, level;
| ^~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:287: drivers/gpu/drm/radeon/r600_cs.o] Error 1
The variable 'level', although set, is never used in r600_texture_size(),
resulting in dead code and triggering this error.
Fixes: 60b212f8dd ("drm/radeon: overhaul texture checking. (v3)")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4540aed51b ]
Yinhao et al. recently reported:
Our fuzzer tool discovered an uninitialized pointer issue in the
bpf_prog_test_run_xdp() function within the Linux kernel's BPF subsystem.
This leads to a NULL pointer dereference when a BPF program attempts to
dereference the txq member of a struct xdp_buff object.
The test initializes two programs of BPF_PROG_TYPE_XDP: progA acts as the
entry point for bpf_prog_test_run_xdp() and its expected_attach_type can
be neither BPF_XDP_DEVMAP nor BPF_XDP_CPUMAP. progA calls into a slot
of a tailcall map it owns. progB's expected_attach_type must be BPF_XDP_DEVMAP
to pass xdp_is_valid_access() validation. The program returns struct xdp_md's
egress_ifindex, and the latter is only allowed to be accessed under mentioned
expected_attach_type. progB is then inserted into the tailcall which progA
calls.
The underlying issue goes beyond XDP though. Another example are programs
of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. sock_addr_is_valid_access() as well
as sock_addr_func_proto() have different logic depending on the programs'
expected_attach_type. Similarly, a program attached to BPF_CGROUP_INET4_GETPEERNAME
should not be allowed doing a tailcall into a program which calls bpf_bind()
out of BPF which is only enabled for BPF_CGROUP_INET4_CONNECT.
In short, specifying expected_attach_type allows to open up additional
functionality or restrictions beyond what the basic bpf_prog_type enables.
The use of tailcalls must not violate these constraints. Fix it by enforcing
expected_attach_type in __bpf_prog_map_compatible().
Note that we only enforce this for tailcall maps, but not for BPF devmaps or
cpumaps: There, the programs are invoked through dev_map_bpf_prog_run*() and
cpu_map_bpf_prog_run*() which set up a new environment / context and therefore
these situations are not prone to this issue.
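A sketch of the enforcement (owner bookkeeping simplified; the real check lives in __bpf_prog_map_compatible()):

#include <linux/bpf.h>

struct owner_info {
	enum bpf_prog_type type;
	enum bpf_attach_type attach_type;
	bool is_tail_call_map;		/* BPF_MAP_TYPE_PROG_ARRAY */
};

/* The first program inserted stamps the owner info; later insertions must
 * match it. For tail-call maps the expected_attach_type must match too,
 * since the attach type widens or narrows ctx fields and helpers beyond
 * what the prog type alone allows.
 */
static bool prog_map_compatible(const struct owner_info *owner,
				const struct bpf_prog *fp)
{
	if (owner->type != fp->type)
		return false;
	if (owner->is_tail_call_map &&
	    owner->attach_type != fp->expected_attach_type)
		return false;
	return true;
}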
Fixes: 5e43f899b0 ("bpf: Check attach type at prog load time")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250926171201.188490-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cc114dc35 ]
When a module registers a struct_ops, the struct_ops type and its
corresponding map_value type ("bpf_struct_ops_") may reside in different
btf objects, here are four possible case:
+--------+---------------+-------------+---------------------------------+
| |bpf_struct_ops_| xxx_ops | |
+--------+---------------+-------------+---------------------------------+
| case 0 | btf_vmlinux | btf_vmlinux | be used and reg only in vmlinux |
+--------+---------------+-------------+---------------------------------+
| case 1 | btf_vmlinux | mod_btf | INVALID |
+--------+---------------+-------------+---------------------------------+
| case 2 | mod_btf | btf_vmlinux | reg in mod but be used both in |
| | | | vmlinux and mod. |
+--------+---------------+-------------+---------------------------------+
| case 3 | mod_btf | mod_btf | be used and reg only in mod |
+--------+---------------+-------------+---------------------------------+
Currently we figure out the mod_btf by searching with the struct_ops type,
which makes it impossible to figure out the mod_btf when the struct_ops
type is in btf_vmlinux while its corresponding map_value type is in
mod_btf (case 2).
The fix is to use the corresponding map_value type ("bpf_struct_ops_")
as the lookup anchor instead of the struct_ops type to figure out the
`btf` and `mod_btf` via find_ksym_btf_id(), and then we can locate
the kern_type_id via btf__find_by_name_kind() with the `btf` we just
obtained from find_ksym_btf_id().
With this change the lookup obtains the correct btf and mod_btf for case 2,
preserves correct behavior for other valid cases, and still fails as
expected for the invalid scenario (case 1).
Fixes: 590a008882 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20250926071751.108293-1-alibuda@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 70e633bede ]
When the driver is removed, the clocks are first enabled by
calling pm_runtime_get_sync(), and then disabled with
pm_runtime_put_sync().
If CONFIG_PM=y, clocks for this controller are disabled when it's in
the idle state. So the clocks are properly disabled when the driver
exits.
Otherwise, the clocks are always enabled and the PM functions have
no effect. Therefore, the driver exits without disabling the clocks.
# cat /sys/kernel/debug/clk/clk-pclk/clk_enable_count
18
# echo 1214a000.i2c > /sys/bus/platform/drivers/i2c_designware/bind
# cat /sys/kernel/debug/clk/clk-pclk/clk_enable_count
20
# echo 1214a000.i2c > /sys/bus/platform/drivers/i2c_designware/unbind
# cat /sys/kernel/debug/clk/clk-pclk/clk_enable_count
20
To ensure that the clocks can be disabled correctly even without
CONFIG_PM=y, apply the following fixes (sketched below):
- Replace pm_runtime_put_sync() with pm_runtime_put_noidle(), which only
decrements the runtime PM usage count.
- Call i2c_dw_prepare_clk(false) to explicitly disable the clocks.
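A sketch of the adjusted remove path based on the points above (struct layout abbreviated; i2c_dw_prepare_clk() is the driver's own clock helper):

#include <linux/device.h>
#include <linux/pm_runtime.h>

static void dw_i2c_remove_clocks_sketch(struct device *dev,
					void (*prepare_clk)(bool enable))
{
	pm_runtime_get_sync(dev);
	/* ... quiesce the controller ... */
	pm_runtime_put_noidle(dev);	/* was: pm_runtime_put_sync() */
	prepare_clk(false);		/* i.e. i2c_dw_prepare_clk(dev, false) */
	pm_runtime_disable(dev);
}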
Fixes: 7272194ed3 ("i2c-designware: add minimal support for runtime PM")
Co-developed-by: Kohei Ito <ito.kohei@socionext.com>
Signed-off-by: Kohei Ito <ito.kohei@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b492183652 ]
The old IC does not support the I2C_MASTER_WRRD (write-then-read)
function, but the current code’s handling of i2c->auto_restart may
potentially lead to entering the I2C_MASTER_WRRD software flow,
resulting in unexpected bugs.
Instead of repurposing the auto_restart flag, add a separate flag
to signal I2C_MASTER_WRRD operations.
Also fix handling of msgs. If the operation (i2c->op) is
I2C_MASTER_WRRD, then the msgs pointer is incremented by 2.
For all other operations, msgs is simply incremented by 1.
Fixes: b2ed11e224 ("I2C: mediatek: Add driver for MediaTek MT8173 I2C controller")
Signed-off-by: Leilk.Liu <leilk.liu@mediatek.com>
Suggested-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c02e4644f8 ]
Distinguish between fan speed setting requests coming from the hwmon and
thermal subsystems.
There are fields 'last_hwmon_state' and 'last_thermal_state' in the
structure 'mlxreg_fan_pwm', which respectively store the cooling state
set by the 'hwmon' and 'thermal' subsystem.
The purpose is to make arbitration of fan speed setting. For example, if
fan speed required to be not lower than some limit, such setting is to
be performed through 'hwmon' subsystem, thus 'thermal' subsystem will
not set fan below this limit.
Currently, 'last_thermal_state' is also updated by 'hwmon', causing the
cooling state to never be set to a lower value.
Eliminate update of 'last_thermal_state', when request is coming from
'hwmon' subsystem.
Fixes: da74944d3a ("hwmon: (mlxreg-fan) Use pwm attribute for setting fan speed low limit")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250113084859.27064-2-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edcc8a38b5 ]
The commit c2c60ea37e ("once: use __section(".data.once")") moved
DO_ONCE's ___done variable to .data.once section, which conflicts with
DO_ONCE_LITE() that also uses the same section.
This creates a race condition when clear_warn_once is used:
Thread 1 (DO_ONCE) Thread 2 (DO_ONCE)
__do_once_start
read ___done (false)
acquire once_lock
execute func
__do_once_done
write ___done (true) __do_once_start
release once_lock // Thread 3 clear_warn_once reset ___done
read ___done (false)
acquire once_lock
execute func
schedule once_work __do_once_done
once_deferred: OK write ___done (true)
static_branch_disable release once_lock
schedule once_work
once_deferred:
BUG_ON(!static_key_enabled)
DO_ONCE_LITE() in once_lite.h is used by WARN_ON_ONCE() and other warning
macros. Keep its ___done flag in the .data..once section and allow resetting
by clear_warn_once, as originally intended.
In contrast, DO_ONCE() is used for functions like get_random_once() and
relies on its ___done flag for internal synchronization. We should not reset
DO_ONCE() by clear_warn_once.
Fix it by isolating DO_ONCE's ___done into a separate .data..do_once section,
shielding it from clear_warn_once.
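A sketch of where the two flags now live (macros heavily simplified; the real DO_ONCE() additionally uses a static key and a spinlock):

#include <linux/compiler_attributes.h>
#include <linux/types.h>

/* WARN_ON_ONCE()/DO_ONCE_LITE() flag: stays in .data..once so that
 * clear_warn_once can reset it and re-arm the warning.
 */
#define ONCE_LITE_FLAG(name) \
	static bool name __section(".data..once")

/* DO_ONCE() flag: isolated in .data..do_once so that clear_warn_once
 * cannot reset it and re-trigger the one-time initialisation.
 */
#define DO_ONCE_FLAG(name) \
	static bool name __section(".data..do_once")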
Fixes: c2c60ea37e ("once: use __section(".data.once")")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4680a11e1 ]
Some distributions (e.g., CachyOS) support building the kernel with -O3,
but doing so may break kfuncs, resulting in their symbols not being
properly exported.
In fact, with gcc -O3, some kfuncs may be optimized away despite being
annotated as noinline. This happens because gcc can still clone the
function during IPA optimizations, e.g., by duplicating or inlining it
into callers, and then dropping the standalone symbol. This breaks BTF
ID resolution since resolve_btfids relies on the presence of a global
symbol for each kfunc.
Currently, this is not an issue for upstream, because we don't allow
building the kernel with -O3, but it may be safer to address it anyway,
to prevent potential issues in the future if compilers become more
aggressive with optimizations.
Therefore, add __noclone to __bpf_kfunc to ensure kfuncs are never
cloned and remain distinct, globally visible symbols, regardless of
the optimization level.
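A small stand-alone sketch of the attribute combination, using an
illustrative macro name rather than the kernel's actual __bpf_kfunc
definition:
#include <stdio.h>

/* Illustrative macro (not the kernel's __bpf_kfunc): 'used' keeps the symbol
 * even if it looks unreferenced, 'noinline' prevents inlining, and 'noclone'
 * stops GCC's IPA passes from cloning the function and dropping the
 * standalone, globally visible symbol at -O3. */
#define demo_kfunc __attribute__((used, noinline, noclone))

demo_kfunc int demo_add(int a, int b)
{
        return a + b;
}

int main(void)
{
        printf("%d\n", demo_add(2, 3));
        return 0;
}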
Fixes: 57e7c169cd ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs")
Acked-by: David Vernet <void@manifault.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Link: https://lore.kernel.org/r/20250924081426.156934-1-arighi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ff52df6b3 ]
Commit d5094bcb5b ("tools/nolibc: define time_t in terms of
__kernel_old_time_t") made nolibc use the kernel's time type so that
`time_t` matches `timespec::tv_sec` on all ABIs (notably x32).
But since __kernel_old_time_t is fairly new, notably from 2020 in commit
94c467ddb2 ("y2038: add __kernel_old_timespec and __kernel_old_time_t"),
nolibc builds that rely on host headers may fail.
Switch to __kernel_time_t, which is the same as __kernel_old_time_t and
has existed for longer.
Tested in PPC VM of Open Source Lab of Oregon State University
(./tools/testing/selftests/rcutorture/bin/mkinitrd.sh)
Fixes: d5094bcb5b ("tools/nolibc: define time_t in terms of __kernel_old_time_t")
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[Thomas: Reformat commit and its message a bit]
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ccf09357ff ]
The smp_call_function_many() kerneldoc comment got out of sync with the
function definition (bool parameter "wait" is incorrectly described as a
bitmask in it), so fix it up by copying the "wait" description from the
smp_call_function() kerneldoc and add information regarding the handling
of the local CPU to it.
Fixes: 49b3bd213a ("smp: Fix all kernel-doc warnings")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8cfc524ea ]
Check if the watchdog device supports the WDIOF_KEEPALIVEPING option before
entering the keep_alive() ping test loop. Fix watchdog-test silently looping
if ioctl-based ping is not supported by the device. Exit from the test in
such a case instead of getting stuck in a loop executing a failing keep_alive().
With this change:
watchdog_info:
identity: m41t93 rtc Watchdog
firmware_version: 0
Support/Status: Set timeout (in seconds)
Support/Status: Watchdog triggers a management or other external alarm not a reboot
Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
WDIOC_KEEPALIVE not supported by this device
Without this change:
Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
Watchdog Ticking Away!
(Test gets stuck here forever, silently)
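A minimal user-space sketch of the capability check, assuming the standard
/dev/watchdog device and <linux/watchdog.h> ioctls; this is illustrative,
not the selftest's actual code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
        struct watchdog_info info;
        int fd = open("/dev/watchdog", O_WRONLY);

        if (fd < 0)
                return 1;

        if (ioctl(fd, WDIOC_GETSUPPORT, &info) ||
            !(info.options & WDIOF_KEEPALIVEPING)) {
                printf("WDIOC_KEEPALIVE not supported by this device\n");
                write(fd, "V", 1);      /* magic close so the dog is not left armed */
                close(fd);
                return 4;               /* skip, instead of looping forever */
        }

        /* ... the keep_alive() ping loop would only run from here on ... */
        close(fd);
        return 0;
}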
Updated change log at commit time:
Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250914152840.GA3047348@bhairav-test.ee.iitb.ac.in
Fixes: d89d08ffd2 ("selftests: watchdog: Fix ioctl SET* error paths to take oneshot exit path")
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7869b0a25 ]
Driver wants to nack the IBI request when the target is not in the
known address list. In below code, svc_i3c_master_nack_ibi() will
cause undefined behavior when using AUTOIBI with auto response rule,
because hw always auto ack the IBI request.
switch (ibitype) {
case SVC_I3C_MSTATUS_IBITYPE_IBI:
dev = svc_i3c_master_dev_from_addr(master, ibiaddr);
if (!dev || !is_events_enabled(master, SVC_I3C_EVENT_IBI))
svc_i3c_master_nack_ibi(master);
...
break;
AutoIBI has another issue that the controller doesn't quit AutoIBI state
after IBIWON polling timeout when there is an SDA glitch (high->low->high).
1. SDA high->low: raising an interrupt to execute IBI ISR
2. SDA low->high
3. Driver writes an AutoIBI request
4. AutoIBI process does not start because SDA is not low
5. IBIWON polling times out
6. Controller remains in AutoIBI state and doesn't accept EmitStop request
Emitting broadcast address with IBIRESP_MANUAL avoids both issues.
Fixes: dd3c52846d ("i3c: master: svc: Add Silvaco I3C master driver")
Signed-off-by: Stanley Chu <yschu@nuvoton.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250829012309.3562585-2-yschu@nuvoton.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db5a5406fb ]
It’s possible for more than one async command to be in flight from
__nvmet_fc_send_ls_req. For each command, a tgtport reference is taken.
In the current code, only one put work item is queued at a time, which
results in a leaked reference.
To fix this, move the work item to the nvmet_fc_ls_req_op struct, which
already tracks all resources related to the command.
Fixes: 710c69dbac ("nvmet-fc: avoid deadlock on delete association path")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 00f83f0e07 ]
The function set_prescale_div() is responsible for calculating the clock
divisor settings such that the input clock rate is divided down such that
the required period length is at most 0x10000 clock ticks. If period_cycles
is an integer multiple of 0x10000, the divisor period_cycles / 0x10000 is
good enough. So round up in the calculation of the required divisor and
compare it using >= instead of >.
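A stand-alone sketch of the rounding idea, with illustrative names; the
real driver maps this divisor onto the hardware's prescaler fields:
#include <stdio.h>

/* Round up so that an exact multiple of 0x10000 still yields a divisor
 * that brings the period down to at most 0x10000 ticks. */
static unsigned long required_divisor(unsigned long period_cycles)
{
        return (period_cycles + 0xFFFF) / 0x10000;
}

int main(void)
{
        printf("%lu\n", required_divisor(0x20000));     /* 2: exact multiple is fine */
        printf("%lu\n", required_divisor(0x20001));     /* 3: anything more needs the next divisor */
        return 0;
}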
Fixes: 19891b20e7 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/85488616d7bfcd9c32717651d0be7e330e761b9c.1754927682.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc7ce5bfc5 ]
In Up-Count Mode the timer is reset to zero one tick after it reaches
TBPRD, so the period length is (TBPRD + 1) * T_TBCLK. This matches both
the documentation and measurements. So the value written to the TBPRD has
to be one less than the calculated period_cycles value.
A complication here is that for a 100% relative duty-cycle the value
written to the CMPx register has to be TBPRD + 1 which might overflow if
TBPRD is 0xffff. To handle that the calculation of the AQCTLx register
has to be moved to ehrpwm_pwm_config() and the edge at CTR = CMPx has to
be skipped.
Additionally the AQCTL_PRD register field has to be 0 because that defines
the hardware's action when the maximal counter value is reached, which is
(as above) one clock tick before the period's end. The period start edge
has to happen when the counter is reset and so is defined in the AQCTL_ZRO
field.
Fixes: 19891b20e7 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/dc818c69b7cf05109ecda9ee6b0043a22de757c1.1754927682.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 21a5e91fda ]
The pwm driver calls pm_runtime_get_sync() when the hardware becomes
enabled and pm_runtime_put_sync() when it becomes disabled. The PWM's
state is kept when a consumer goes away, so the call to
pm_runtime_put_sync() in the .free() callback is unbalanced resulting in
a non-functional device and a reference underflow for the second consumer.
The easiest fix for that issue is to just not drop the runtime PM
reference in .free(), so do that.
Fixes: 19891b20e7 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/bbb089c4b5650cc1f7b25cf582d817543fd25384.1754927682.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 072755cca7 ]
Rename the inner 'frm' variable to 'resp_frm' in the write path of
mmc_route_rpmb_frames() to avoid shadowing the outer 'frm' variable.
The function declares 'frm' at function scope pointing to the request
frame, but then redeclares another 'frm' variable inside the write
block pointing to the response frame. This shadowing makes the code
confusing and error-prone.
Using 'resp_frm' for the response frame makes the distinction clear
and improves code readability.
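A tiny generic illustration of the shadowing hazard (plain C, not the mmc
code itself):
#include <stdio.h>

int main(void)
{
        int frm = 1;                    /* outer: stands in for the request frame */
        {
                int frm = 2;            /* inner declaration shadows the outer one */
                printf("inner frm = %d\n", frm);        /* 2 */
        }
        printf("outer frm = %d\n", frm);                /* 1: easy to mix up which one code refers to */
        return 0;
}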
Fixes: 7852028a35 ("mmc: block: register RPMB partition with the RPMB subsystem")
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0370911565 ]
The touchscreen controller used with the original Krabby design is the
Elan eKTH6918, which is in the same family as eKTH6915, but supporting
a larger screen size with more sense lines.
OTOH, the touchscreen controller that actually shipped on the Tentacruel
devices is the Elan eKTH6A12NAY. A compatible string was added for it
specifically because it has different power sequencing timings.
Fix up the touchscreen nodes for both of these. This also includes adding
a previously missing reset line. Also add "no-reset-on-power-off" since
the power is always on, and putting it in reset would consume more
power.
Fixes: 8855d01fb8 ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Link: https://lore.kernel.org/r/20250812090135.3310374-1-wenst@chromium.org
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cdc06f9126 ]
Make sure to drop the reference to the saw device taken by
of_find_device_by_node() after retrieving its driver data during
probe().
Also drop the reference to the CPU node sooner to avoid leaking it in
case there is no saw node or device.
Fixes: 60f3692b5f ("cpuidle: qcom_spm: Detach state machine from main SPM handling")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f89c7fb83a ]
On RK3588 with LPDDR4X memory, the cycle count as returned by
perf stat -a -e rockchip_ddr/cycles/ sleep 1
consistently reads half as much as what the actual DDR frequency is at.
For a LPDDR4X module running at 2112MHz, I get more like 1056059916
cycles per second, which is almost bang-on half what it should be. No,
I'm not mixing up megatransfers and megahertz.
Consulting the downstream driver, this appears to be because the RK3588
hardware specifically (and RK3528 as well, for future reference) needs a
multiplier of 2 to get to the correct frequency with everything but
LPDDR5.
The RK3588's actual memory bandwidth measurements in MB/s are correct
however, as confirmed with stress-ng --stream. This makes me think the
access counters are not affected in the same way. This tracks with the
vendor kernel not multiplying the access counts either.
Solve this by adding a new member to the dfi struct, which each SoC can
set to whatever they want, but defaults to 1 if left unset by the SoC
init functions. The event_get_count op can then use this multiplier if
the cycle count is requested.
The cycle multiplier is not used in rockchip_dfi_get_event because the
vendor driver doesn't use it there either, and we don't do other actual
bandwidth unit conversion stuff in there anyway.
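A rough sketch of the shape of that change, with illustrative names (the
actual struct and op in the driver differ):
#include <stdio.h>

struct dfi_sketch {
        unsigned int count_multiplier;  /* left 0 by SoC init code that doesn't care */
};

/* Only the cycle counter is scaled; access counters are reported as-is. */
static unsigned long long cycle_count(struct dfi_sketch *dfi,
                                      unsigned long long raw)
{
        if (!dfi->count_multiplier)
                dfi->count_multiplier = 1;      /* default when the SoC init left it unset */
        return raw * dfi->count_multiplier;
}

int main(void)
{
        struct dfi_sketch rk3588 = { .count_multiplier = 2 };
        struct dfi_sketch other = { 0 };

        printf("%llu %llu\n", cycle_count(&rk3588, 1056059916ULL),
               cycle_count(&other, 1056059916ULL));
        return 0;
}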
Fixes: 481d97ba61 ("PM / devfreq: rockchip-dfi: add support for RK3588")
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/lkml/20250530-rk3588-dfi-improvements-v1-1-6e077c243a95@collabora.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3374b5fb26 ]
When test suspend resume with 6.8 based kernel, system can't resume
and I got below error which can be also reproduced with 6.16 rc6+
kernel.
mtk-pcie-gen3 112f0000.pcie: PCIe link down, current LTSSM state: detect.quiet (0x0)
mtk-pcie-gen3 112f0000.pcie: PM: dpm_run_callback(): genpd_resume_noirq returns -110
mtk-pcie-gen3 112f0000.pcie: PM: failed to resume noirq: error -110
After investigation, it looks like pcie0 has the same problem as pcie1, as
described in commit 3d7fdd8e38 ("arm64: dts: mediatek: mt8195:
Remove suspend-breaking reset from pcie1").
Fixes: ecc0af6a3f ("arm64: dts: mt8195: Add pcie and pcie phy nodes")
Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Macpaul Lin <macpaul.lin@mediatek.com>
Link: https://lore.kernel.org/r/20250721095959.57703-1-guoqing.jiang@canonical.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b59a9f762 ]
If AT_SYSINFO_EHDR is missing the whole test needs to be skipped.
Currently this results in the following output:
TAP version 13
1..16
# AT_SYSINFO_EHDR is not present!
This output is incorrect, as "1..16" still requires the subtest lines to
be printed, which isn't done however.
Switch to the correct skipping functions, so the output now correctly
indicates that no subtests are being run:
TAP version 13
1..0 # SKIP AT_SYSINFO_EHDR is not present!
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-2-90f499dd35f8@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f15e0f9ef ]
The _rval register variable is meant to be an output operand of the asm
statement but is instead used as input operand.
clang 20.1 notices this and triggers -Wuninitialized warnings:
tools/testing/selftests/timers/auxclock.c:154:10: error: variable '_rval' is uninitialized when used here [-Werror,-Wuninitialized]
154 | return VDSO_CALL(self->vdso_clock_gettime64, 2, clockid, ts);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/testing/selftests/timers/../vDSO/vdso_call.h:59:10: note: expanded from macro 'VDSO_CALL'
59 | : "r" (_rval) \
| ^~~~~
tools/testing/selftests/timers/auxclock.c:154:10: note: variable '_rval' is declared here
tools/testing/selftests/timers/../vDSO/vdso_call.h:47:2: note: expanded from macro 'VDSO_CALL'
47 | register long _rval asm ("r3"); \
| ^
It seems the lists of input and output operands have been switched around.
However, as the argument registers are not always initialized, they cannot
be marked as pure inputs, as that would trigger -Wuninitialized warnings.
Adding _rval as another input and output operand does also not work as it
would collide with the existing _r3 variable.
Instead reuse _r3 for both the argument and the return value.
Fixes: 6eda706a53 ("selftests: vDSO: fix the way vDSO functions are called for powerpc")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/20250812-vdso-tests-fixes-v2-1-90f499dd35f8@linutronix.de
Closes: https://lore.kernel.org/oe-kbuild-all/202506180223.BOOk5jDK-lkp@intel.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7942b226e6 ]
When executing modinfo null_blk, there is an error in the description
of the module parameter mbps, and the output information for cache_size is
incomplete. The output of modinfo before and after applying this patch
is as follows:
Before:
[...]
parm: cache_size:ulong
[...]
parm: mbps:Cache size in MiB for memory-backed device.
Default: 0 (none) (uint)
[...]
After:
[...]
parm: cache_size:Cache size in MiB for memory-backed device.
Default: 0 (none) (ulong)
[...]
parm: mbps:Limit maximum bandwidth (in MiB/s).
Default: 0 (no limit) (uint)
[...]
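A kernel-module-style sketch of the corrected pairing, assuming variable
names along the lines of the driver's (illustrative, not the exact patch):
#include <linux/module.h>

static unsigned long g_cache_size;
module_param_named(cache_size, g_cache_size, ulong, 0444);
MODULE_PARM_DESC(cache_size, "Cache size in MiB for memory-backed device. Default: 0 (none)");

static unsigned int g_mbps;
module_param_named(mbps, g_mbps, uint, 0444);
MODULE_PARM_DESC(mbps, "Limit maximum bandwidth (in MiB/s). Default: 0 (no limit)");

MODULE_LICENSE("GPL");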
Fixes: 058efe000b ("null_blk: add module parameters for 4 options")
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit def5612170 ]
Fix the checkpatch warning:
CHECK: Alignment should match open parenthesis
Fixes: 0cb172a491 ("power: supply: cw2015: Use device managed API to simplify the code")
Signed-off-by: Andy Yan <andyshrk@163.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27322753c8 ]
The dtbs_check validation for am335x-cm-t335.dtb flags an error
for an unevaluated 'num-serializer' property in the mcasp0 node.
This property is obsolete; it is not defined in the davinci-mcasp-audio
schema and is not used by the corresponding (or any) driver.
Remove this unused property to fix the schema validation warning.
Fixes: 48ab364478 ("ARM: dts: cm-t335: add audio support")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Link: https://lore.kernel.org/r/20250830215957.285694-1-jihed.chaibi.dev@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5af5b85505 ]
The ti,keep-vref-on property, defined as a boolean flag in the Device
Tree schema, was incorrectly assigned a value (<1>) in the DTS file,
causing a validation error: "size (4) error for type flag". Remove
the value to match the schema and ensure compatibility with the driver
using device_property_read_bool(). This fixes the dtbs_check error.
Fixes: ed05637c30 ("ARM: dts: omap3-devkit8000: Add ADS7846 Touchscreen support")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Link: https://lore.kernel.org/r/20250822225052.136919-1-jihed.chaibi.dev@gmail.com
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be82483d1b ]
If system suspend is aborted in the "noirq" phase (for instance, due to
an error returned by one of the device callbacks), power.is_noirq_suspended
will not be set for some devices and device_resume_noirq() will return
early for them. Consequently, noirq resume callbacks will not run for
them at all because the noirq suspend callbacks have not run for them
yet.
If any of them has power.must_resume set and late suspend has been
skipped for it (due to power.smart_suspend), early resume should be
skipped for it as well, or its state may become inconsistent (for
instance, if the early resume assumes that it will always follow
noirq resume).
Make that happen by clearing power.must_resume in device_resume_noirq()
for devices with power.is_noirq_suspended clear that have been left in
suspend by device_suspend_late(), which will subsequently cause
device_resume_early() to leave the device in suspend and avoid
changing its state.
Fixes: 0d4b54c6fe ("PM / core: Add LEAVE_SUSPENDED driver flag")
Link: https://lore.kernel.org/linux-pm/5d692b81-6f58-4e86-9cb0-ede69a09d799@rowland.harvard.edu/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/3381776.aeNJFYEL58@rafael.j.wysocki
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6058316d1 ]
Memory programming doesn't work for devices without page support.
For example, LP5562 has 3 engines but doesn't support pages,
the start address is changed depending on engine number.
According to datasheet [1], the PROG MEM register addresses for each
engine are as follows:
Engine 1: 0x10
Engine 2: 0x30
Engine 3: 0x50
However, the current implementation incorrectly calculates the address
of PROG MEM register using the engine index starting from 1:
prog_mem_base = 0x10
LP55xx_BYTES_PER_PAGE = 0x20
Engine 1: 0x10 + 0x20 * 1 = 0x30
Engine 2: 0x10 + 0x20 * 2 = 0x50
Engine 3: 0x10 + 0x20 * 3 = 0x70
This results in writing to the wrong engine memory, causing pattern
programming to fail.
To correct it, the engine index should be decreased:
Engine 1: 0x10 + 0x20 * 0 = 0x10
Engine 2: 0x10 + 0x20 * 1 = 0x30
Engine 3: 0x10 + 0x20 * 2 = 0x50
1 - https://www.ti.com/lit/ds/symlink/lp5562.pdf
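Expressed as a small stand-alone calculation (illustrative macro names,
not the driver's):
#include <stdio.h>

#define PROG_MEM_BASE           0x10
#define BYTES_PER_PAGE          0x20
/* engine numbers are 1-based, so subtract 1 before scaling */
#define PROG_MEM_ADDR(engine)   (PROG_MEM_BASE + BYTES_PER_PAGE * ((engine) - 1))

int main(void)
{
        for (int engine = 1; engine <= 3; engine++)
                printf("Engine %d: 0x%02x\n", engine, PROG_MEM_ADDR(engine));
        /* prints 0x10, 0x30, 0x50 as in the LP5562 datasheet */
        return 0;
}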
Fixes: 31379a57cf ("leds: leds-lp55xx: Generalize update_program_memory function")
Signed-off-by: Andrei Lalaev <andrei.lalaev@anton-paar.com>
Link: https://lore.kernel.org/r/20250820-lp5562-prog-mem-address-v1-1-8569647fa71d@anton-paar.com
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d35d068fb ]
Change the 'ret' variable from u32 to int to store negative error codes or
zero returned by of_property_read_u32().
Storing the negative error codes in an unsigned type doesn't cause an issue
at runtime, but it's ugly as pants. Additionally, assigning negative error
codes to an unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
No effect on runtime.
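A small demonstration of why this is only cosmetic at runtime (generic C,
not the driver code):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        int err = -22;                  /* e.g. -EINVAL from of_property_read_u32() */
        uint32_t ret_u32 = err;         /* the bit pattern is preserved... */
        int ret_int = err;

        printf("as u32: %u\n", ret_u32);        /* 4294967274: ugly, and "< 0" checks break */
        printf("as int: %d\n", ret_int);        /* -22 */
        return 0;
}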
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Fixes: 0fbeae70ee ("regulator: add SCMI driver")
Link: https://patch.msgid.link/20250829101411.625214-1-rongqianfeng@vivo.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6e08cdd604 ]
PCIe `port01` of t8103-j457 (iMac, M1, 2 USB-C ports, 2021) is unused
and disabled. Linux' PCI subsystem assigns the ethernet nic from
`port02` to bus 02. This results into assigning `pcie0_dart_1` from the
disabled port as iommu. The `pcie0_dart_1` instance is disabled and
probably fused off (it is on the M2 Pro Mac mini which has a disabled
PCIe port as well).
Without an iommu the ethernet nic is not expected to work.
Adjust the "bus-range" and the PCIe device's "reg" property to match the PCI
subsystem's bus number.
Fixes: 7c77ab91b3 ("arm64: dts: apple: Add missing M1 (t8103) devices")
Reviewed-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: Sven Peter <sven@kernel.org>
Signed-off-by: Janne Grunau <j@jannau.net>
Link: https://lore.kernel.org/r/20250823-apple-dt-sync-6-17-v2-1-6dc0daeb4786@jannau.net
Signed-off-by: Sven Peter <sven@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 296302d3d8 ]
The at91_mckx_ps_restore() assembly function is responsible for setting
back MCKx system bus clocks after exiting low power modes.
Fix a typo and use tmp3 variable instead of tmp2 to correctly set MCKx
to previously saved state.
Tmp2 was used without the needed changes in CSS and DIV. Moreover the
required bit 7, telling that MCR register's content is to be changed
(CMD/write), was not set.
Fix function comment to match tmp variables actually used.
Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 28eb1d40fe ("ARM: at91: pm: add support for MCK1..4 save/restore for ulp modes")
Link: https://lore.kernel.org/r/20250827145427.46819-3-nicolas.ferre@microchip.com
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
[claudiu.beznea: s/sate/state in commit description]
Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c7ef92f6d ]
In __blk_mq_update_nr_hw_queues() the return value of
blk_mq_sysfs_register_hctxs() is not checked. If sysfs creation for hctx
fails, later changing the number of hw_queues or removing disk will
trigger the following warning:
kernfs: can not remove 'nr_tags', no directory
WARNING: CPU: 2 PID: 637 at fs/kernfs/dir.c:1707 kernfs_remove_by_name_ns+0x13f/0x160
Call Trace:
remove_files.isra.1+0x38/0xb0
sysfs_remove_group+0x4d/0x100
sysfs_remove_groups+0x31/0x60
__kobject_del+0x23/0xf0
kobject_del+0x17/0x40
blk_mq_unregister_hctx+0x5d/0x80
blk_mq_sysfs_unregister_hctxs+0x94/0xd0
blk_mq_update_nr_hw_queues+0x124/0x760
nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
nullb_device_submit_queues_store+0x92/0x120 [null_blk]
kobject_del() was called unconditionally even if sysfs creation failed.
Fix it by checking the kobject creation status before deleting it.
Fixes: 477e19dedc ("blk-mq: adjust debugfs and sysfs register when updating nr_hw_queues")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250826084854.1030545-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f87412d18e ]
Unconditionally clear the TCS_AMC_MODE_TRIGGER bit when a
transaction completes. Previously this bit was only cleared when
a wake TCS was borrowed as an AMC TCS but not for dedicated
AMC TCS. Leaving this bit set for AMC TCS and entering deeper low
power modes can generate a false completion IRQ.
Prevent this scenario by always clearing the TCS_AMC_MODE_TRIGGER
bit upon receiving a completion IRQ.
Fixes: 15b3bf61b8 ("soc: qcom: rpmh-rsc: Clear active mode configuration for wake TCS")
Signed-off-by: Sneh Mankad <sneh.mankad@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250825-rpmh_rsc_change-v1-1-138202c31bf6@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6fdaf3b183 ]
According to the imx95 RM, the lpuart7 rx and tx DMA's srcid are 88 and 87,
and the lpuart8 rx and tx DMA's srcid are 90 and 89. So correct them.
Fixes: 915fd2e127 ("arm64: dts: imx95: add edma[1..3] nodes")
Signed-off-by: Joy Zou <joy.zou@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c94737568b ]
The assignment of the USB ports is wrong and needs to be swapped.
The OTG (USB-C) port is on the first port and the host port with
the onboard hub is on the second port.
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Fixes: 2b52fd6035 ("arm64: dts: Add support for Kontron i.MX93 OSM-S SoM and BL carrier board")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8faa8a466 ]
After commit 20bda12a0e ("firmware: arm_scmi: Make VirtIO transport a
standalone driver"), the VirtIO transport probes independently. During
scmi_virtio_probe, scmi_probe() is called, which in turn invokes
scmi_protocol_acquire() that sends a message over the virtqueue and
waits for a reply.
Previously, DRIVER_OK was only set after scmi_vio_probe, in the core
virtio via virtio_dev_probe(). According to the Virtio spec (3.1 Device
Initialization):
| The driver MUST NOT send any buffer available notifications to the
| device before setting DRIVER_OK.
Some type-1 hypervisors block available-buffer notifications until the
driver is marked OK. In such cases, scmi_vio_probe stalls in
scmi_wait_for_reply(), and the probe never completes.
Resolve this by setting DRIVER_OK immediately after the device-specific
setup, so scmi_probe() can safely send notifications.
Note after splitting the transports into modules, the probe sequence
changed a bit. We can no longer rely on virtio_device_ready() being
called by the core in virtio_dev_probe(), because scmi_vio_probe()
doesn’t complete until the core SCMI stack runs scmi_probe(), which
immediately issues the initial BASE protocol exchanges.
Fixes: 20bda12a0e ("firmware: arm_scmi: Make VirtIO transport a standalone driver")
Signed-off-by: Junnan Wu <junnan01.wu@samsung.com>
Message-Id: <20250812075343.3201365-1-junnan01.wu@samsung.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cd5d4621ba ]
Broadcom STB platforms were early adopters (2017) of the SCMI framework and as
a result, not all deployed systems have a Device Tree entry where SCMI
protocol 0x13 (PERFORMANCE) is declared as a clock provider, nor are the
CPU Device Tree node(s) referencing protocol 0x13 as their clock
provider. This was clarified in commit e11c480b6d ("dt-bindings:
firmware: arm,scmi: Extend bindings for protocol@13") in 2023.
For those platforms, we allow the checks done by scmi_dev_used_by_cpus()
to continue, and in the event of not having done an early return, we key
off the documented compatible string and give them a pass to continue to
use scmi-cpufreq.
Fixes: 6c9bb86922 ("cpufreq: scmi: Skip SCMI devices that aren't used by the CPUs")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abdaf49be5 ]
The graph tracer framework ensures we won't migrate: kprobe_multi_link_prog_run
is called all the way from the graph tracer, which disables preemption in
function_graph_enter_regs. As Jiri and Yonghong suggested, there is no
need to use migrate_disable. As a result, some overhead may be reduced.
Also add a cant_sleep check for __this_cpu_inc_return.
Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250814121430.2347454-1-chen.dylane@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c80d797206 ]
Based on a bisect, it appears that commit 7ee9887703 ("timers:
Implement the hierarchical pull model") has somehow inadvertently
broken BPF selftest test_tcpnotify_user. The error that is being
generated by this test is as follows:
FAILED: Wrong stats Expected 10 calls, got 8
It looks like the change allows timer functions to be run on CPUs
different from the one they are armed on. The test had pinned itself
to CPU 0, and in the past the retransmit attempts also occurred on CPU
0. The test had set the max_entries attribute for
BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
likely to be in range. With the change to allow timers to run on other
CPUs, the current CPU tasked with performing the retransmit might be
bumped and in turn fall out of range, as the event will be filtered
out via __bpf_perf_event_output() using:
if (unlikely(index >= array->map.max_entries))
return -E2BIG;
A possible change would be to explicitly set the max_entries attribute
for perf_event_map in test_tcpnotify_kern.c to a value that's at least
as large as the number of CPUs. As it turns out however, if the field
is left unset, then libbpf will determine the number of CPUs available
on the underlying system and update the max_entries attribute accordingly
in map_set_def_max_entries().
A further problem with the test is that it has a thread that continues
running up until the program exits. The main thread cleans up some
LIBBPF data structures, while the other thread continues to use them,
which inevitably will trigger a SIGSEGV. This can be dealt with by
telling the thread to run for as long as necessary and doing a
pthread_join on it before exiting the program.
Finally, I don't think binding the process to CPU 0 is meaningful for
this test any more, so get rid of that.
Fixes: 435f90a338 ("selftests/bpf: add a test case for sock_ops perf-event notification")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/aJ8kHhwgATmA3rLf@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8912b2862b ]
rzg3s_oen_read() returns a u32 value, but previously propagated a negative
error code from rzg3s_pin_to_oen_bit(), resulting in an unintended large
positive value due to unsigned conversion. This caused incorrect
output-enable reporting for certain pins.
Without this patch, pins P1_0-P1_4 and P7_0-P7_4 are incorrectly reported
as "output enabled" in the pinconf-pins debugfs file. With this fix, only
P1_0-P1_1 and P7_0-P7_1 are shown as "output enabled", which matches the
hardware manual.
Fix this by returning 0 when the OEN bit lookup fails, treating the pin
as output-disabled by default.
Fixes: a9024a323a ("pinctrl: renesas: rzg2l: Clean up and refactor OEN read/write functions")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/20250709160819.306875-2-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d83ed6c6c ]
Since the support of bs < ps, extent_writepage_io() will submit
multiple blocks inside the folio.
But if we hit an error submitting one sector while the next sector can still
be submitted successfully, the function extent_writepage_io() will still
return 0.
This will make btrfs silently ignore the error without setting the error
flag for the filemap.
Fix it by recording the first error hit, and always return that value.
Fixes: 8bf334beb3 ("btrfs: fix double accounting race when extent_writepage_io() failed")
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54d94c422f ]
When CONFIG_SECURITY is not set, CONFIG_LSM (builtin_lsm_order) does
not need to be visible and settable since builtin_lsm_order is defined in
security.o, which is only built when CONFIG_SECURITY=y.
So make CONFIG_LSM depend on CONFIG_SECURITY.
Fixes: 13e735c0e9 ("LSM: Introduce CONFIG_LSM")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
[PM: subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac9c408ed1 ]
RDPID instruction outputs to a word-sized register (64-bit on x86_64 and
32-bit on x86_32). Use an unsigned long variable to store the correct size.
LSL outputs to 32-bit register, use %k operand prefix to always print the
32-bit name of the register.
Use RDPID insn mnemonic while at it as the minimum binutils version of
2.30 supports it.
[ bp: Merge two patches touching the same function into a single one. ]
Fixes: ffebbaedc8 ("x86/vdso: Introduce helper functions for CPU and node number")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250616095315.230620-1-ubizjak@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e6fe1bbef ]
When loading the i10nm_edac driver on some Intel Granite Rapids servers,
a call trace may appear as follows:
UBSAN: shift-out-of-bounds in drivers/edac/skx_common.c:453:16
shift exponent -66 is negative
...
__ubsan_handle_shift_out_of_bounds+0x1e3/0x390
skx_get_dimm_info.cold+0x47/0xd40 [skx_edac_common]
i10nm_get_dimm_config+0x23e/0x390 [i10nm_edac]
skx_register_mci+0x159/0x220 [skx_edac_common]
i10nm_init+0xcb0/0x1ff0 [i10nm_edac]
...
This occurs because some BIOS may disable a memory controller if there
aren't any memory DIMMs populated on this memory controller. The DIMMMTR
register of this disabled memory controller contains the invalid value
~0, resulting in the call trace above.
Fix this call trace by skipping DIMM enumeration on a disabled memory
controller.
Fixes: ba987eaaab ("EDAC/i10nm: Add Intel Granite Rapids server support")
Reported-by: Jose Jesus Ambriz Meza <jose.jesus.ambriz.meza@intel.com>
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/all/20250730063155.2612379-1-acelan.kao@canonical.com/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Link: https://lore.kernel.org/r/20250806065707.3533345-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fad988a215 ]
We already do the real negotiation in smb_direct_handle_connect_request(),
where we see the requested initiator_depth and responder_resources
from the client.
We should detect legacy iwarp clients using MPA v1
with the custom IRD/ORD negotiation.
We need to send the custom IRD/ORD in big endian,
but we need to try to let clients with broken requests
using little endian (older cifs.ko) to work.
Note the reason why this uses u8 for
initiator_depth and responder_resources is
that the rdma layer also uses it.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: linux-rdma@vger.kernel.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a29fea30dd ]
Cast nr_pages to unsigned long to avoid overflow when handling large
AUX buffer sizes (>= 2 GiB).
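A stand-alone illustration of the overflow class being avoided (the driver's
exact variable width differs; this shows the unsigned 32-bit wrap case):
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
        unsigned int nr_pages = 1U << 20;       /* 4 GiB worth of 4 KiB pages */

        /* the shift is evaluated in 32 bits and wraps before the assignment */
        unsigned long wrong = nr_pages << PAGE_SHIFT;
        /* widening first keeps the full buffer size */
        unsigned long right = (unsigned long)nr_pages << PAGE_SHIFT;

        printf("wrong: %lu, right: %lu\n", wrong, right);
        return 0;
}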
Fixes: d5d9696b03 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 105f56877f ]
Cast nr_pages to unsigned long to avoid overflow when handling large
AUX buffer sizes (>= 2 GiB).
Fixes: 3fbf7f011f ("coresight: sink: Add TRBE driver")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba1afc94de ]
uprobe_warn() is passed a task structure, yet it is using current. For
the most part this shouldn't matter, but since a task structure is
provided, let's use it.
Fixes: 248d3a7b2f ("uprobes: Change uprobe_copy_process() to dup return_instances")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9e46de4bf ]
Commit ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") removed the test that needed the valeur in SPRN_EPN but
failed to remove the read.
Remove it.
And remove related comments, including the very same comment
in InstructionTLBMiss that should have been removed by
commit 33c527522f ("powerpc/8xx: Inconditionally use task PGDIR in
ITLB misses").
Also update the comment about absence of a second level table which
has been handled implicitly since commit 5ddb75cee5 ("powerpc/8xx:
remove tests on PGDIR entry validity").
Fixes: ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/5811c8d1d6187f280ad140d6c0ad6010e41eeaeb.1755361995.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 061df28b82 ]
Commit 865cc3e9cc ("gfs2: fix a deadlock on withdraw-during-mount")
added a statement to do_xmote() to clear the GLF_INVALIDATE_IN_PROGRESS
flag a second time after it has already been cleared. Fix that.
Fixes: 865cc3e9cc ("gfs2: fix a deadlock on withdraw-during-mount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a679e5683d ]
Fix -Wunused-result warning generated when compiled with gcc 13.3.0,
by checking fread's return value and handling errors, preventing
potential failures when reading from stdin.
Fixes compiler warning:
warning: ignoring return value of 'fread' declared with attribute
'warn_unused_result' [-Wunused-result]
Fixes: 806a15b254 ("kselftests/arm64: add PAuth test for whether exec() changes keys")
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cce436aafc ]
Normally the tracee starts in SECCOMP_NOTIFY_INIT, sends an
event to the tracer, and starts to wait interruptibly. With
SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV, if the tracer receives the
message (SECCOMP_NOTIFY_SENT is reached) while the tracee was waiting
and is subsequently interrupted, the tracee begins to wait again
uninterruptibly (but killable).
This fails if SECCOMP_NOTIFY_REPLIED is reached before the tracee
is interrupted, as the check only considered SECCOMP_NOTIFY_SENT as a
condition to begin waiting again. In this case the tracee is interrupted
even though the tracer already acted on its behalf. This breaks the
assumption SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV wanted to ensure,
namely that the tracer can be sure the syscall is not interrupted or
restarted on the tracee after it is received on the tracer. Fix this
by also considering SECCOMP_NOTIFY_REPLIED when evaluating whether to
switch to uninterruptible waiting.
With the condition changed the loop in seccomp_do_user_notification()
would exit immediately after deciding that noninterruptible waiting
is required if the operation already reached SECCOMP_NOTIFY_REPLIED,
skipping the code that processes pending addfd commands first. Prevent
this by executing the remaining loop body one last time in this case.
Fixes: c2aa2dfef2 ("seccomp: Add wait_killable semantic to seccomp user notifier")
Reported-by: Ali Polatel <alip@chesswob.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220291
Signed-off-by: Johannes Nixdorf <johannes@nixdorf.dev>
Link: https://lore.kernel.org/r/20250725-seccomp-races-v2-1-cf8b9d139596@nixdorf.dev
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7479260860 ]
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c. Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.
Fixes: 1274aea127 ("initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Upstream series https://lore.kernel.org/linux-media/20250917-csi-bgr-rgb-v3-0-0145571b3aa4@kernel.org/
The tc358743 is an HDMI to MIPI-CSI2 bridge. It can output all three
HDMI 1.4 video formats: RGB 4:4:4, YCbCr 4:2:2, and YCbCr 4:4:4.
RGB 4:4:4 is converted to the MIPI-CSI2 RGB888 video format, and listed
in the driver as MEDIA_BUS_FMT_RGB888_1X24.
Most CSI2 receiver drivers then map MEDIA_BUS_FMT_RGB888_1X24 to
V4L2_PIX_FMT_RGB24.
However, V4L2_PIX_FMT_RGB24 is defined as having its color components in
the R, G and B order, from left to right. MIPI-CSI2, on the other hand,
defines the RGB888 format with blue first.
This essentially means that the R and B will be swapped compared to what
V4L2_PIX_FMT_RGB24 defines.
The proper MBUS format would be BGR888, so let's use that.
Fixes: d32d98642d ("[media] Driver for Toshiba TC358743 HDMI to CSI-2 bridge")
Signed-off-by: Maxime Ripard <mripard@kernel.org>
commit e750f85391 upstream.
When completing emulation of instruction that generated a userspace exit
for I/O, don't recheck L1 intercepts as KVM has already finished that
phase of instruction execution, i.e. has already committed to allowing L2
to perform I/O. If L1 (or host userspace) modifies the I/O permission
bitmaps during the exit to userspace, KVM will treat the access as being
intercepted despite already having emulated the I/O access.
Pivot on EMULTYPE_NO_DECODE to detect that KVM is completing emulation.
Of the three users of EMULTYPE_NO_DECODE, only complete_emulated_io() (the
intended "recipient") can reach the code in question. gp_interception()'s
use is mutually exclusive with is_guest_mode(), and
complete_emulated_insn_gp() unconditionally pairs EMULTYPE_NO_DECODE with
EMULTYPE_SKIP.
The bad behavior was detected by a syzkaller program that toggles port I/O
interception during the userspace I/O exit, ultimately resulting in a WARN
on vcpu->arch.pio.count being non-zero due to KVM not completing emulation
of the I/O instruction.
WARNING: CPU: 23 PID: 1083 at arch/x86/kvm/x86.c:8039 emulator_pio_in_out+0x154/0x170 [kvm]
Modules linked in: kvm_intel kvm irqbypass
CPU: 23 UID: 1000 PID: 1083 Comm: repro Not tainted 6.16.0-rc5-c1610d2d66b1-next-vm #74 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:emulator_pio_in_out+0x154/0x170 [kvm]
PKRU: 55555554
Call Trace:
<TASK>
kvm_fast_pio+0xd6/0x1d0 [kvm]
vmx_handle_exit+0x149/0x610 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xda8/0x1ac0 [kvm]
kvm_vcpu_ioctl+0x244/0x8c0 [kvm]
__x64_sys_ioctl+0x8a/0xd0
do_syscall_64+0x5d/0xc60
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Reported-by: syzbot+cc2032ba16cc2018ca25@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68790db4.a00a0220.3af5df.0020.GAE@google.com
Fixes: 8a76d7f25f ("KVM: x86: Add x86 callback for intercept check")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250715190638.1899116-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 674b56aa57 upstream.
Syzkaller reports a KASAN issue as below:
general protection fault, probably for non-canonical address 0xfbd59c0000000021: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000108-0xdead00000000010f]
CPU: 0 PID: 5083 Comm: syz-executor.2 Not tainted 6.1.134-syzkaller-00037-g855bd1d7d838 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__list_del include/linux/list.h:114 [inline]
RIP: 0010:__list_del_entry include/linux/list.h:137 [inline]
RIP: 0010:list_del include/linux/list.h:148 [inline]
RIP: 0010:p9_fd_cancelled+0xe9/0x200 net/9p/trans_fd.c:734
Call Trace:
<TASK>
p9_client_flush+0x351/0x440 net/9p/client.c:614
p9_client_rpc+0xb6b/0xc70 net/9p/client.c:734
p9_client_version net/9p/client.c:920 [inline]
p9_client_create+0xb51/0x1240 net/9p/client.c:1027
v9fs_session_init+0x1f0/0x18f0 fs/9p/v9fs.c:408
v9fs_mount+0xba/0xcb0 fs/9p/vfs_super.c:126
legacy_get_tree+0x108/0x220 fs/fs_context.c:632
vfs_get_tree+0x8e/0x300 fs/super.c:1573
do_new_mount fs/namespace.c:3056 [inline]
path_mount+0x6a6/0x1e90 fs/namespace.c:3386
do_mount fs/namespace.c:3399 [inline]
__do_sys_mount fs/namespace.c:3607 [inline]
__se_sys_mount fs/namespace.c:3584 [inline]
__x64_sys_mount+0x283/0x300 fs/namespace.c:3584
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This happens because of a race condition between:
- The 9p client sending an invalid flush request and later cleaning it up;
- The 9p client in p9_read_work() canceled all pending requests.
Thread 1                                 Thread 2
...
p9_client_create()
...
 p9_fd_create()
 ...
  p9_conn_create()
  ...
   // start Thread 2
   INIT_WORK(&m->rq, p9_read_work);
                                         p9_read_work()
...                                      ...
p9_client_rpc()                          p9_conn_cancel()
...                                      ...
                                          spin_lock(&m->req_lock);
                                          ...
 p9_fd_cancelled()                        ...
 ...                                      spin_unlock(&m->req_lock);
 ...                                      // status rewrite
                                          p9_client_cb(m->client, req, REQ_STATUS_ERROR)
                                          // first remove
                                          list_del(&req->req_list);
 ...
 spin_lock(&m->req_lock)
 ...
 // second remove
 list_del(&req->req_list);
 spin_unlock(&m->req_lock)
...
Commit 74d6a5d566 ("9p/trans_fd: Fix concurrency del of req_list in
p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem
client where the req_list could be deleted simultaneously by both
p9_read_work and p9_fd_cancelled functions, but for the case where req->status
equals REQ_STATUS_RCVD.
Update the check for req->status in p9_fd_cancelled to skip processing not
just received requests, but anything that is not SENT, as whatever
changed the state from SENT also removed the request from its list.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: afd8d65411 ("9P: Add cancelled() to the transport functions.")
Cc: stable@vger.kernel.org
Signed-off-by: Nalivayko Sergey <Sergey.Nalivayko@kaspersky.com>
Message-ID: <20250715154815.3501030-1-Sergey.Nalivayko@kaspersky.com>
[updated the check from status == RECV || status == ERROR to status != SENT]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2ce245341 upstream.
Devices with power.no_pm set are not expected to need any power
management at all, so modify device_set_pm_not_required() to set
power.no_callbacks for them too in case runtime PM will be enabled
for any of them (which in principle may be done for convenience if
such a device participates in a dependency chain).
Since device_set_pm_not_required() must be called before device_add()
or it would not have any effect, it can update power.no_callbacks
without locking, unlike pm_runtime_no_callbacks() that can be called
after registering the target device.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@kernel.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/1950054.tdWV9SEqCh@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d07bee10e upstream.
If copy_from_user() fails, write() currently returns -EFAULT, but any
partially written data leaves the TX FIFO in an inconsistent state.
Subsequent write() calls then fail with "transmit length mismatch"
errors.
Once partial data is written to the hardware FIFO, it cannot be removed
without a TX reset. Commit c6e8d85faf ("staging: axis-fifo: Remove
hardware resets for user errors") removed a full FIFO reset for this case,
which fixed a potential RX data loss, but introduced this TX issue.
Fix this by introducing a bounce buffer: copy the full packet from
userspace first, and write to the hardware FIFO only if the copy
was successful.
Fixes: c6e8d85faf ("staging: axis-fifo: Remove hardware resets for user errors")
Cc: stable@vger.kernel.org
Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com>
Link: https://lore.kernel.org/r/20250912101322.1282507-1-ovidiu.panait.oss@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52ff2b840b upstream.
Since commit 2ca34b5087 ("staging: axis-fifo: Correct handling of
tx_fifo_depth for size validation"), write() operations with packets
larger than 'tx_fifo_depth - 4' words are no longer rejected with -EINVAL.
Fortunately, the packets are not actually getting transmitted to hardware,
otherwise they would be raising a 'Transmit Packet Overrun Error'
interrupt, which requires a reset of the TX circuit to recover from.
Instead, the request times out inside wait_event_interruptible_timeout()
and always returns -EAGAIN, since the wake up condition can never be true
for these packets. But still, they unnecessarily block other tasks from
writing to the FIFO and the EAGAIN return code signals userspace to retry
the write() call, even though it will always fail and time out.
According to the AXI4-Stream FIFO reference manual (PG080), the maximum
valid packet length is 'tx_fifo_depth - 4' words, so attempting to send
larger packets is invalid and should not be happening in the first place:
> The maximum packet that can be transmitted is limited by the size of
> the FIFO, which is (C_TX_FIFO_DEPTH–4)*(data interface width/8) bytes.
Therefore, bring back the old behavior and outright reject packets larger
than 'tx_fifo_depth - 4' with -EINVAL. Add a comment to explain why the
check is necessary. The dev_err() message was removed to avoid cluttering
the dmesg log if an invalid packet is received from userspace.
Fixes: 2ca34b5087 ("staging: axis-fifo: Correct handling of tx_fifo_depth for size validation")
Cc: stable@vger.kernel.org
Signed-off-by: Ovidiu Panait <ovidiu.panait.oss@gmail.com>
Link: https://lore.kernel.org/r/20250817171350.872105-1-ovidiu.panait.oss@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ebcd3460c upstream.
A process might fail to allocate a new bitmap when trying to expand its
proc->dmap. In that case, dbitmap_grow() fails and frees the old bitmap
via dbitmap_free(). However, the driver calls dbitmap_free() again when
the same process terminates, leading to a double-free error:
==================================================================
BUG: KASAN: double-free in binder_proc_dec_tmpref+0x2e0/0x55c
Free of addr ffff00000b7c1420 by task kworker/9:1/209
CPU: 9 UID: 0 PID: 209 Comm: kworker/9:1 Not tainted 6.17.0-rc6-dirty #5 PREEMPT
Hardware name: linux,dummy-virt (DT)
Workqueue: events binder_deferred_func
Call trace:
kfree+0x164/0x31c
binder_proc_dec_tmpref+0x2e0/0x55c
binder_deferred_func+0xc24/0x1120
process_one_work+0x520/0xba4
[...]
Allocated by task 448:
__kmalloc_noprof+0x178/0x3c0
bitmap_zalloc+0x24/0x30
binder_open+0x14c/0xc10
[...]
Freed by task 449:
kfree+0x184/0x31c
binder_inc_ref_for_node+0xb44/0xe44
binder_transaction+0x29b4/0x7fbc
binder_thread_write+0x1708/0x442c
binder_ioctl+0x1b50/0x2900
[...]
==================================================================
Fix this issue by marking proc->map NULL in dbitmap_free().
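The pattern, shown as a generic stand-alone sketch (not the binder code
itself; names are illustrative):
#include <stdlib.h>

struct dbitmap_sketch {
        unsigned long *map;
};

static void dbitmap_free_sketch(struct dbitmap_sketch *dmap)
{
        free(dmap->map);
        dmap->map = NULL;       /* a later free sees NULL and becomes a harmless no-op */
}

int main(void)
{
        struct dbitmap_sketch d = { .map = calloc(4, sizeof(unsigned long)) };

        dbitmap_free_sketch(&d);        /* e.g. the failed grow path */
        dbitmap_free_sketch(&d);        /* e.g. process teardown; no double-free now */
        return 0;
}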
Cc: stable@vger.kernel.org
Fixes: 15d9da3f81 ("binder: use bitmap for faster descriptor lookup")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Tiffany Yang <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250915221248.3470154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f2c0ac142 upstream.
The previous commit 0718a78f6a ("ALSA: usb-audio: Kill timer properly at
removal") patched a UAF issue caused by the error timer.
However, because the error timer kill added in this patch occurs after the
endpoint delete, a race condition to UAF still occurs, albeit rarely.
Additionally, since kill-cleanup for urb is also missing, freed memory can
be accessed in interrupt context related to urb, which can cause UAF.
Therefore, to prevent this, error timer and urb must be killed before
freeing the heap memory.
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+f02665daa2abeef4a947@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f02665daa2abeef4a947
Fixes: 0718a78f6a ("ALSA: usb-audio: Kill timer properly at removal")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15d8c92f10 upstream.
MES fence_value will be updated in fence_addr if the API succeeds,
otherwise the upper 32 bits will be used to indicate an error code.
In either case, MES will trigger an EOP interrupt with 0xb1 as the
context id in the interrupt cookie.
Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b69680847 ]
This issue is similar to the vulnerability in the `mcp251x` driver,
which was fixed in commit 03c427147b ("can: mcp251x: fix resume from
sleep before interface was brought up").
In the `hi311x` driver, when the device resumes from sleep, the driver
schedules `priv->restart_work`. However, if the network interface was
not previously enabled, the `priv->wq` (workqueue) is not allocated and
initialized, leading to a null pointer dereference.
To fix this, we move the allocation and initialization of the workqueue
from the `hi3110_open` function to the `hi3110_can_probe` function.
This ensures that the workqueue is properly initialized before it is
used during device resume. Also add logic to destroy the workqueue
in the error handling paths of `hi3110_can_probe` and in the
`hi3110_can_remove` function to prevent resource leaks.
Signed-off-by: Chen Yufeng <chenyufeng@iie.ac.cn>
Link: https://patch.msgid.link/20250911150820.250-1-chenyufeng@iie.ac.cn
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5edf3550f ]
When cross-compiling the perf tool for ARM64, `perf help` may crash
with the following assertion failure:
help.c:122: exclude_cmds: Assertion `cmds->names[ci] == NULL' failed.
This happens when the perf binary is not named exactly "perf" or when
multiple "perf-*" binaries exist in the same directory. In such cases,
the `excludes` command list can be empty, which leads to the final
assertion in exclude_cmds() being triggered.
Add a simple guard at the beginning of exclude_cmds() to return early
if excludes->cnt is zero, preventing the crash.
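A minimal sketch of that guard, with struct cmdnames reduced to a stand-in so the snippet stays self-contained:
```
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for perf's struct cmdnames. */
struct cmdnames {
        size_t cnt;
        const char **names;
};

static void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes)
{
        /* Nothing to exclude: return before walking cmds->names at all. */
        if (!excludes->cnt)
                return;

        /* ... the original filtering over cmds->names would run here ... */
        (void)cmds;
}

int main(void)
{
        struct cmdnames cmds = { 0 }, excludes = { 0 };

        exclude_cmds(&cmds, &excludes); /* empty excludes list: now a no-op */
        printf("no crash with an empty excludes list\n");
        return 0;
}
```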
Signed-off-by: hupu <hupu.gm@gmail.com>
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250909094953.106706-1-amadio@gentoo.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77b8e6fbf9 ]
MAX_TAG_SIZE was 0x1a8 and it may be truncated in the "bi->metadata_size
= ic->tag_size" assignment. We need to limit it to 255.
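An illustrative userspace sketch of why 255 is the limit, assuming the destination is a byte-sized field as the truncation implies:
```
#include <stdio.h>

int main(void)
{
        unsigned short tag_size = 0x1a8;        /* 424: larger than a byte can hold */
        unsigned char metadata_size;            /* stand-in for bi->metadata_size   */

        metadata_size = tag_size;               /* silently truncates to 0xa8 (168) */
        printf("truncated: %u\n", (unsigned)metadata_size);

        /* Limiting the source to 255 keeps the assignment lossless. */
        metadata_size = tag_size > 255 ? 255 : tag_size;
        printf("clamped:   %u\n", (unsigned)metadata_size);
        return 0;
}
```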
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 208d7f788e upstream.
This `srctree/` link pointed to a file with an underscore, but the header
used a dash instead.
Thus fix it.
This avoids a warning from a future check of our `srctree/` links.
Cc: stable@vger.kernel.org
Fixes: 3253aba340 ("rust: block: introduce `kernel::block::mq` module")
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 79d10f4f21 ]
The state->timer is a cyclic timer that schedules work_i2c_poll and
delayed_work_enable_hotplug, while rearming itself. Using timer_delete()
fails to guarantee the timer isn't still running when destroyed; similarly,
cancel_delayed_work() cannot ensure delayed_work_enable_hotplug has
terminated if it is already executing. During probe failure after timer
initialization, these may continue running as orphans and reference the
already-freed tc358743_state object through tc358743_irq_poll_timer.
The following is the trace captured by KASAN.
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800ded83c8 by task swapper/1/0
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcf/0x610
? __pfx_sched_balance_find_src_group+0x10/0x10
? __run_timer_base.part.0+0x7d7/0x8c0
kasan_report+0xb8/0xf0
? __run_timer_base.part.0+0x7d7/0x8c0
__run_timer_base.part.0+0x7d7/0x8c0
? rcu_sched_clock_irq+0xb06/0x27d0
? __pfx___run_timer_base.part.0+0x10/0x10
? try_to_wake_up+0xb15/0x1960
? tmigr_update_events+0x280/0x740
? _raw_spin_lock_irq+0x80/0xe0
? __pfx__raw_spin_lock_irq+0x10/0x10
tmigr_handle_remote_up+0x603/0x7e0
? __pfx_tmigr_handle_remote_up+0x10/0x10
? sched_balance_trigger+0x98/0x9f0
? sched_tick+0x221/0x5a0
? _raw_spin_lock_irq+0x80/0xe0
? __pfx__raw_spin_lock_irq+0x10/0x10
? tick_nohz_handler+0x339/0x440
? __pfx_tmigr_handle_remote_up+0x10/0x10
__walk_groups.isra.0+0x42/0x150
tmigr_handle_remote+0x1f4/0x2e0
? __pfx_tmigr_handle_remote+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
? hrtimer_interrupt+0x322/0x780
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
Allocated by task 141:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7f/0x90
__kmalloc_node_track_caller_noprof+0x198/0x430
devm_kmalloc+0x7b/0x1e0
tc358743_probe+0xb7/0x610
i2c_device_probe+0x51d/0x880
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__device_attach_driver+0x174/0x220
bus_for_each_drv+0x100/0x190
__device_attach+0x206/0x370
bus_probe_device+0x123/0x170
device_add+0xd25/0x1470
i2c_new_client_device+0x7a0/0xcd0
do_one_initcall+0x89/0x300
do_init_module+0x29d/0x7f0
load_module+0x4f48/0x69e0
init_module_from_file+0xe4/0x150
idempotent_init_module+0x320/0x670
__x64_sys_finit_module+0xbd/0x120
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 141:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x3f/0x50
kfree+0x137/0x370
release_nodes+0xa4/0x100
devres_release_group+0x1b2/0x380
i2c_device_probe+0x694/0x880
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__device_attach_driver+0x174/0x220
bus_for_each_drv+0x100/0x190
__device_attach+0x206/0x370
bus_probe_device+0x123/0x170
device_add+0xd25/0x1470
i2c_new_client_device+0x7a0/0xcd0
do_one_initcall+0x89/0x300
do_init_module+0x29d/0x7f0
load_module+0x4f48/0x69e0
init_module_from_file+0xe4/0x150
idempotent_init_module+0x320/0x670
__x64_sys_finit_module+0xbd/0x120
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
Replace timer_delete() with timer_delete_sync() and cancel_delayed_work()
with cancel_delayed_work_sync() to ensure proper termination of timer and
work items before resource cleanup.
This bug was initially identified through static analysis. For reproduction
and testing, I created a functional emulation of the tc358743 device via a
kernel module and introduced faults through the debugfs interface.
Fixes: 869f38ae07 ("media: i2c: tc358743: Fix crash in the probe error path when using polling")
Fixes: d32d98642d ("[media] Driver for Toshiba TC358743 HDMI to CSI-2 bridge")
Cc: stable@vger.kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
[ replaced del_timer() instead of timer_delete() ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e31a6bc07 ]
There is a bug observed when rtw89_core_tx_kick_off_and_wait() tries to
access already freed skb_data:
BUG: KFENCE: use-after-free write in rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110
CPU: 6 UID: 0 PID: 41377 Comm: kworker/u64:24 Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
Use-after-free write at 0x0000000020309d9d (in kfence-#251):
rtw89_core_tx_kick_off_and_wait drivers/net/wireless/realtek/rtw89/core.c:1110
rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.h:141
rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
process_one_work kernel/workqueue.c:3241
worker_thread kernel/workqueue.c:3400
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
kfence-#251: 0x0000000056e2393d-0x000000009943cb62, size=232, cache=skbuff_head_cache
allocated by task 41377 on cpu 6 at 77869.159548s (0.009551s ago):
__alloc_skb net/core/skbuff.c:659
__netdev_alloc_skb net/core/skbuff.c:734
ieee80211_nullfunc_get net/mac80211/tx.c:5844
rtw89_core_send_nullfunc drivers/net/wireless/realtek/rtw89/core.c:3431
rtw89_core_scan_complete drivers/net/wireless/realtek/rtw89/core.c:5338
rtw89_hw_scan_complete_cb drivers/net/wireless/realtek/rtw89/fw.c:7979
rtw89_chanctx_proceed_cb drivers/net/wireless/realtek/rtw89/chan.c:3165
rtw89_chanctx_proceed drivers/net/wireless/realtek/rtw89/chan.c:3194
rtw89_hw_scan_complete drivers/net/wireless/realtek/rtw89/fw.c:8012
rtw89_mac_c2h_scanofld_rsp drivers/net/wireless/realtek/rtw89/mac.c:5059
rtw89_fw_c2h_work drivers/net/wireless/realtek/rtw89/fw.c:6758
process_one_work kernel/workqueue.c:3241
worker_thread kernel/workqueue.c:3400
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
freed by task 1045 on cpu 9 at 77869.168393s (0.001557s ago):
ieee80211_tx_status_skb net/mac80211/status.c:1117
rtw89_pci_release_txwd_skb drivers/net/wireless/realtek/rtw89/pci.c:564
rtw89_pci_release_tx_skbs.isra.0 drivers/net/wireless/realtek/rtw89/pci.c:651
rtw89_pci_release_tx drivers/net/wireless/realtek/rtw89/pci.c:676
rtw89_pci_napi_poll drivers/net/wireless/realtek/rtw89/pci.c:4238
__napi_poll net/core/dev.c:7495
net_rx_action net/core/dev.c:7557 net/core/dev.c:7684
handle_softirqs kernel/softirq.c:580
do_softirq.part.0 kernel/softirq.c:480
__local_bh_enable_ip kernel/softirq.c:407
rtw89_pci_interrupt_threadfn drivers/net/wireless/realtek/rtw89/pci.c:927
irq_thread_fn kernel/irq/manage.c:1133
irq_thread kernel/irq/manage.c:1257
kthread kernel/kthread.c:463
ret_from_fork arch/x86/kernel/process.c:154
ret_from_fork_asm arch/x86/entry/entry_64.S:258
It is a consequence of a race between the waiting and the signaling side
of the completion:
Waiting thread Completing thread
rtw89_core_tx_kick_off_and_wait()
rcu_assign_pointer(skb_data->wait, wait)
/* start waiting */
wait_for_completion_timeout()
rtw89_pci_tx_status()
rtw89_core_tx_wait_complete()
rcu_read_lock()
/* signals completion and
* proceeds further
*/
complete(&wait->completion)
rcu_read_unlock()
...
/* frees skb_data */
ieee80211_tx_status_ni()
/* returns (exit status doesn't matter) */
wait_for_completion_timeout()
...
/* accesses the already freed skb_data */
rcu_assign_pointer(skb_data->wait, NULL)
The completing side might proceed and free the underlying skb even before
the waiting side is fully awoken and run to execution. Actually the race
happens regardless of wait_for_completion_timeout() exit status, e.g.
the waiting side may hit a timeout and the concurrent completing side is
still able to free the skb.
Skbs which are sent by rtw89_core_tx_kick_off_and_wait() are owned by the
driver. They don't come from core ieee80211 stack so no need to pass them
to ieee80211_tx_status_ni() on completing side.
Introduce a work function which will act as a garbage collector for
rtw89_tx_wait_info objects and the associated skbs. Thus no potentially
heavy locks are required on the completing side.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca6152 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Suggested-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-2-pchelkin@ispras.ru
[ added wiphy variable declarations ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Output of raspinfo is now over 65,000 characters, which is more than
GitHub allows in a single form field!
Also adds Pi 500+ and CM0 to the list of models.
commit 1367da7eb8 upstream.
It is possible to hit a zero entry while traversing the vmas in unuse_mm()
called from the swapoff path, and accessing it causes the OOPS:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000446--> Loading the memory from offset 0x40 on the
XA_ZERO_ENTRY as address.
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
The issue is manifested from the below race between the fork() on a
process and swapoff:
fork(dup_mmap()) swapoff(unuse_mm)
--------------- -----------------
1) Identical mtree is built using
__mt_dup().
2) copy_pte_range()-->
copy_nonpresent_pte():
The dst mm is added into the
mmlist to be visible to the
swapoff operation.
3) Fatal signal is sent to the parent
process(which is the current during the
fork) thus skip the duplication of the
vmas and mark the vma range with
XA_ZERO_ENTRY as a marker for this process
that helps during exit_mmap().
4) swapoff is tried on the
'mm' added to the 'mmlist' as
part of the 2.
5) unuse_mm(), that iterates
through the vma's of this 'mm'
will hit the non-NULL zero entry
and operating on this zero entry
as a vma is resulting into the
oops.
The proper fix would be around not exposing this partially-valid tree to
others when dropping the mmap lock, which is being solved with [1]. A
simpler solution would be checking for MMF_UNSTABLE, as it is set if
mm_struct is not fully initialized in dup_mmap().
Thanks to Liam/Lorenzo/David for all the suggestions in fixing this
issue.
Link: https://lkml.kernel.org/r/20250924181138.1762750-1-charan.kalla@oss.qualcomm.com
Link: https://lore.kernel.org/all/20250815191031.3769540-1-Liam.Howlett@oracle.com/ [1]
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e2ee70291 upstream.
Per UVC 1.1+ specification 3.7.2, units and terminals must have a non-zero
unique ID.
```
Each Unit and Terminal within the video function is assigned a unique
identification number, the Unit ID (UID) or Terminal ID (TID), contained in
the bUnitID or bTerminalID field of the descriptor. The value 0x00 is
reserved for undefined ID,
```
If we add a new entity with id 0 or a duplicated ID, it will be marked
as UVC_INVALID_ENTITY_ID.
In a previous attempt, commit 3dd075fe8e ("media: uvcvideo: Require
entities to have a non-zero unique ID"), we ignored all the invalid units,
but this broke a lot of non-compatible cameras. Hopefully we are luckier
this time.
This also prevents some syzkaller reproducers from triggering warnings due
to a chain of entities referring to themselves. In one particular case, an
Output Unit is connected to an Input Unit, both with the same ID of 1. But
when looking up for the source ID of the Output Unit, that same entity is
found instead of the input entity, which leads to such warnings.
In another case, a backward chain was considered finished as the source ID
was 0. Later on, that entity was found, but its pads were not valid.
Here is a sample stack trace for one of those cases.
[ 20.650953] usb 1-1: new high-speed USB device number 2 using dummy_hcd
[ 20.830206] usb 1-1: Using ep0 maxpacket: 8
[ 20.833501] usb 1-1: config 0 descriptor??
[ 21.038518] usb 1-1: string descriptor 0 read error: -71
[ 21.038893] usb 1-1: Found UVC 0.00 device <unnamed> (2833:0201)
[ 21.039299] uvcvideo 1-1:0.0: Entity type for entity Output 1 was not initialized!
[ 21.041583] uvcvideo 1-1:0.0: Entity type for entity Input 1 was not initialized!
[ 21.042218] ------------[ cut here ]------------
[ 21.042536] WARNING: CPU: 0 PID: 9 at drivers/media/mc/mc-entity.c:1147 media_create_pad_link+0x2c4/0x2e0
[ 21.043195] Modules linked in:
[ 21.043535] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.11.0-rc7-00030-g3480e43aeccf #444
[ 21.044101] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 21.044639] Workqueue: usb_hub_wq hub_event
[ 21.045100] RIP: 0010:media_create_pad_link+0x2c4/0x2e0
[ 21.045508] Code: fe e8 20 01 00 00 b8 f4 ff ff ff 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 0f 0b eb e9 0f 0b eb 0a 0f 0b eb 06 <0f> 0b eb 02 0f 0b b8 ea ff ff ff eb d4 66 2e 0f 1f 84 00 00 00 00
[ 21.046801] RSP: 0018:ffffc9000004b318 EFLAGS: 00010246
[ 21.047227] RAX: ffff888004e5d458 RBX: 0000000000000000 RCX: ffffffff818fccf1
[ 21.047719] RDX: 000000000000007b RSI: 0000000000000000 RDI: ffff888004313290
[ 21.048241] RBP: ffff888004313290 R08: 0001ffffffffffff R09: 0000000000000000
[ 21.048701] R10: 0000000000000013 R11: 0001888004313290 R12: 0000000000000003
[ 21.049138] R13: ffff888004313080 R14: ffff888004313080 R15: 0000000000000000
[ 21.049648] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 21.050271] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.050688] CR2: 0000592cc27635b0 CR3: 000000000431c000 CR4: 0000000000750ef0
[ 21.051136] PKRU: 55555554
[ 21.051331] Call Trace:
[ 21.051480] <TASK>
[ 21.051611] ? __warn+0xc4/0x210
[ 21.051861] ? media_create_pad_link+0x2c4/0x2e0
[ 21.052252] ? report_bug+0x11b/0x1a0
[ 21.052540] ? trace_hardirqs_on+0x31/0x40
[ 21.052901] ? handle_bug+0x3d/0x70
[ 21.053197] ? exc_invalid_op+0x1a/0x50
[ 21.053511] ? asm_exc_invalid_op+0x1a/0x20
[ 21.053924] ? media_create_pad_link+0x91/0x2e0
[ 21.054364] ? media_create_pad_link+0x2c4/0x2e0
[ 21.054834] ? media_create_pad_link+0x91/0x2e0
[ 21.055131] ? _raw_spin_unlock+0x1e/0x40
[ 21.055441] ? __v4l2_device_register_subdev+0x202/0x210
[ 21.055837] uvc_mc_register_entities+0x358/0x400
[ 21.056144] uvc_register_chains+0x1fd/0x290
[ 21.056413] uvc_probe+0x380e/0x3dc0
[ 21.056676] ? __lock_acquire+0x5aa/0x26e0
[ 21.056946] ? find_held_lock+0x33/0xa0
[ 21.057196] ? kernfs_activate+0x70/0x80
[ 21.057533] ? usb_match_dynamic_id+0x1b/0x70
[ 21.057811] ? find_held_lock+0x33/0xa0
[ 21.058047] ? usb_match_dynamic_id+0x55/0x70
[ 21.058330] ? lock_release+0x124/0x260
[ 21.058657] ? usb_match_one_id_intf+0xa2/0x100
[ 21.058997] usb_probe_interface+0x1ba/0x330
[ 21.059399] really_probe+0x1ba/0x4c0
[ 21.059662] __driver_probe_device+0xb2/0x180
[ 21.059944] driver_probe_device+0x5a/0x100
[ 21.060170] __device_attach_driver+0xe9/0x160
[ 21.060427] ? __pfx___device_attach_driver+0x10/0x10
[ 21.060872] bus_for_each_drv+0xa9/0x100
[ 21.061312] __device_attach+0xed/0x190
[ 21.061812] device_initial_probe+0xe/0x20
[ 21.062229] bus_probe_device+0x4d/0xd0
[ 21.062590] device_add+0x308/0x590
[ 21.062912] usb_set_configuration+0x7b6/0xaf0
[ 21.063403] usb_generic_driver_probe+0x36/0x80
[ 21.063714] usb_probe_device+0x7b/0x130
[ 21.063936] really_probe+0x1ba/0x4c0
[ 21.064111] __driver_probe_device+0xb2/0x180
[ 21.064577] driver_probe_device+0x5a/0x100
[ 21.065019] __device_attach_driver+0xe9/0x160
[ 21.065403] ? __pfx___device_attach_driver+0x10/0x10
[ 21.065820] bus_for_each_drv+0xa9/0x100
[ 21.066094] __device_attach+0xed/0x190
[ 21.066535] device_initial_probe+0xe/0x20
[ 21.066992] bus_probe_device+0x4d/0xd0
[ 21.067250] device_add+0x308/0x590
[ 21.067501] usb_new_device+0x347/0x610
[ 21.067817] hub_event+0x156b/0x1e30
[ 21.068060] ? process_scheduled_works+0x48b/0xaf0
[ 21.068337] process_scheduled_works+0x5a3/0xaf0
[ 21.068668] worker_thread+0x3cf/0x560
[ 21.068932] ? kthread+0x109/0x1b0
[ 21.069133] kthread+0x197/0x1b0
[ 21.069343] ? __pfx_worker_thread+0x10/0x10
[ 21.069598] ? __pfx_kthread+0x10/0x10
[ 21.069908] ret_from_fork+0x32/0x40
[ 21.070169] ? __pfx_kthread+0x10/0x10
[ 21.070424] ret_from_fork_asm+0x1a/0x30
[ 21.070737] </TASK>
Reported-by: syzbot+0584f746fde3d52b4675@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0584f746fde3d52b4675
Reported-by: syzbot+dd320d114deb3f5bb79b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=dd320d114deb3f5bb79b
Reported-by: Youngjun Lee <yjjuny.lee@samsung.com>
Fixes: a3fbc2e6bb ("media: mc-entity.c: use WARN_ON, validate link pads")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Co-developed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa0f61cc1d upstream.
Syzbot reports a KASAN issue as below:
BUG: KASAN: use-after-free in __create_pipe include/linux/usb.h:1945 [inline]
BUG: KASAN: use-after-free in send_packet+0xa2d/0xbc0 drivers/media/rc/imon.c:627
Read of size 4 at addr ffff8880256fb000 by task syz-executor314/4465
CPU: 2 PID: 4465 Comm: syz-executor314 Not tainted 6.0.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x6e9 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
__create_pipe include/linux/usb.h:1945 [inline]
send_packet+0xa2d/0xbc0 drivers/media/rc/imon.c:627
vfd_write+0x2d9/0x550 drivers/media/rc/imon.c:991
vfs_write+0x2d7/0xdd0 fs/read_write.c:576
ksys_write+0x127/0x250 fs/read_write.c:631
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The iMON driver improperly releases the usb_device reference in
imon_disconnect without coordinating with active users of the
device.
Specifically, the fields usbdev_intf0 and usbdev_intf1 are not
protected by the users counter (ictx->users). During probe,
imon_init_intf0 or imon_init_intf1 increments the usb_device
reference count depending on the interface. However, during
disconnect, usb_put_dev is called unconditionally, regardless of
actual usage.
As a result, if vfd_write or other operations are still in
progress after disconnect, this can lead to a use-after-free of
the usb_device pointer.
Thread 1 vfd_write Thread 2 imon_disconnect
...
if
usb_put_dev(ictx->usbdev_intf0)
else
usb_put_dev(ictx->usbdev_intf1)
...
while
send_packet
if
pipe = usb_sndintpipe(
ictx->usbdev_intf0) UAF
else
pipe = usb_sndctrlpipe(
ictx->usbdev_intf0, 0) UAF
Guard access to usbdev_intf0 and usbdev_intf1 after disconnect by
checking ictx->disconnected in all writer paths. Add early return
with -ENODEV in send_packet(), vfd_write(), lcd_write() and
display_open() if the device is no longer present.
Set and read ictx->disconnected under ictx->lock to ensure memory
synchronization. Acquire the lock in imon_disconnect() before setting
the flag to synchronize with any ongoing operations.
Ensure writers exit early and safely after disconnect before the USB
core proceeds with cleanup.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+f1a69784f6efe748c3bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f1a69784f6efe748c3bf
Fixes: 21677cfc56 ("V4L/DVB: ir-core: add imon driver")
Cc: stable@vger.kernel.org
Signed-off-by: Larshin Sergey <Sergey.Larshin@kaspersky.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40b7a19f32 upstream.
The original code uses cancel_delayed_work() in xc5000_release(), which
does not guarantee that the delayed work item timer_sleep has fully
completed if it was already running. This leads to use-after-free scenarios
where xc5000_release() may free the xc5000_priv while timer_sleep is still
active and attempts to dereference the xc5000_priv.
A typical race condition is illustrated below:
CPU 0 (release thread) | CPU 1 (delayed work callback)
xc5000_release() | xc5000_do_timer_sleep()
cancel_delayed_work() |
hybrid_tuner_release_state(priv) |
kfree(priv) |
| priv = container_of() // UAF
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the timer_sleep is properly canceled before the xc5000_priv memory
is deallocated.
A deadlock concern was considered: xc5000_release() is called in a process
context and is not holding any locks that the timer_sleep work item might
also need. Therefore, the use of the _sync() variant is safe here.
This bug was initially identified through static analysis.
Fixes: f7a27ff1fb ("[media] xc5000: delay tuner sleep to 5 seconds")
Cc: stable@vger.kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
[hverkuil: fix typo in Subject: tunner -> tuner]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01e03fb7db upstream.
The original code uses cancel_delayed_work() in flexcop_pci_remove(), which
does not guarantee that the delayed work item irq_check_work has fully
completed if it was already running. This leads to use-after-free scenarios
where flexcop_pci_remove() may free the flexcop_device while irq_check_work
is still active and attempts to dereference the device.
A typical race condition is illustrated below:
CPU 0 (remove) | CPU 1 (delayed work callback)
flexcop_pci_remove() | flexcop_pci_irq_check_work()
cancel_delayed_work() |
flexcop_device_kfree(fc_pci->fc_dev) |
| fc = fc_pci->fc_dev; // UAF
This is confirmed by a KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff8880093aa8c8 by task bash/135
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcf/0x610
? __run_timer_base.part.0+0x7d7/0x8c0
kasan_report+0xb8/0xf0
? __run_timer_base.part.0+0x7d7/0x8c0
__run_timer_base.part.0+0x7d7/0x8c0
? __pfx___run_timer_base.part.0+0x10/0x10
? __pfx_read_tsc+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
run_timer_softirq+0xd1/0x190
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
Allocated by task 1:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7f/0x90
__kmalloc_noprof+0x1be/0x460
flexcop_device_kmalloc+0x54/0xe0
flexcop_pci_probe+0x1f/0x9d0
local_pci_probe+0xdc/0x190
pci_device_probe+0x2fe/0x470
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__driver_attach+0xd2/0x310
bus_for_each_dev+0xed/0x170
bus_add_driver+0x208/0x500
driver_register+0x132/0x460
do_one_initcall+0x89/0x300
kernel_init_freeable+0x40d/0x720
kernel_init+0x1a/0x150
ret_from_fork+0x10c/0x1a0
ret_from_fork_asm+0x1a/0x30
Freed by task 135:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x3f/0x50
kfree+0x137/0x370
flexcop_device_kfree+0x32/0x50
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0xf8/0x210
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device_locked+0x15/0x30
remove_store+0xcc/0xe0
kernfs_fop_write_iter+0x2c3/0x440
vfs_write+0x871/0xd70
ksys_write+0xee/0x1c0
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled and any executing delayed
work has finished before the device memory is deallocated.
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the B2C2 FlexCop PCI device in QEMU and introduced
artificial delays within the flexcop_pci_irq_check_work() function to
increase the likelihood of triggering the bug.
Fixes: 382c5546d6 ("V4L/DVB (10694): [PATCH] software IRQ watchdog for Flexcop B2C2 DVB PCI cards")
Cc: stable@vger.kernel.org
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27e06650a5 upstream.
A buffer overflow arises from the use of snprintf() to write into the
buffer "buf" in the target_lu_gp_members_show() function in
drivers/target/target_core_configfs.c. This buffer is allocated with
size LU_GROUP_NAME_BUF (256 bytes).
snprintf() formats multiple strings into buf: the HBA name
(hba->hba_group.cg_item), a slash character, a device name
(dev->dev_group.cg_item) and a newline character. The total formatted
string length may exceed the buffer size of 256 bytes.
Since snprintf() returns the total number of bytes that would have been
written (the length of "%s/%s\n"), this value may exceed the buffer length
(256 bytes) passed to memcpy(), which will ultimately cause memcpy() to
report a buffer overflow error.
An additional check of the return value of snprintf() can avoid this
buffer overflow.
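A hedged userspace sketch of that check; the function and parameter names are illustrative, only LU_GROUP_NAME_BUF comes from the driver:
```
#include <stdio.h>
#include <string.h>

#define LU_GROUP_NAME_BUF 256

/*
 * snprintf() returns the length the full string *would* have had, which can
 * exceed the buffer size. Clamp that length before handing it to memcpy().
 */
static size_t format_member(char *page, const char *hba, const char *dev)
{
        char buf[LU_GROUP_NAME_BUF];
        int len;

        len = snprintf(buf, sizeof(buf), "%s/%s\n", hba, dev);
        if (len < 0)
                return 0;
        if ((size_t)len >= sizeof(buf))
                len = sizeof(buf) - 1;  /* output truncated: copy only what fits */

        memcpy(page, buf, len);
        return (size_t)len;
}

int main(void)
{
        char page[4096];
        size_t n = format_member(page, "some_hba", "some_very_long_device_name");

        printf("copied %zu bytes\n", n);
        return 0;
}
```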
Reported-by: Wang Haoran <haoranwangsec@gmail.com>
Reported-by: ziiiro <yuanmingbuaa@gmail.com>
Signed-off-by: Wang Haoran <haoranwangsec@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Loading a large (~2.1G) file with kexec crashes the host when
running:
# kexec --load kernel --initrd initrd_with_2G_or_more
UBSAN: signed-integer-overflow in ./include/crypto/sha256_base.h:64:19
34152083 * 64 cannot be represented in type 'int'
...
BUG: unable to handle page fault for address: ff9fffff83b624c0
sha256_update (lib/crypto/sha256.c:137)
crypto_sha256_update (crypto/sha256_generic.c:40)
kexec_calculate_store_digests (kernel/kexec_file.c:769)
__se_sys_kexec_file_load (kernel/kexec_file.c:397 kernel/kexec_file.c:332)
...
(Line numbers based on commit da274362a7 ("Linux 6.12.49"))
This started happening after commit f4da7afe07
("kexec_file: increase maximum file size to 4G") that landed in v6.0,
which increased the file size for kexec.
This is not happening upstream (v6.16+), given that `block` type was
upgraded from "int" to "size_t" in commit 74a43a2cf5 ("crypto:
lib/sha256 - Move partial block handling out")
Upgrade the block type similarly to the commit above, avoiding the
overflow.
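A small userspace demonstration of the arithmetic involved, using the block count from the UBSAN report:
```
#include <stdio.h>
#include <stdint.h>
#include <limits.h>
#include <stddef.h>

int main(void)
{
        size_t blocks = 34152083;               /* block count from the UBSAN report */

        /* 34152083 * 64 = 2185733312, which does not fit in a signed 32-bit int. */
        int64_t bytes = (int64_t)blocks * 64;
        printf("bytes = %lld, INT_MAX = %d\n", (long long)bytes, INT_MAX);

        /* Doing the multiplication in size_t, as the fix does, avoids the overflow. */
        size_t ok = blocks * 64;
        printf("size_t result = %zu\n", ok);
        return 0;
}
```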
This patch is only suitable for stable trees before 6.16, which got
commit 74a43a2cf5 ("crypto: lib/sha256 - Move partial block handling
out"). It is also not required before f4da7afe07 ("kexec_file:
increase maximum file size to 4G"). In other words, this fix is needed
between versions v6.0 and v6.16.
Signed-off-by: Breno Leitao <leitao@debian.org>
Fixes: f4da7afe07 ("kexec_file: increase maximum file size to 4G") # Before v6.16
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f71507415 upstream.
The scale() function detects invalid parameters, but continues
its calculations anyway. This causes bad results if negative values
are used in unsigned operations. In the worst case, a division-by-zero
error will be seen if source_min == source_max.
On top of that, after v6.13, the sequence of WARN_ON() followed by clamp()
may result in a build error with gcc 13.x.
drivers/gpu/drm/i915/display/intel_backlight.c: In function 'scale':
include/linux/compiler_types.h:542:45: error:
call to '__compiletime_assert_415' declared with attribute error:
clamp() low limit source_min greater than high limit source_max
This happens if the compiler decides to rearrange the code as follows.
if (source_min > source_max) {
WARN(..);
/* Do the clamp() knowing that source_min > source_max */
source_val = clamp(source_val, source_min, source_max);
} else {
/* Do the clamp knowing that source_min <= source_max */
source_val = clamp(source_val, source_min, source_max);
}
Fix the problem by evaluating the return values from WARN_ON and returning
immediately after a warning. While at it, fix the divide-by-zero error seen
if source_min == source_max.
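A simplified userspace model of the fixed behaviour (not the i915 code itself): bail out on an invalid range and avoid the degenerate division:
```
#include <stdint.h>
#include <stdio.h>

/*
 * Map source_val from [source_min, source_max] onto [target_min, target_max].
 * Invalid ranges now return an endpoint instead of continuing with bogus
 * values, and source_min == source_max no longer divides by zero.
 */
static uint32_t scale(uint64_t source_val,
                      uint32_t source_min, uint32_t source_max,
                      uint32_t target_min, uint32_t target_max)
{
        if (source_min > source_max || target_min > target_max)
                return target_min;      /* was: warn and keep calculating */

        if (source_val < source_min)
                source_val = source_min;
        else if (source_val > source_max)
                source_val = source_max;

        if (source_min == source_max)   /* avoid dividing by zero below */
                return target_min;

        return target_min + (uint32_t)((source_val - source_min) *
                                       (target_max - target_min) /
                                       (source_max - source_min));
}

int main(void)
{
        printf("%u\n", scale(50, 0, 100, 0, 255));      /* 127 */
        printf("%u\n", scale(50, 70, 30, 0, 255));      /* invalid range: 0 */
        return 0;
}
```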
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: David Laight <david.laight.linux@gmail.com>
Cc: David Laight <david.laight.linux@gmail.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250121145203.2851237-1-linux@roeck-us.net
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Revert commit 9b28ef1e4c, which is upstream
commit e1b0fa8639.
It causes a regression in the 6.12.49 stable kernel; there are no issues upstream.
Commit 9b28ef1e4c ("usb: xhci: remove option to change a default
ring's TRB cycle bit") introduced a regression in 6.12.49 stable kernel.
The original commit was never intended for stable kernels, but was added
as a dependency for commit a5c98e8b13 ("xhci: dbc: Fix full DbC
transfer ring after several reconnects").
Since this commit is more of an optimization, revert it and solve the
dependency by modifying one line in xhci_dbc_ring_init(). Specifically,
commit a5c98e8b13 ("xhci: dbc: Fix full DbC transfer ring after
several reconnects") moved function call xhci_initialize_ring_info()
into a separate function. To resolve the dependency, the arguments for
this function call are also reverted.
Closes: https://lore.kernel.org/stable/01b8c8de46251cfaad1329a46b7e3738@stwm.de/
Tested-by: Wolfgang Walter <linux@stwm.de>
Cc: stable@vger.kernel.org # v6.12.49
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4e034bf045 ]
fput() doesn't actually call the file_operations release() synchronously; it
puts the file on a work queue and it will be released eventually.
This is normally fine, except for iommufd the file and the iommufd_object
are tied together. The file has the object as its private_data and holds
a users refcount, while the object is expected to remain alive as long as
the file is.
When the allocation of a new object aborts before installing the file it
will fput() the file and then go on to immediately kfree() the obj. This
causes a UAF once the workqueue completes the fput() and tries to
decrement the users refcount.
Fix this by putting the core code in charge of the file lifetime, and call
__fput_sync() during abort to ensure that release() is called before
kfree. __fput_sync() is a bit too tricky to open code in all the object
implementations. Instead the objects tell the core code where the file
pointer is and the core will take care of the life cycle.
If the object is successfully allocated then the file will hold a users
refcount and the iommufd_object cannot be destroyed.
It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an
issue because close() is already using a synchronous version of fput().
The UAF looks like this:
BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164
CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xcd/0x630 mm/kasan/report.c:482
kasan_report+0xe0/0x110 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189
instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline]
__refcount_dec include/linux/refcount.h:455 [inline]
refcount_dec include/linux/refcount.h:476 [inline]
iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
__fput+0x402/0xb70 fs/file_table.c:468
task_work_run+0x14d/0x240 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 07838f7fd5 ("iommufd: Add iommufd fault object")
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ applied eventq.c changes to fault.c, drop veventq bits ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b7387650d upstream.
Migration may race with fallocate() punching a hole. remove_inode_single_folio()
will unmap the folio if the folio is still mapped. However, it's called
without folio lock. If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it. Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again. As a result, we triggered BUG in filemap_unaccount_folio.
The log is as follows:
BUG: Bad page cache in process hugetlb pfn:156c00
page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f4(hugetlb)
page dumped because: still mapped when deleted
CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x70
filemap_unaccount_folio+0xc4/0x1c0
__filemap_remove_folio+0x38/0x1c0
filemap_remove_folio+0x41/0xd0
remove_inode_hugepages+0x142/0x250
hugetlbfs_fallocate+0x471/0x5a0
vfs_fallocate+0x149/0x380
Hold the folio lock before checking if the folio is mapped, to avoid racing
with migration.
Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1c05 ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85e1ff6106 upstream.
Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():
BUG: unable to handle page fault for address: ffffbc3840291000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
[...]
Call Trace:
<TASK>
__msan_memset+0xee/0x1a0
sha224_final+0x9e/0x350
test_hash_buffer_overruns+0x46f/0x5f0
? kmsan_get_shadow_origin_ptr+0x46/0xa0
? __pfx_test_hash_buffer_overruns+0x10/0x10
kunit_try_run_case+0x198/0xa00
This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e. the next page is unmapped.
The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned. Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer. However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address. This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.
To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.
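An illustrative sketch of the alignment step, assuming the usual 4-byte origin granule:
```
#include <stdint.h>
#include <stdio.h>

/* Round an address down to the 4-byte origin granule it belongs to. */
static uintptr_t origin_aligned(uintptr_t addr)
{
        return addr & ~(uintptr_t)3;
}

int main(void)
{
        uintptr_t shadow = 0x1001;      /* unaligned shadow address (example value) */

        /* Reading 4 bytes at 0x1001 can run past the buffer's shadow; reading
         * at the aligned 0x1000 stays within the granule covered by the origin. */
        printf("%#lx -> %#lx\n", (unsigned long)shadow,
               (unsigned long)origin_aligned(shadow));
        return 0;
}
```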
Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec44c ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6ccc4dde1 upstream.
When a software-node gets added to a device which already has another
fwnode as primary node it will become the secondary fwnode for that
device.
Currently if a software-node with GPIO properties ends up as the secondary
fwnode then gpiod_find_by_fwnode() will fail to find the GPIOs.
Add a new gpiod_fwnode_lookup() helper which falls back to calling
gpiod_find_by_fwnode() with the secondary fwnode if the GPIO was not
found in the primary fwnode.
Fixes: e7f9ff5dc9 ("gpiolib: add support for software nodes")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/20250920200955.20403-1-hansg@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28aa29986d upstream.
When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 and reaches
pagemap_scan_backout_range(), the kernel panics with a null-ptr-deref:
[ 44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
[ 44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80
<snip registers, unreliable trace>
[ 44.946828] Call Trace:
[ 44.947030] <TASK>
[ 44.949219] pagemap_scan_pmd_entry+0xec/0xfa0
[ 44.952593] walk_pmd_range.isra.0+0x302/0x910
[ 44.954069] walk_pud_range.isra.0+0x419/0x790
[ 44.954427] walk_p4d_range+0x41e/0x620
[ 44.954743] walk_pgd_range+0x31e/0x630
[ 44.955057] __walk_page_range+0x160/0x670
[ 44.956883] walk_page_range_mm+0x408/0x980
[ 44.958677] walk_page_range+0x66/0x90
[ 44.958984] do_pagemap_scan+0x28d/0x9c0
[ 44.961833] do_pagemap_cmd+0x59/0x80
[ 44.962484] __x64_sys_ioctl+0x18d/0x210
[ 44.962804] do_syscall_64+0x5b/0x290
[ 44.963111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are
allocated and p->vec_buf remains set to NULL.
This breaks an assumption made later in pagemap_scan_backout_range(), that
page_region is always allocated for p->vec_buf_index.
Fix it by explicitly checking p->vec_buf for NULL before dereferencing.
Other sites that might run into same deref-issue are already (directly or
transitively) protected by checking p->vec_buf.
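A simplified sketch of the guard; the types and field names are stand-ins for the pagemap scan state:
```
#include <stddef.h>

struct page_region { unsigned long start, end; };

/* vec_buf stays NULL when the caller passed vec_len == 0 and no bounce
 * buffer was allocated. */
struct scan_state {
        struct page_region *vec_buf;
        size_t vec_buf_index;
};

static void backout_range(struct scan_state *p, unsigned long addr)
{
        if (!p->vec_buf)        /* nothing was buffered, nothing to back out */
                return;

        p->vec_buf[p->vec_buf_index].end = addr;
}

int main(void)
{
        struct scan_state p = { .vec_buf = NULL, .vec_buf_index = 0 };

        backout_range(&p, 0x1000);      /* previously a NULL dereference */
        return 0;
}
```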
Note:
From the PAGEMAP_SCAN man page, it seems vec_len = 0 is valid when no output
is requested and the caller is only interested in the side effects,
hence it passes the check in pagemap_scan_get_args().
This issue was found by syzkaller.
Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 794a066688 upstream.
The mvebu-comphy driver does not currently know how to pass correct
lane-count to ATF while configuring the serdes lanes.
This causes the system to hard reset during reconfiguration if a pci
card is present and has already established a link in the bootloader.
Remove the comphy handles from the respective pci nodes to avoid runtime
reconfiguration, relying solely on bootloader configuration - while
avoiding the hard reset.
When the bootloader has configured the lanes correctly, the pci ports are
functional under Linux.
This issue may be addressed in the comphy driver at a future point.
Fixes: e9ff907f40 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
Cc: stable@vger.kernel.org
Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48b51799a5 upstream.
Similar to MacchiatoBIN the high-speed modes are unstable on the CN9132
CEX-7 module, leading to failed transactions under normal use.
Disable all high-speed modes including UHS.
Additionally add no-sdio and non-removable properties as appropriate for
eMMC.
Fixes: e9ff907f40 ("arm64: dts: add description for solidrun cn9132 cex7 module and clearfog board")
Cc: stable@vger.kernel.org
Signed-off-by: Josua Mayer <josua@solid-run.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea9da67e2a upstream.
On the SoCFPGA/Sodia board, the mdio bus cannot be probed, so the PHY cannot
be found and the network device does not work.
```
stmmaceth ff702000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -19)
```
To probe the mdio bus, add "snps,dwmac-mdio" as the compatible string of the
mdio bus. Also, the PHY connected on this board is at address 4. Therefore,
change the address to 4.
Cc: stable@vger.kernel.org # 6.3+
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0ca0df179 upstream.
Commit 1b34cbbf4f ("crypto: af_alg - Disallow concurrent writes in
af_alg_sendmsg") changed some fields from bool to 1-bit bitfields of
type u32.
However, some assignments to these fields, specifically 'more' and
'merge', assign values greater than 1. These relied on C's implicit
conversion to bool, such that zero becomes false and nonzero becomes
true.
With a 1-bit bitfields of type u32 instead, mod 2 of the value is taken
instead, resulting in 0 being assigned in some cases when 1 was intended.
Fix this by restoring the bool type.
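A small self-contained example of the difference; the field name is taken from the description above, the structs are stand-ins:
```
#include <stdio.h>
#include <stdbool.h>

struct flags_bitfield {
        unsigned int more : 1;  /* 1-bit field: assignment keeps only bit 0 */
};

struct flags_bool {
        bool more;              /* bool: any nonzero value becomes true */
};

int main(void)
{
        struct flags_bitfield a;
        struct flags_bool b;

        a.more = 2;             /* 2 mod 2 == 0: the intended "true" is lost */
        b.more = 2;             /* implicit conversion to bool: true */

        printf("bitfield: %d, bool: %d\n", a.more, b.more);
        return 0;
}
```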
Fixes: 1b34cbbf4f ("crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b99dd77076 upstream.
When adding a new VM MAC, the driver checks only *active* filters in
vsi->mac_filter_hash. Each MAC, even in a non-active state, is using resources.
To determine the number of MACs a VM uses, count the VSI filters in *any* state.
Add i40e_count_all_filters() to simply count all filters, and rename
i40e_count_filters() to i40e_count_active_filters() to avoid ambiguity.
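An illustrative (non-driver) sketch of the distinction between the two counts:
```
#include <stdio.h>

enum filter_state { FILTER_NEW, FILTER_ACTIVE, FILTER_REMOVE };

struct mac_filter {
        enum filter_state state;
        struct mac_filter *next;
};

/* Counts only filters already in the active state. */
static int count_active_filters(const struct mac_filter *f)
{
        int n = 0;

        for (; f; f = f->next)
                if (f->state == FILTER_ACTIVE)
                        n++;
        return n;
}

/* Counts every filter, whatever its state, since each one consumes a resource. */
static int count_all_filters(const struct mac_filter *f)
{
        int n = 0;

        for (; f; f = f->next)
                n++;
        return n;
}

int main(void)
{
        struct mac_filter c = { FILTER_REMOVE, NULL };
        struct mac_filter b = { FILTER_NEW, &c };
        struct mac_filter a = { FILTER_ACTIVE, &b };

        printf("active=%d all=%d\n", count_active_filters(&a), count_all_filters(&a));
        return 0;
}
```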
Fixes: cfb1d572c9 ("i40e: Add ensurance of MacVlan resources for every trusted VF")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55d225670d upstream.
The `ring_len` parameter provided by the virtual function (VF)
is assigned directly to the hardware memory context (HMC) without
any validation.
To address this, introduce an upper boundary check for both Tx and Rx
queue lengths. The maximum number of descriptors supported by the
hardware is 8k-32.
Additionally, enforce alignment constraints: Tx rings must be a multiple
of 8, and Rx rings must be a multiple of 32.
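A sketch of the bounds and alignment checks in plain C; the helper names are hypothetical and the 8k-32 limit comes from the text above:
```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_RING_DESC (8192 - 32)       /* 8k-32 descriptors, per the commit text */

static bool tx_ring_len_valid(uint32_t ring_len)
{
        /* Tx rings: within the hardware limit and a multiple of 8. */
        return ring_len && ring_len <= MAX_RING_DESC && !(ring_len % 8);
}

static bool rx_ring_len_valid(uint32_t ring_len)
{
        /* Rx rings: within the hardware limit and a multiple of 32. */
        return ring_len && ring_len <= MAX_RING_DESC && !(ring_len % 32);
}

int main(void)
{
        printf("tx 1024: %d, tx 65536: %d, rx 100: %d\n",
               tx_ring_len_valid(1024), tx_ring_len_valid(65536),
               rx_ring_len_valid(100));
        return 0;
}
```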
Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Cc: stable@vger.kernel.org
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 831f70a5b9 upstream.
Add support for missing hotkey keycodes affecting Asus PX13 and PX16 families
so userspace can use them.
Signed-off-by: Amit Chaudhari <amitchaudhari@mac.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fbe2dc6a9c ]
In smb2_compound_op(), the loop that processes each command's response
uses wrong indices when accessing response buffers.
This incorrect indexing leads to improper handling of command results.
Also, if the incorrectly computed index is greater than or equal to
MAX_COMPOUND, it can cause out-of-bounds accesses.
Fixes: 3681c74d34 ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ed17349f1 ]
When WMAB is called to set the fan mode, the new mode is read from either
bits 0-1 or bits 4-5 (depending on the value of some other EC register).
Thus when WMAB is called with bits 4-5 zeroed and called again with
bits 0-1 zeroed, the second call undoes the effect of the first call.
This causes writes to /sys/devices/platform/lg-laptop/fan_mode to have
no effect (and causes reads to always report a status of zero).
Fix this by calling WMAB once, with the mode set in bits 0,1 and 4,5.
When the fan mode is returned from WMAB it always has this form, so
there is no need to preserve the other bits. As a bonus, the driver
now supports the "Performance" fan mode seen in the LG-provided Windows
control app, which provides less aggressive CPU throttling but louder
fan noise and shorter battery life.
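An illustrative helper (the name is hypothetical) showing how the mode is replicated into both bit pairs before the single WMAB call:
```
#include <stdio.h>

/*
 * Compose the value passed to WMAB so the two-bit mode occupies both
 * bits 0-1 and bits 4-5. Whichever pair the EC reads, it sees the same mode.
 */
static unsigned int fan_mode_to_wmab(unsigned int mode)
{
        mode &= 0x3;
        return mode | (mode << 4);
}

int main(void)
{
        printf("mode 1 -> %#x\n", fan_mode_to_wmab(1));         /* prints 0x11 */
        return 0;
}
```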
Also, correct the documentation to reflect that 0 corresponds to the
default mode (what the Windows app calls "Optimal") and 1 corresponds
to the silent mode.
Fixes: dbf0c5a6b1 ("platform/x86: Add LG Gram laptop special features driver")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204913#c4
Signed-off-by: Daniel Lee <dany97@live.ca>
Link: https://patch.msgid.link/MN2PR06MB55989CB10E91C8DA00EE868DDC1CA@MN2PR06MB5598.namprd06.prod.outlook.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7d9c3442b0 ]
Commit de85488138 ("drm/panthor: Add the scheduler logical block")
handled destruction of a group's queues' drm scheduler entities early
into the group destruction procedure.
However, that races with the group submit ioctl, because by the time
entities are destroyed (through the group destroy ioctl), the submission
procedure might've already obtained a group handle, and therefore the
ability to push jobs into entities. This is met with a DRM error message
within the drm scheduler core as a situation that should never occur.
Fix by deferring drm scheduler entity destruction to queue release time.
Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250919164436.531930-1-adrian.larumbe@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b549113738 ]
syzbot managed to trigger the following race:
T1 T2
futex_wait_requeue_pi()
futex_do_wait()
schedule()
futex_requeue()
futex_proxy_trylock_atomic()
futex_requeue_pi_prepare()
requeue_pi_wake_futex()
futex_requeue_pi_complete()
/* preempt */
* timeout/ signal wakes T1 *
futex_requeue_pi_wakeup_sync() // Q_REQUEUE_PI_LOCKED
futex_hash_put()
// back to userland, on stack futex_q is garbage
/* back */
wake_up_state(q->task, TASK_NORMAL);
In this scenario futex_wait_requeue_pi() is able to leave without using
futex_q::lock_ptr for synchronization.
This can be prevented by reading futex_q::task before updating the
futex_q::requeue_state. A reference on the task_struct is not needed
because requeue_pi_wake_futex() is invoked with a spinlock_t held which
implies a RCU read section.
Even if T1 terminates immediately after, the task_struct will remain valid
during T2's wake_up_state(). A READ_ONCE on futex_q::task before
futex_requeue_pi_complete() is enough because it ensures that the variable
is read before the state is updated.
Read futex_q::task before updating the requeue state, use it for the
following wakeup.
Fixes: 07d91ef510 ("futex: Prevent requeue_pi() lock nesting issue on RT")
Reported-by: syzbot+034246a838a10d181e78@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/all/68b75989.050a0220.3db4df.01dd.GAE@google.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 987afe1479 ]
The blamed commit and others in that patch set started the trend
of reusing existing DSA driver API for a new purpose: calling
ds->ops->port_fdb_add() on the CPU port.
The lantiq_gswip driver was not prepared to handle that, as can be seen
from the many errors that Daniel presents in the logs:
[ 174.050000] gswip 1e108000.switch: port 2 failed to add fa:aa:72:f4:8b:1e vid 1 to fdb: -22
[ 174.060000] gswip 1e108000.switch lan2: entered promiscuous mode
[ 174.070000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 0 to fdb: -22
[ 174.090000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 1 to fdb: -22
[ 174.090000] gswip 1e108000.switch: port 2 failed to delete fa:aa:72:f4:8b:1e vid 1 from fdb: -2
The errors are because gswip_port_fdb() wants to get a handle to the
bridge that originated these FDB events, to associate it with a FID.
Absolutely honourable purpose, however this only works for user ports.
To get the bridge that generated an FDB entry for the CPU port, one
would need to look at the db.bridge.dev argument. But this was
introduced in commit c26933639b ("net: dsa: request drivers to perform
FDB isolation"), first appeared in v5.18, and when the blamed commit was
introduced in v5.14, no such API existed.
So the core DSA feature was introduced way too soon for lantiq_gswip.
Not acting on these host FDB entries and suppressing any errors has no
other negative effect, and practically returns us to not supporting the
host filtering feature at all - peacefully, this time.
Fixes: 10fae4ac89 ("net: dsa: include bridge addresses which are local in the host fdb list")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Closes: https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-3-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0054b25e2 ]
A port added to a "single port bridge" operates as standalone, and this
is mutually exclusive to being part of a Linux bridge. In fact,
gswip_port_bridge_join() calls gswip_add_single_port_br() with
add=false, i.e. removes the port from the "single port bridge" to enable
autonomous forwarding.
The blamed commit seems to have incorrectly thought that ds->ops->port_enable()
is called one time per port, during the setup phase of the switch.
However, it is actually called during the ndo_open() implementation of
DSA user ports, which is to say that this sequence of events:
1. ip link set swp0 down
2. ip link add br0 type bridge
3. ip link set swp0 master br0
4. ip link set swp0 up
would cause swp0 to join back the "single port bridge" which step 3 had
just removed it from.
The correct DSA hook for one-time actions per port at switch init time
is ds->ops->port_setup(). This is what seems to match the coder's
intention; also see the comment at the beginning of the file:
* At the initialization the driver allocates one bridge table entry for
~~~~~~~~~~~~~~~~~~~~~
* each switch port which is used when the port is used without an
* explicit bridge.
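A minimal sketch of the move (names follow the driver; surrounding code
elided):

  static int gswip_port_setup(struct dsa_switch *ds, int port)
  {
          struct gswip_priv *priv = ds->priv;

          /* One-time, init-only action: put the port into its private
           * "single port bridge" so it forwards standalone by default. */
          if (!dsa_is_cpu_port(ds, port))
                  gswip_add_single_port_br(priv, port, true);

          return 0;
  }

With this, gswip_port_enable() no longer touches the single port bridge,
so bringing a bridged port down and up cannot undo
gswip_port_bridge_join().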
Fixes: 8206e0ce96 ("net: dsa: lantiq: Add VLAN unaware bridge offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-2-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c29913109c ]
The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v
IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
[...]
IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
Error: Invalid nexthop id.
TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ]
[...]
COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
Error: Nexthop id does not exist.
TEST: Route add with fdb nexthop [ OK ]
In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.
Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.
Output after the fix:
# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v
IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
[...]
IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops [ OK ]
COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
Error: Non FDB nexthop group cannot have fdb nexthops.
TEST: Non-Fdb Nexthop group with fdb nexthops [ OK ]
[...]
COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
Error: Route cannot point to a fdb nexthop.
TEST: Route add with fdb nexthop [ OK ]
[...]
Tests passed: 30
Tests failed: 0
Tests skipped: 0
Fixes: 0534c5489c ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 390b3a300d ]
The kernel forbids the creation of non-FDB nexthop groups with FDB
nexthops:
# ip nexthop add id 1 via 192.0.2.1 fdb
# ip nexthop add id 2 group 1
Error: Non FDB nexthop group cannot have fdb nexthops.
And vice versa:
# ip nexthop add id 3 via 192.0.2.2 dev dummy1
# ip nexthop add id 4 group 3 fdb
Error: FDB nexthop group can only have fdb nexthops.
However, as long as no routes are pointing to a non-FDB nexthop group,
the kernel allows changing the type of a nexthop from FDB to non-FDB and
vice versa:
# ip nexthop add id 5 via 192.0.2.2 dev dummy1
# ip nexthop add id 6 group 5
# ip nexthop replace id 5 via 192.0.2.2 fdb
# echo $?
0
This configuration is invalid and can result in a NULL pointer dereference [1] since FDB
nexthops are not associated with a nexthop device:
# ip route add 198.51.100.1/32 nhid 6
# ping 198.51.100.1
Fix by preventing nexthop FDB status change while the nexthop is in a
group:
# ip nexthop add id 7 via 192.0.2.2 dev dummy1
# ip nexthop add id 8 group 7
# ip nexthop replace id 7 via 192.0.2.2 fdb
Error: Cannot change nexthop FDB status while in a group.
[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
<TASK>
fib_table_lookup+0x541/0x650
ip_route_output_key_hash_rcu+0x2ea/0x970
ip_route_output_key_hash+0x55/0x80
__ip4_datagram_connect+0x250/0x330
udp_connect+0x2b/0x60
__sys_connect+0x9c/0xd0
__x64_sys_connect+0x18/0x20
do_syscall_64+0xa4/0x2a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
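A minimal sketch of the added validation in the replace path (field and
helper names are approximations of net/ipv4/nexthop.c):

  /* Before committing the replacement nexthop configuration: */
  if (!list_empty(&old->grp_list) &&
      rtnl_dereference(old->nh_info)->fdb_nh !=
      rtnl_dereference(new->nh_info)->fdb_nh) {
          NL_SET_ERR_MSG(extack,
                         "Cannot change nexthop FDB status while in a group");
          return -EINVAL;
  }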
Fixes: 38428d6871 ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca9f9cdc4d ]
Currently, alloc_skb_with_frags() will only fill (MAX_SKB_FRAGS - 1)
slots. I think it should use all MAX_SKB_FRAGS slots, as callers of
alloc_skb_with_frags() will size their allocation of frags based
on MAX_SKB_FRAGS.
This issue was discovered via a test patch that sets 'order' to 0
in alloc_skb_with_frags(), which effectively tests/simulates high
fragmentation. In this case sendmsg() on unix sockets will fail every
time for large allocations. If the PAGE_SIZE is 4K, then data_len will
request 68K or 17 pages, but alloc_skb_with_frags() can only allocate
64K in this case or 16 pages.
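A minimal sketch of the adjusted loop guard (only the guard is shown; the
rest of alloc_skb_with_frags() is elided):

  /* Fill page fragments until data_len is satisfied; allow the full
   * MAX_SKB_FRAGS slots to be used. */
  while (data_len) {
          if (nr_frags == MAX_SKB_FRAGS)      /* was MAX_SKB_FRAGS - 1 */
                  goto failure;
          /* ... allocate a page of up to PAGE_SIZE << order bytes and
           * skb_fill_page_desc(skb, nr_frags++, page, 0, chunk); ... */
  }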
Fixes: 09c2c90705 ("net: allow alloc_skb_with_frags() to allocate bigger packets")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250922191957.2855612-1-jbaron@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d3aa9472c ]
In bnxt_tc_parse_pedit(), the code incorrectly writes IPv6
destination values to the source address field (saddr) when
processing pedit offsets within the destination address range.
This patch corrects the assignment to use daddr instead of saddr,
ensuring that pedit operations on IPv6 destination addresses are
applied correctly.
Fixes: 9b9eb518e3 ("bnxt_en: Add support for NAT(L3/L4 rewrite)")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250920121157.351921-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afe16653e0 ]
vhost_task_create() creates a task and keeps a reference to its
task_struct. That task may exit early via a signal and its task_struct
will be released.
A pending vhost_task_wake() will then attempt to wake the task and
access a task_struct which is no longer there.
Acquire a reference on the task_struct while creating the thread and
release the reference while the struct vhost_task itself is removed.
If the task exits early due to a signal, then the vhost_task_wake() will
still access a valid task_struct. The wake is safe and will be skipped
in this case.
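A minimal sketch of the reference handling (vhost_task internals
approximated):

  /* vhost_task_create(): after the worker thread is created, pin its
   * task_struct for as long as the struct vhost_task exists. */
  get_task_struct(vtsk->task);

  /* teardown path, when the struct vhost_task itself goes away: */
  put_task_struct(vtsk->task);
  kfree(vtsk);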
Fixes: f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Reported-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Message-Id: <20250918181144.Ygo8BZ-R@linutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1488af7b8b ]
hci_resume_advertising_sync is supposed to resume all instances paused by
hci_pause_advertising_sync. This logic is used for procedures that are only
allowed when not advertising, but instance 0x00 was not being re-enabled.
Fixes: ad383c2c65 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17c8d79452 ]
Sending a packet through a PF_PACKET socket makes it possible to bypass
the CAN framework logic and to directly reach the xmit() function of a
CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the mcba_usb driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() functions call can_dev_dropped_skb() to check
that the skb is valid. Unfortunately, under the above conditions, the
malicious packet is able to pass the can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, mcba_usb_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.
This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on these lines:
usb_msg.dlc = cf->len;
memcpy(usb_msg.data, cf->data, usb_msg.dlc);
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
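A minimal sketch of the fix, following the pattern CAN drivers use for
this (callback names approximated; other ops elided):

  static const struct net_device_ops mcba_netdev_ops = {
          .ndo_open       = mcba_usb_open,
          .ndo_stop       = mcba_usb_close,
          .ndo_start_xmit = mcba_usb_start_xmit,
          /* reject any MTU other than CAN_MTU, closing the PF_PACKET hole */
          .ndo_change_mtu = can_change_mtu,
  };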
Fixes: 51f3baad7d ("can: mcba_usb: Add support for Microchip CAN BUS Analyzer")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-4-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61da0bd410 ]
Sending a packet through a PF_PACKET socket makes it possible to bypass
the CAN framework logic and to directly reach the xmit() function of a
CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() functions call can_dev_dropped_skb() to check
that the skb is valid. Unfortunately, under the above conditions, the
malicious packet is able to pass the can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, sun4ican_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN frame.
This can result in a buffer overflow. The driver will consume cf->len
as-is with no further checks on this line:
dlc = cf->len;
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs a
couple of lines below when doing:
for (i = 0; i < dlc; i++)
writel(cf->data[i], priv->base + (dreg + i * 4));
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
Fixes: 0738eff14d ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-3-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac1c7656fa ]
Sending a packet through a PF_PACKET socket makes it possible to bypass
the CAN framework logic and to directly reach the xmit() function of a
CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the sun4i_can driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL))
to inject malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() functions call can_dev_dropped_skb() to check
that the skb is valid. Unfortunately, under the above conditions, the
malicious packet is able to pass the can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, hi3110_hard_start_xmit() receives a CAN XL frame which it is
not able to correctly handle and will thus misinterpret it as a CAN
frame. The driver will consume frame->len as-is with no further
checks.
This can result in a buffer overflow later on in hi3110_hw_tx() on
this line:
memcpy(buf + HI3110_FIFO_EXT_DATA_OFF,
frame->data, frame->len);
Here, frame->len corresponds to the flags field of the CAN XL frame.
In our previous example, we set canxl_frame->flags to 0xff. Because
the maximum expected length is 8, a buffer overflow of 247 bytes
occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU. By
fixing the root cause, this prevents the buffer overflow.
Fixes: 57e83fb9b7 ("can: hi311x: Add Holt HI-311x CAN driver")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-2-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38c0abad45 ]
Sending a packet through a PF_PACKET socket makes it possible to bypass
the CAN framework logic and to directly reach the xmit() function of a
CAN driver. The only check
which is performed by the PF_PACKET framework is to make sure that
skb->len fits the interface's MTU.
Unfortunately, because the etas_es58x driver does not populate its
net_device_ops->ndo_change_mtu(), it is possible for an attacker to
configure an invalid MTU by doing, for example:
$ ip link set can0 mtu 9999
After doing so, the attacker could open a PF_PACKET socket using the
ETH_P_CANXL protocol:
socket(PF_PACKET, SOCK_RAW, htons(ETH_P_CANXL));
to inject malicious CAN XL frames. For example:
struct canxl_frame frame = {
.flags = 0xff,
.len = 2048,
};
The CAN drivers' xmit() functions call can_dev_dropped_skb() to check
that the skb is valid. Unfortunately, under the above conditions, the
malicious packet is able to pass the can_dev_dropped_skb() checks:
1. the skb->protocol is set to ETH_P_CANXL which is valid (the
function does not check the actual device capabilities).
2. the length is a valid CAN XL length.
And so, es58x_start_xmit() receives a CAN XL frame which it is not
able to correctly handle and will thus misinterpret it as a CAN(FD)
frame.
This can result in a buffer overflow. For example, using the es581.4
variant, the frame will be dispatched to es581_4_tx_can_msg(), go
through the last check at the beginning of this function:
if (can_is_canfd_skb(skb))
return -EMSGSIZE;
and reach this line:
memcpy(tx_can_msg->data, cf->data, cf->len);
Here, cf->len corresponds to the flags field of the CAN XL frame. In
our previous example, we set canxl_frame->flags to 0xff. Because the
maximum expected length is 8, a buffer overflow of 247 bytes occurs!
Populate net_device_ops->ndo_change_mtu() to ensure that the
interface's MTU can not be set to anything bigger than CAN_MTU or
CANFD_MTU (depending on the device capabilities). By fixing the root
cause, this prevents the buffer overflow.
Fixes: 8537257874 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
Link: https://patch.msgid.link/20250918-can-fix-mtu-v1-1-0d1cada9393b@kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e25ddfb388 ]
When CONFIG_PREEMPT_RT is enabled, the kernel warns when running the
timer selftests with './test_progs -t timer':
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
In order to avoid such warning, reject bpf_timer in verifier when
PREEMPT_RT is enabled.
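A minimal sketch of the rejection (exact placement and message in
kernel/bpf/verifier.c are approximated):

  if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
          verbose(env, "bpf_timer is not supported on PREEMPT_RT\n");
          return -EOPNOTSUPP;
  }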
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250910125740.52172-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d1267475b ]
Commit 15ae0410c37a79 ("btrfs-progs: add error handling for
device_get_partition_size_fd_stat()") in btrfs-progs inadvertently
changed it so that if the BLKGETSIZE64 ioctl on a block device returned
a size of 0, this was no longer seen as an error condition.
Unfortunately this is how disconnected NBD devices behave, meaning that
with btrfs-progs 6.16 it's now possible to add a device you can't
remove:
# btrfs device add /dev/nbd0 /root/temp
# btrfs device remove /dev/nbd0 /root/temp
ERROR: error removing device '/dev/nbd0': Invalid argument
This check should always have been done kernel-side anyway, so add a
check in btrfs_init_new_device() that the new device doesn't have a size
less than BTRFS_DEVICE_RANGE_RESERVED (i.e. 1 MB).
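A minimal sketch of such a check (the error message and plumbing are
assumptions; only the size comparison follows the description above):

  /* Reject devices smaller than the reserved range (1 MB). */
  if (bdev_nr_bytes(bdev) <= BTRFS_DEVICE_RANGE_RESERVED) {
          btrfs_err(fs_info, "device %s is too small", device_path);
          ret = -EINVAL;
          goto error;
  }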
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ffe28b4e8 ]
commit 2a6c727387 ("cpufreq: Initialize cpufreq-based
frequency-invariance later") postponed the frequency invariance
initialization to avoid disabling it in the error case.
This isn't locking safe; instead, move the initialization up before
the subsys interface is registered (which will rebuild the
sched_domains) and add the corresponding disable on the error path.
Observed lockdep without this patch:
[ 0.989686] ======================================================
[ 0.989688] WARNING: possible circular locking dependency detected
[ 0.989690] 6.17.0-rc4-cix-build+ #31 Tainted: G S
[ 0.989691] ------------------------------------------------------
[ 0.989692] swapper/0/1 is trying to acquire lock:
[ 0.989693] ffff800082ada7f8 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sched_domains_energy+0x30/0x58
[ 0.989705]
but task is already holding lock:
[ 0.989706] ffff000088c89bc8 (&policy->rwsem){+.+.}-{4:4}, at: cpufreq_online+0x7f8/0xbe0
[ 0.989713]
which lock already depends on the new lock.
Fixes: 2a6c727387 ("cpufreq: Initialize cpufreq-based frequency-invariance later")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 29341c6c18 ]
A previous commit changed the '#sound-dai-cells' property for the
kirkwood audio controller from 1 to 0 in the kirkwood.dtsi file,
but did not update the corresponding 'sound-dai' property in the
kirkwood-openrd-client.dts file.
This created a mismatch, causing a dtbs_check validation error where
the dts provides one cell (<&audio0 0>) while the .dtsi expects zero.
Remove the extraneous cell from the 'sound-dai' property to fix the
schema validation warning and align with the updated binding.
Fixes: e662e70fa4 ("arm: dts: kirkwood: fix error in #sound-dai-cells size")
Signed-off-by: Jihed Chaibi <jihed.chaibi.dev@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a50342f976 ]
The TMU has two temperature measurement sites located on the chip. The
probe 0 is located inside of the ANAMIX, while the probe 1 is located near
the ARM core. This has been confirmed by checking with HW design team and
checking RTL code.
So correct the {cpu,soc}-thermal sensor index.
Fixes: 30cdd62dce ("arm64: dts: imx8mp: Add thermal zones support")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2461e20fa ]
To ensure successful builds when CONFIG_IMX_SCMI_MISC_DRV is not enabled,
this patch adds static inline stub implementations for the following
functions:
- scmi_imx_misc_ctrl_get()
- scmi_imx_misc_ctrl_set()
These stubs return -EOPNOTSUPP to indicate that the functionality is not
supported in the current configuration. This avoids potential build or
link errors in code that conditionally calls these functions based on
feature availability.
This patch also drops the changes in commit 540c830212 ("firmware: imx:
remove duplicate scmi_imx_misc_ctrl_get()").
The original change aimed to simplify the handling of optional features by
removing conditional stubs. However, the use of conditional stubs is
necessary when CONFIG_IMX_SCMI_MISC_DRV is n, while the consumer driver is
set to y.
This is not a matter of preserving legacy patterns, but rather to ensure
that there is no link error whether for module or built-in.
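A minimal sketch of the stubs (signatures follow the functions named
above; the config guard is the obvious one):

  #if IS_ENABLED(CONFIG_IMX_SCMI_MISC_DRV)
  int scmi_imx_misc_ctrl_get(u32 id, u32 *num, u32 *val);
  int scmi_imx_misc_ctrl_set(u32 id, u32 val);
  #else
  static inline int scmi_imx_misc_ctrl_get(u32 id, u32 *num, u32 *val)
  {
          return -EOPNOTSUPP;
  }

  static inline int scmi_imx_misc_ctrl_set(u32 id, u32 val)
  {
          return -EOPNOTSUPP;
  }
  #endif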
Fixes: 0b4f8a68b2 ("firmware: imx: Add i.MX95 MISC driver")
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85fe9f565d ]
Fix a bug where the driver's event subscription logic for SRQ-related
events incorrectly sets obj_type for RMP objects.
When subscribing to SRQ events, get_legacy_obj_type() did not handle
the MLX5_CMD_OP_CREATE_RMP case, which caused obj_type to be 0
(default).
This led to a mismatch between the obj_type used during subscription
(0) and the value used during notification (1, taken from the event's
type field). As a result, event mapping for SRQ objects could fail and
event notification would not be delivered correctly.
This fix adds handling for MLX5_CMD_OP_CREATE_RMP in get_legacy_obj_type,
returning MLX5_EVENT_QUEUE_TYPE_RQ so obj_type is consistent between
subscription and notification.
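A minimal sketch of the added case (return type and neighbouring cases
approximated):

  static u8 get_legacy_obj_type(u16 opcode)
  {
          switch (opcode) {
          case MLX5_CMD_OP_CREATE_RQ:
          case MLX5_CMD_OP_CREATE_RMP:    /* SRQ implemented as an RMP */
                  return MLX5_EVENT_QUEUE_TYPE_RQ;
          /* ... other queue types ... */
          default:
                  return 0;
          }
  }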
Fixes: 7597385371 ("IB/mlx5: Enable subscription for device events over DEVX")
Link: https://patch.msgid.link/r/8f1048e3fdd1fde6b90607ce0ed251afaf8a148c.1755088962.git.leon@kernel.org
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e9df175548 ]
The vendor Comtrue Inc. (0x2fc6) produces USB audio chipsets like
the CT7601 which are capable of Native DSD playback.
This patch adds QUIRK_FLAG_DSD_RAW for Comtrue (VID 0x2fc6), which enables
native DSD playback (DSD_U32_LE) on their USB Audio device. This has been
verified under Ubuntu 25.04 with JRiver.
Signed-off-by: noble.yang <noble.yang@comtrue-inc.com>
Link: https://patch.msgid.link/20250731110614.4070-1-noble228@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79d561c4ec ]
The Sony DualSense wireless controller (PS5) features an internal mono
speaker, but it also provides a 3.5mm jack socket for headphone output
and headset microphone input.
Since this is a UAC1 device, it doesn't advertise any jack detection
capability. However, the controller is able to report HP & MIC insert
events via HID, i.e. through a dedicated input device managed by the
hid-playstation driver.
Add a quirk to create the jack controls for headphone and headset mic,
respectively, and setup an input handler for each of them in order to
intercept the related hotplug events.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-9-1a821463b632@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9cea742559 ]
Adding a memory barrier before wake_up() in
snd_usb_soundblaster_remote_complete() is supposed to ensure the write
to mixer->rc_code is visible in wait_event_interruptible() from
snd_usb_sbrc_hwdep_read().
However, this is not really necessary, since wake_up() is just a wrapper
over __wake_up() which already executes a full memory barrier before
accessing the state of the task to be woken up.
Drop the redundant call to wmb() and implicitly fix the checkpatch
complaint:
WARNING: memory barrier without comment
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-8-1a821463b632@collabora.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 853a57ba26 ]
In kernel v6.5, several functions were added to the cdev layer. This
required updating the default version of subsystem ABI up to 6, but
this requirement was overlooked.
This commit updates the version accordingly.
Fixes: 6add87e976 ("firewire: cdev: add new version of ABI to notify time stamp at request/response subaction of transaction#")
Link: https://lore.kernel.org/r/20250920025148.163402-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cb782ff3c ]
Previous checks incorrectly tested the DMA addresses (dma_handle) for
NULL. Since dma_alloc_coherent() returns the CPU (virtual) address, the
NULL check should be performed on the *_base_addr pointer to correctly
detect allocation failures.
Update the checks to validate sqe_base_addr and cqe_base_addr instead of
sqe_dma_addr and cqe_dma_addr.
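A minimal sketch of the corrected check, shown for the SQ entries (the CQ
side is analogous; names follow the commit message):

  hwq->sqe_base_addr = dmam_alloc_coherent(hba->dev, utrdl_size,
                                           &hwq->sqe_dma_addr, GFP_KERNEL);
  if (!hwq->sqe_base_addr) {     /* was: !hwq->sqe_dma_addr */
          dev_err(hba->dev, "SQE allocation failed\n");
          return -ENOMEM;
  }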
Fixes: 4682abfae2 ("scsi: ufs: core: mcq: Allocate memory for MCQ mode")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
8 bit readout is only a reconfiguration of the CSI2 block,
and recomputation of horizontal blanking. Enable it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The sensor supports readout as 10 or 12 bit. As we are now
computing the horizontal blanking limits dynamically, adding
support for both readout modes falls out trivially, so add
them both.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The timing registers configured are for 450MHz.
If running at a different link frequency, use the automatic
timing control.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Now that the link frequency can be varied, write the link bit
rate registers to reflect the speed being used.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
As we now support variable link frequency, compute the minimum
line_length value that the sensor will work with, and set
V4L2_CID_HBLANK based on that number.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Rather than the hard coded PLL settings for fixed frequencies,
compute the PLL settings based on device tree, validating that
the specified rate can be achieved.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
There are a fair number of registers duplicated in all the mode
tables, so move those into the common table.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This removes a load of boilerplate code around how registers
are grouped into multiple word values.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The frame length default value doesn't change dynamically, and
neither do any of the other parameters that configure it,
so precompute it instead of working from a frame duration to
get to the value.
The minimum value was also computed, when actually the sensor
will take any value down to 4 lines.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
line_length_pix is a value that the developer wants to know,
so write the values in decimal.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Register line_length_pix was being written by both the tables
of registers and the control handler for V4L2_CID_HBLANK.
Remove the duplication in the tables.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This is largely to test a previous change that made IOMMU aperture
configurable and allocated lazily; it may be useful in its own right.
We expect IOMMU2 to be well-utilized e.g. when using 64MPix cameras.
Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
Allocate space for level-2 IOMMU translation tables on demand.
This should save memory in most cases but means that map_pages()
can now fail with -ENOMEM. Unused pages are retained for re-use.
Move all dma_sync* calls into map and unmap functions rather than
batching them up. This makes it easier to ensure they are safely
balanced, now that the tables are held as separate pages.
Add OF properties to override the default aperture size (2GB)
and base address (40GB); this doesn't include any dma-iova-offset.
Various tidy-ups. The internal representation of aperture limits
*does* include dma_iova_offset, as that is more commonly useful.
Clarify the distinction between Linux pages and IOMMU table pages.
Fix wrong definition of MMMU_CTRL_PT_INVALID_EN flag.
Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
At a high FPS with RAW10, there is frame corruption for 480p because the
rate_factor of 2 is used with the normal 2x2 binning [1]. This commit
ties the rate_factor to the selected binning mode. For the 480p mode,
analog 2x2 binning mode with a rate_factor of 2 is always used. For the
1232p mode the normal 2x2 binning mode is used for RAW10 while analog
2x2 binning mode is used for RAW8.
[1] raspberrypi#5493
Signed-off-by: Vinay Varma <varmavinaym@gmail.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Reworked due to upstream changes
The enum binning_mode is redundant, it only uses values from the earlier
defined binning modes. Remove it.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
The PL011 driver in this downstream kernel tree supports an extra
compatible string - arm,pl011-axi - for use by RP1. This registers as a
platform driver, not an AMBA driver, and has the advantage of responding
to dynamic Device Tree changes such as loading one of the "uart<n>"
overlays.
Change all of the downstream Raspberry Pi dts files to use the new
compatible string. At the same time, remove the override of the periphid
as the upstream code now has the correct value.
See: https://github.com/raspberrypi/linux/issues/7019
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The D-PHY on RP1 supports lane polarity swapping, and there
is a standard device tree mechanism for configuring this,
so tie the two together.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
There was a lot of register definition information dumped from some
source into the driver but left unused. Remove it, and
format the remaining lines according to the Linux kernel coding
style.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Put the registers (particularly the PHY registers) into order, define the
bitmasks alongside the registers, and use tabs for indentation.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
When commit fa78e9b606 upstream was
backported to LTS branches linux-6.12.y and linux-6.6.y, the SPI regmap
config fix got applied to the I2C regmap config. Most likely due to a new
RTC get/set parm feature introduced in 6.14 causing regmap config sections
at the bottom of the driver to move. LTS branch linux-6.1.y and earlier
does not have PCF2131 device support.
The issue can be seen at the bottom of this diff in the stable/linux.git tree:
git diff master..linux-6.12.y -- drivers/rtc/rtc-pcf2127.c
Fixes: ee61aec852 ("rtc: pcf2127: fix SPI command byte for PCF2131")
Fixes: 5cdd1f73401d ("rtc: pcf2127: fix SPI command byte for PCF2131")
Cc: stable@vger.kernel.org
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Elena Popa <elena.popa@nxp.com>
Cc: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a5c98e8b13 ]
Pending requests will be flushed on disconnect, and the corresponding
TRBs will be turned into No-op TRBs, which are ignored by the xHC
controller once it starts processing the ring.
If the USB debug cable repeatedly disconnects before ring is started
then the ring will eventually be filled with No-op TRBs.
No new transfers can be queued when the ring is full, and driver will
print the following error message:
"xhci_hcd 0000:00:14.0: failed to queue trbs"
This is a normal case for 'in' transfers where TRBs are always enqueued
in advance, ready to take on incoming data. If no data arrives and the
device is disconnected, then the ring dequeue will remain at the beginning
of the ring while enqueue points to the first free TRB after the last
cancelled No-op TRB.
Solve this by reinitializing the rings when the debug cable disconnects
and DbC is leaving the configured state.
Clear the whole ring buffer and set enqueue and dequeue to the beginning
of ring, and set cycle bit to its initial state.
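A minimal sketch of the per-ring reinit done on disconnect (helper name
and details approximated from the description above):

  static void xhci_dbc_ring_init(struct xhci_ring *ring)
  {
          struct xhci_segment *seg = ring->first_seg;

          /* Wipe every TRB, including stale no-op TRBs from cancelled
           * requests, then reset enqueue/dequeue and the cycle bit to
           * their initial state. */
          memset(seg->trbs, 0, TRB_SEGMENT_SIZE);
          /* ... re-create the link TRB at the end of the segment ... */
          xhci_initialize_ring_info(ring);
  }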
Cc: stable@vger.kernel.org
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250902105306.877476-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 220a0ffde0 ]
Decouple allocation of endpoint ring buffer from initialization
of the buffer, and initialization of endpoint context parts from
the rest of the contexts.
This allows the driver to clear and reinitialize endpoint rings
after disconnect without reallocating everything.
This is a prerequisite for the next patch that prevents the transfer
ring from filling up with cancelled (no-op) TRBs if a debug cable is
reconnected several times without transferring anything.
Cc: stable@vger.kernel.org
Fixes: dfba2174dc ("usb: xhci: Add DbC support in xHCI driver")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250902105306.877476-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: a5c98e8b13 ("xhci: dbc: Fix full DbC transfer ring after several reconnects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e1b0fa8639 ]
The TRB cycle bit indicates TRB ownership by the Host Controller (HC) or
Host Controller Driver (HCD). New rings are initialized with 'cycle_state'
equal to one, and all its TRBs' cycle bits are set to zero. When handling
ring expansion, set the source ring cycle bits to the same value as the
destination ring.
Move the cycle bit setting from xhci_segment_alloc() to xhci_link_rings(),
and remove the 'cycle_state' argument from xhci_initialize_ring_info().
The xhci_segment_alloc() function uses kzalloc_node() to allocate segments,
ensuring that all TRB cycle bits are initialized to zero.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-12-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: a5c98e8b13 ("xhci: dbc: Fix full DbC transfer ring after several reconnects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2293c57484 upstream.
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't.
The RFC8684 [1] is strict about that:
(...) therefore the receiver MUST NOT try to open any additional
subflows toward this address and port.
So it is important to tell the userspace about that as it is responsible
for the respect of this flag.
When a new connection is created and established, the Netlink events
now contain the existing but not currently used 'flags' attribute. When
MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows
to the initial IP address and port -- info that are also part of the
event -- can be established.
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1]
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in mptcp_pm.yaml, because the indentation has been modified
in commit ec362192aa ("netlink: specs: fix up indentation errors"),
which is not in this version. Applying the same modifications, but at
a different level. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 98c6d25931 ]
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred. For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete. Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.
This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1 may have made it more unreliable. Then 6.11
commit 33dfe9204f ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected. New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all(). And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224 ("mm:
add folio_expected_ref_count() for reference count calculation").
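A minimal sketch of the resulting check in
collect_longterm_unpinnable_folios(), assuming the single extra reference
accounted for is the pin GUP just took:

  if (drain_allow &&
      folio_ref_count(folio) != folio_expected_ref_count(folio) + 1) {
          lru_add_drain_all();
          drain_allow = false;
  }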
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Keir Fraser <keirf@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shivank Garg <shivankg@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: yangge <yangge1116@126.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 86ebd50224 ]
Patch series " JFS: Implement migrate_folio for jfs_metapage_aops" v5.
This patchset addresses a warning that occurs during memory compaction due
to JFS's missing migrate_folio operation. The warning was introduced by
commit 7ee3647243 ("migrate: Remove call to ->writepage") which added
explicit warnings when filesystem don't implement migrate_folio.
The syzbot reported following [1]:
jfs_metapage_aops does not implement migrate_folio
WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline]
WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
Modules linked in:
CPU: 1 UID: 0 PID: 5861 Comm: syz-executor280 Not tainted 6.15.0-rc1-next-20250411-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:fallback_migrate_folio mm/migrate.c:953 [inline]
RIP: 0010:move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
To fix this issue, this series implement metapage_migrate_folio() for JFS
which handles both single and multiple metapages per page configurations.
While most filesystems leverage existing migration implementations like
filemap_migrate_folio(), buffer_migrate_folio_norefs() or
buffer_migrate_folio() (which internally used folio_expected_refs()),
JFS's metapage architecture requires special handling of its private data
during migration. To support this, this series introduce the
folio_expected_ref_count(), which calculates external references to a
folio from page/swap cache, private data, and page table mappings.
This standardized implementation replaces the previous ad-hoc
folio_expected_refs() function and enables JFS to accurately determine
whether a folio has unexpected references before attempting migration.
Implement folio_expected_ref_count() to calculate expected folio reference
counts from:
- Page/swap cache (1 per page)
- Private data (1)
- Page table mappings (1 per map)
While originally needed for page migration operations, this improved
implementation standardizes reference counting by consolidating all
refcount contributors into a single, reusable function that can benefit
any subsystem needing to detect unexpected references to folios.
The folio_expected_ref_count() returns the sum of these external
references without including any reference the caller itself might hold.
Callers comparing against the actual folio_ref_count() must account for
their own references separately.
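A minimal sketch of the counting rule described above (illustrative only;
the real folio_expected_ref_count() in include/linux/mm.h also handles
hugetlb and other corner cases):

  static inline int folio_expected_ref_count_sketch(struct folio *folio)
  {
          int refs = 0;

          if (folio_test_anon(folio)) {
                  /* swap cache: one reference per page, if present */
                  if (folio_test_swapcache(folio))
                          refs += folio_nr_pages(folio);
          } else {
                  /* page cache: one reference per page */
                  refs += folio_nr_pages(folio);
                  /* private data: one reference */
                  if (folio_test_private(folio))
                          refs += 1;
          }
          /* page table mappings: one reference per map */
          refs += folio_mapcount(folio);

          return refs;
  }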
Link: https://syzkaller.appspot.com/bug?extid=8bb6fd945af4e0ad9299 [1]
Link: https://lkml.kernel.org/r/20250430100150.279751-1-shivankg@amd.com
Link: https://lkml.kernel.org/r/20250430100150.279751-2-shivankg@amd.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Co-developed-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 98c6d25931 ("mm/gup: check ref_count instead of lru before migration")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0dd765fae2 upstream.
vmxnet3 does not unregister xdp rxq info in the
vmxnet3_reset_work() code path as vmxnet3_rq_destroy()
is not invoked in this code path. So, we get below message with a
backtrace.
Missing unregister, handled but fix driver
WARNING: CPU:48 PID: 500 at net/core/xdp.c:182
__xdp_rxq_info_reg+0x93/0xf0
This patch fixes the problem by moving the unregister
code of XDP from vmxnet3_rq_destroy() to vmxnet3_rq_cleanup().
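A minimal sketch of the move (cleanup runs on reset as well as on
destroy; function and field names approximated):

  static void vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
                                 struct vmxnet3_adapter *adapter)
  {
          /* ... existing buffer cleanup ... */

          /* Unregister here so vmxnet3_reset_work(), which does not call
           * vmxnet3_rq_destroy(), also drops the XDP rxq registration. */
          if (xdp_rxq_info_is_reg(&rq->xdp_rxq))
                  xdp_rxq_info_unreg(&rq->xdp_rxq);
  }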
Fixes: 54f00cce11 ("vmxnet3: Add XDP support.")
Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com>
Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Link: https://patch.msgid.link/20250320045522.57892-1-sankararaman.jayaraman@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Ajay: Modified to apply on v6.6, v6.12 ]
Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3417ab75a upstream.
Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and
only if KVM has at least one active VM. Leaving the bit set at all times
unfortunately degrades performance by a wee bit more than expected.
Use a dedicated spinlock and counter instead of hooking virtualization
enablement, as changing the behavior of kvm.enable_virt_at_load based on
SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could
result in performance issues for flows that are sensitive to VM creation
latency.
Defer setting BP_SPEC_REDUCE until VMRUN is imminent to avoid impacting
performance on CPUs that aren't running VMs, e.g. if a setup is using
housekeeping CPUs. Setting BP_SPEC_REDUCE in task context, i.e. without
blasting IPIs to all CPUs, also helps avoid serializing 1<=>N transitions
without incurring a gross amount of complexity (see the Link for details
on how ugly coordinating via IPIs gets).
Link: https://lore.kernel.org/all/aBOnzNCngyS_pQIW@google.com
Fixes: 8442df2b49 ("x86/bugs: KVM: Add support for SRSO_MSR_FIX")
Reported-by: Michael Larabel <Michael@michaellarabel.com>
Closes: https://www.phoronix.com/review/linux-615-amd-regression
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250505180300.973137-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8442df2b49 upstream.
Add support for
CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it
indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate
SRSO.
Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Switch back to enabling the bit when virtualization is enabled and
clearing it when virtualization is disabled, because using an MSR slot
would clear the bit when the guest exits; any training the guest has done
could then influence the host kernel when execution enters the kernel and
has not yet VMRUN the guest.
More detail on the public thread in Link below.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 877818802c upstream.
If the machine has:
CPUID Fn8000_0021_EAX[30] (SRSO_USER_KERNEL_NO) -- If this bit is 1,
it indicates the CPU is not subject to the SRSO vulnerability across
user/kernel boundaries.
have it fall back to IBPB on VMEXIT only, in the case it is going to run
VMs:
Speculative Return Stack Overflow: Mitigation: IBPB on VMEXIT only
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20241202120416.6054-2-bp@kernel.org
[ Harshit: Conflicts resolved as this commit: 7c62c442b6 ("x86/vmscape:
Enumerate VMSCAPE bug") has been applied already to 6.12.y ]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 132bfcd249 upstream.
On commit 9286dfd573 ("platform/x86: asus-wmi: Fix spurious rfkill on
UX8406MA"), Mathieu adds a quirk for the Zenbook Duo to ignore the code
0x5f (WLAN button disable). On that laptop, this code is triggered when
the device keyboard is attached.
On the ASUS ROG Z13 2025, this code is triggered when pressing the side
button of the device, which is used to open Armoury Crate in Windows.
As this is becoming a pattern, where newer Asus laptops use this keycode
for emitting events, let's convert the wlan ignore quirk to instead
allow emitting codes, so that userspace programs can listen to it and
so that it does not interfere with the rfkill state.
With this patch, the Z13 will emit KEY_PROG3 and the Duo will remain
unchanged and emit no event. While at it, add a quirk for the Z13 to
switch into tablet mode when removing the keyboard.
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://lore.kernel.org/r/20250808154710.8981-2-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c139a47ef ]
In io_link_skb function, there is a bug where prev_notif is incorrectly
assigned using 'nd' instead of 'prev_nd'. This causes the context
validation check to compare the current notification with itself instead
of comparing it with the previous notification.
Fix by using the correct prev_nd parameter when obtaining prev_notif.
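The fix itself is a one-line change in io_link_skb() (a sketch; the
surrounding code is elided):

  prev_nd = container_of(prev_uarg, struct io_notif_data, uarg);
  prev_notif = cmd_to_io_kiocb(prev_nd);    /* was: cmd_to_io_kiocb(nd) */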
Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: 6fe4220912 ("io_uring/notif: implement notification stacking")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9574b2330d ]
If an error causes af_alg_sendmsg to abort, ctx->merge may contain
a garbage value from the previous loop. This may then trigger a
crash on the next entry into af_alg_sendmsg when it attempts to do
a merge that can't be done.
Fix this by setting ctx->merge to zero near the start of the loop.
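As a rough sketch of that idea (illustrative only, not the exact upstream
hunk), the reset sits at the top of each iteration of the sendmsg copy loop
so an earlier aborted call cannot leave a stale merge state behind:

        while (size) {
                /* Start every iteration with a clean merge state; an
                 * earlier error exit can otherwise leave garbage in
                 * ctx->merge and trigger a bogus merge attempt. */
                ctx->merge = 0;
                ...
        }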
Fixes: 8ff590903d ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93ed9a2951 ]
Fix the following case where the client would end up closing both
deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call
in cifs_close_deferred_file_under_dentry():
fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666);
fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd1);
close(fd2);
unlink("foo");
Fixes: e3fc065682 ("cifs: Deferred close performance improvements")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Partially based on commit 98b6fa62c8 upstream.
This can be triggered by userspace, so just drop it. The condition
is appropriately handled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit df8922afc3 upstream.
A recent commit:
fc582cd26e ("io_uring/msg_ring: ensure io_kiocb freeing is deferred for RCU")
fixed an issue with not deferring freeing of io_kiocb structs that
msg_ring allocates to after the current RCU grace period. But this only
covers requests that don't end up in the allocation cache. If a request
goes into the alloc cache, it can get reused before it is sane to do so.
A recent syzbot report would seem to indicate that there's something
there, however it may very well just be because of the KASAN poisoning
that the alloc_cache handles manually.
Rather than attempt to make the alloc_cache sane for that use case, just
drop the usage of the alloc_cache for msg_ring request payload data.
Fixes: 50cf5f3842 ("io_uring/msg_ring: add an alloc cache for io_kiocb entries")
Link: https://lore.kernel.org/io-uring/68cc2687.050a0220.139b6.0005.GAE@google.com/
Reported-by: syzbot+baa2e0f4e02df602583e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3539b1467e upstream.
When running task_work for an exiting task, rather than perform the
issue retry attempt, the task_work is canceled. However, this isn't
done for a ring that has been closed. This can lead to requests being
successfully completed post the ring being closed, which is somewhat
confusing and surprising to an application.
Rather than just check the task exit state, also include the ring
ref state in deciding whether or not to terminate a given request when
run from task_work.
Cc: stable@vger.kernel.org # 6.1+
Link: https://github.com/axboe/liburing/discussions/1459
Reported-by: Benedek Thaler <thaler@thaler.hu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Parts of commit b6f58a3f4a upstream.
Backport io_should_terminate_tw() helper to judge whether task_work
should be run or terminated.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit df3b8ca604 upstream.
When the task that submitted a request is dying, a task work for that
request might get run by a kernel thread or even worse by a half
dismantled task. We can't just cancel the task work without running the
callback as the cmd might need to do some clean up, so pass a flag
instead. If set, it's not safe to access any task resources and the
callback is expected to cancel the cmd ASAP.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8708c5d8b3 upstream.
The disconnect test-case, with 'plain' TCP sockets generates spurious
errors, e.g.
07 ns1 TCP -> ns1 (dead:beef:1::1:10006) MPTCP
read: Connection reset by peer
read: Connection reset by peer
(duration 155ms) [FAIL] client exit code 3, server 3
netns ns1-FloSdv (listener) socket stat for 10006:
TcpActiveOpens 2 0.0
TcpPassiveOpens 2 0.0
TcpEstabResets 2 0.0
TcpInSegs 274 0.0
TcpOutSegs 276 0.0
TcpOutRsts 3 0.0
TcpExtPruneCalled 2 0.0
TcpExtRcvPruned 1 0.0
TcpExtTCPPureAcks 104 0.0
TcpExtTCPRcvCollapsed 2 0.0
TcpExtTCPBacklogCoalesce 42 0.0
TcpExtTCPRcvCoalesce 43 0.0
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPFromZeroWindowAdv 42 0.0
TcpExtTCPToZeroWindowAdv 41 0.0
TcpExtTCPWantZeroWindowAdv 13 0.0
TcpExtTCPOrigDataSent 164 0.0
TcpExtTCPDelivered 165 0.0
TcpExtTCPRcvQDrop 1 0.0
In the failing scenarios (TCP -> MPTCP), the involved sockets are
actually plain TCP ones, as fallbacks for passive sockets at 2WHS time
cause the MPTCP listeners to actually create 'plain' TCP sockets.
Similar to commit 218cc16632 ("selftests: mptcp: avoid spurious errors
on disconnect"), the root cause is in the user-space bits: the test
program tries to disconnect as soon as all the pending data has been
spooled, generating an RST. If that RST reaches the peer before the
connection has reached the closed status, the TCP socket will report an
error to the user-space, as per the protocol specification, causing the
above failure. Note that it looks like this issue got more visible since
the "tcp: receiver changes" series from commit 06baf9bfa6 ("Merge
branch 'tcp-receiver-changes'").
Address the issue by explicitly waiting for the TCP sockets (-t) to
reach a closed status before performing the disconnect. More precisely,
the test program now waits for plain TCP sockets or TCP subflows in
addition to the MPTCP sockets that were already monitored.
While at it, use 'ss' with '-n' to avoid resolving service names, which
is not needed here.
Fixes: 218cc16632 ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14e22b43df upstream.
IO errors were correctly printed to stderr, and propagated up to the
main loop for the server side, but the returned value was ignored. As a
consequence, the program for the listener side was no longer exiting
with an error code in case of IO issues.
Because of that, some issues might not have been seen. But very likely,
most issues either had an effect on the client side, or the file
transfer was not the expected one, e.g. the connection got reset before
the end. Still, it is better to fix this.
The main consequence of this issue is the error that was reported by the
selftests: the received and sent files were different, and the MIB
counters were not printed. Also, when such errors happened during the
'disconnect' tests, the program tried to continue until the timeout.
Now when an IO error is detected, the program exits directly with an
error.
Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f755be0b1f upstream.
When the MPTCP DATA FIN has been ACKed, there is no more MPTCP-related
metadata to exchange, and all subflows can be safely shut down.
Before this patch, the subflows were actually terminated at 'close()'
time. That's certainly fine most of the time, but not when the userspace
'shutdown()' a connection, without close()ing it. When doing so, the
subflows were staying in LAST_ACK state on one side -- and consequently
in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP
socket.
Now, when the DATA FIN has been ACKed, all subflows are shut down. A
consequence of this is that the TCP 'FIN' flag can be set earlier now,
but the end result is the same. This affects the packetdrill tests
looking at the end of the MPTCP connections, but for a good reason.
Note that tcp_shutdown() will check the subflow state, so no need to do
that again before calling it.
Fixes: 3721b9b646 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Cc: stable@vger.kernel.org
Fixes: 16a9a9da17 ("mptcp: Add helper to process acks of DATA_FIN")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4351ca3fcb upstream.
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget for how many FRWR operations can be
outstanding on the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.
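Conceptually, the fix restores the budget on every early-exit path; a
minimal sketch (the error condition and helper are illustrative, the
i_fastreg_wrs field follows the description above):

        ret = some_setup_step();        /* hypothetical failing step */
        if (ret) {
                /* Give the FRWR budget back before bailing out, otherwise
                 * this slot is leaked and the budget eventually hits zero. */
                atomic_inc(&ic->i_fastreg_wrs);
                return ret;
        }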
Fixes: 1659185fb4 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca7 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6f56a44e4 upstream.
Since commit 7d5e9737ef ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.
On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:
rfkill->type = (unsigned)id->driver_data;
and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.
rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.
Note that this has likely not been caught so far because:
1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored
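A minimal sketch of the fix (the surrounding calls follow the description
above; the exact probe code may differ slightly):

        const char *type_name = NULL;   /* NULL lets rfkill_find_type() fall back */
        ...
        /* May fail (e.g. no "type" property) and leave type_name untouched,
         * which is now safe because it was initialized to NULL. */
        device_property_read_string(dev, "type", &type_name);
        rfkill->type = rfkill_find_type(type_name);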
Fixes: 7d5e9737ef ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d02e48830e upstream.
Commit 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
inhibited pre-VMRUN sync of TPR from LAPIC into VMCB::V_TPR in
sync_lapic_to_cr8() when AVIC is active.
AVIC does automatically sync between these two fields, however it does
so only on explicit guest writes to one of these fields, not on a bare
VMRUN.
This meant that when AVIC is enabled host changes to TPR in the LAPIC
state might not get automatically copied into the V_TPR field of VMCB.
This is especially true when it is the userspace setting LAPIC state via
KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
VMCB.
Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.
Fix this issue by doing pre-VMRUN TPR sync from LAPIC into VMCB::V_TPR
even when AVIC is enabled.
Fixes: 3bbf3565f4 ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://lore.kernel.org/r/c231be64280b1461e854e1ce3595d70cde3a2e9d.1756139678.git.maciej.szmigiero@oracle.com
[sean: tag for stable@]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68f27f7c77 upstream.
If earlier opening of source graph fails (e.g. ADSP rejects due to
incorrect audioreach topology), the graph is closed and
"dai_data->graph[dai->id]" is assigned NULL. Preparing the DAI for sink
graph continues though and next call to q6apm_lpass_dai_prepare()
receives dai_data->graph[dai->id]=NULL leading to NULL pointer
exception:
qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd
qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78
q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
...
Call trace:
q6apm_graph_media_format_pcm+0x48/0x120 (P)
q6apm_lpass_dai_prepare+0x110/0x1b4
snd_soc_pcm_dai_prepare+0x74/0x108
__soc_pcm_prepare+0x44/0x160
dpcm_be_dai_prepare+0x124/0x1c0
Fixes: 30ad723b93 ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Message-ID: <20250904101849.121503-2-krzysztof.kozlowski@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96fa515e70 upstream.
[BUG]
Inside check_inode_ref(), we need to make sure every structure,
including the btrfs_inode_extref header, is covered by the item. But
our code is incorrectly using "sizeof(iref)", where @iref is just a
pointer.
This means "sizeof(iref)" will always be "sizeof(void *)", which is much
smaller than "sizeof(struct btrfs_inode_extref)".
This will allow some bad inode extrefs to sneak in, defeating tree-checker.
[FIX]
Fix the typo by calling "sizeof(*iref)", which is the same as
"sizeof(struct btrfs_inode_extref)", and will be the correct behavior we
want.
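The difference is easy to demonstrate in a self-contained userspace program
(the struct below only approximates the on-disk btrfs_inode_extref layout
and is not the kernel definition):

#include <stdio.h>
#include <stdint.h>

struct inode_extref_example {
        uint64_t parent_objectid;
        uint64_t index;
        uint16_t name_len;
        /* name[] follows on disk */
} __attribute__((packed));

int main(void)
{
        struct inode_extref_example *iref = NULL;

        /* sizeof(iref) is the size of a pointer (8 on 64-bit), while
         * sizeof(*iref) is the size of the structure itself (18 here),
         * so an item-size check using the former passes for items that
         * are far too small. */
        printf("sizeof(iref)  = %zu\n", sizeof(iref));
        printf("sizeof(*iref) = %zu\n", sizeof(*iref));
        return 0;
}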
Fixes: 71bf92a9b8 ("btrfs: tree-checker: Add check for INODE_REF")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e56310b40 upstream.
The AMD IOMMU host page table implementation supports dynamic page table levels
(up to 6 levels), starting with a 3-level configuration that expands based on
IOVA address. The kernel maintains a root pointer and current page table level
to enable proper page table walks in alloc_pte()/fetch_pte() operations.
The IOMMU IOVA allocator initially starts with a 32-bit address space and
once it is exhausted it switches to a 64-bit address space (the max address
is determined based on IOMMU and device DMA capability). To support larger
IOVAs, the AMD IOMMU driver increases the page table level.
But in the unmap path (iommu_v1_unmap_pages()), fetch_pte() reads
pgtable->[root/mode] without a lock. So it is possible that, in an extreme
corner case, when increase_address_space() is updating pgtable->[root/mode],
fetch_pte() reads a wrong page table level (pgtable->mode). It then compares
the value with the level encoded in the page table and returns NULL. This
results in the iommu_unmap ops failing and the upper layer may retry/log a
WARN_ON.
CPU 0                                           CPU 1
------                                          ------
map pages                                       unmap pages
alloc_pte() -> increase_address_space()         iommu_v1_unmap_pages() -> fetch_pte()
  pgtable->root = pte (new root value)
                                                  READ pgtable->[mode/root]
                                                  Reads new root, old mode
  Updates mode (pgtable->mode += 1)
Since page table level updates are infrequent and already synchronized with a
spinlock, implement a seqcount to enable lock-free read operations on the read path.
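The resulting pattern looks roughly like the sketch below (field and
variable names are illustrative; the writer already runs under the existing
page-table spinlock):

        /* Writer, under the page-table spinlock: */
        write_seqcount_begin(&pgtable->seqcount);
        pgtable->root = new_root;
        pgtable->mode += 1;
        write_seqcount_end(&pgtable->seqcount);

        /* Lock-free reader in fetch_pte(): retry until root and mode are
         * observed from the same update. */
        do {
                seq  = read_seqcount_begin(&pgtable->seqcount);
                root = pgtable->root;
                mode = pgtable->mode;
        } while (read_seqcount_retry(&pgtable->seqcount, seq));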
Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: stable@vger.kernel.org
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dce043c07c upstream.
switch_to_super_page() assumes the memory range it's working on is aligned
to the target large page level. Unfortunately, __domain_mapping() doesn't
take this into account when using it, and will pass unaligned ranges
ultimately freeing a PTE range larger than expected.
Take for example a mapping with the following iov_pfn range [0x3fe400,
0x4c0600), which should be backed by the following mappings:
iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
to switch_to_super_page() at a 1 GiB granularity, which will in turn
free PTEs all the way to iov_pfn 0x4fffff.
Mitigate this by rounding down the iov_pfn range passed to
switch_to_super_page() in __domain_mapping()
to the target large page level.
Additionally add range alignment checks to switch_to_super_page.
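In other words, only the part of the range that is fully covered by
target-level pages is handed to switch_to_super_page(); a sketch of the
idea with an exclusive end and hypothetical variable names (the exact
computation in the driver differs):

        /* pages covered by one entry at the target level (9 bits/level) */
        unsigned long lvl_pages = 1UL << ((largepage_lvl - 1) * 9);

        /* Keep only whole target-level pages: round the exclusive end of
         * the candidate range down to the large-page boundary before
         * switching to super pages. */
        end_pfn = ALIGN_DOWN(end_pfn, lvl_pages);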
Fixes: 9906b9352a ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
Signed-off-by: Eugene Koira <eugkoira@amazon.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20250826143816.38686-1-eugkoira@amazon.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9d13433fe upstream.
ARCH_STRICT_ALIGN is used for hardware without UAL; currently it only
controls the -mstrict-align flag. However, ACPI structures are packed by
default and so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN is enabled.
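A sketch of what that amounts to in asm/acenv.h (illustrative; the actual
header may contain more than this):

        /* arch/loongarch/include/asm/acenv.h */
        #ifdef CONFIG_ARCH_STRICT_ALIGN
        #define ACPI_MISALIGNMENT_NOT_SUPPORTED
        #endif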
Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac398f5707 upstream.
Add a NULL-pointer check after the kcalloc() call in init_vdso(). If
allocation fails, return -ENOMEM to prevent a possible dereference of
vdso_info.code_mapping.pages when it is NULL.
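The shape of the fix is the usual allocation-failure guard; a minimal
sketch (the local variable names are illustrative):

        struct page **pages;

        pages = kcalloc(vdso_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;

        vdso_info.code_mapping.pages = pages;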
Cc: stable@vger.kernel.org
Fixes: 2ed119aef6 ("LoongArch: Set correct size for vDSO code mapping")
Signed-off-by: Guangshuo Li <202321181@mail.sdu.edu.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 677d4a52d4 upstream.
When testing the kernel live patching with "modprobe livepatch-sample",
there is a timeout over 15 seconds from "starting patching transition"
to "patching complete". The dmesg command shows "unreliable stack" for
user tasks in debug mode, here is one of the messages:
livepatch: klp_try_switch_task: bash:1193 has an unreliable stack
The "unreliable stack" is because it can not unwind from do_syscall()
to its previous frame handle_syscall(). It should use fp to find the
original stack top due to secondary stack in do_syscall(), but fp is
not used for some other functions, then fp can not be restored by the
next frame of do_syscall(), so it is necessary to save fp if task is
not current, in order to get the stack top of do_syscall().
Here are the call chains:
klp_enable_patch()
klp_try_complete_transition()
klp_try_switch_task()
klp_check_and_switch_task()
klp_check_stack()
stack_trace_save_tsk_reliable()
arch_stack_walk_reliable()
When executing "rmmod livepatch-sample", there exists a similar issue.
With this patch, it takes a short time for patching and unpatching.
Before:
# modprobe livepatch-sample
# dmesg -T | tail -3
[Sat Sep 6 11:00:20 2025] livepatch: 'livepatch_sample': starting patching transition
[Sat Sep 6 11:00:35 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:00:36 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
rmmod: ERROR: Module livepatch_sample is in use
# rmmod livepatch_sample
# dmesg -T | tail -3
[Sat Sep 6 11:06:05 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Sat Sep 6 11:06:20 2025] livepatch: signaling remaining tasks
[Sat Sep 6 11:06:21 2025] livepatch: 'livepatch_sample': unpatching complete
After:
# modprobe livepatch-sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:30 2025] livepatch: 'livepatch_sample': starting patching transition
[Tue Sep 16 16:19:31 2025] livepatch: 'livepatch_sample': patching complete
# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled
# rmmod livepatch_sample
# dmesg -T | tail -2
[Tue Sep 16 16:19:36 2025] livepatch: 'livepatch_sample': starting unpatching transition
[Tue Sep 16 16:19:37 2025] livepatch: 'livepatch_sample': unpatching complete
Cc: stable@vger.kernel.org # v6.9+
Fixes: 199cc14cb4 ("LoongArch: Add kernel livepatching support")
Reported-by: Xi Zhang <zhangxi@kylinos.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 539d7344d4 upstream.
When compiling with LLVM and CONFIG_RUST is set, there exists the
following objtool warning:
rust/compiler_builtins.o: warning: objtool: __rust__unordsf2(): unexpected end of section .text.unlikely.
objdump shows that the end of section .text.unlikely is an atomic
instruction:
amswap.w $zero, $ra, $zero
According to the LoongArch Reference Manual, if the amswap.w atomic
memory access instruction has the same register number as rd and rj,
the execution will trigger an Instruction Non-defined Exception, so
mark the above instruction as INSN_BUG type to fix the warning.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5003098e2 upstream.
Loongson-3A6000 and 3C6000 CPUs also support unaligned memory access, so
the current description is out of date to some extent.
Actually, all Loongson-3 series processors based on LoongArch support
unaligned memory access; this hardware capability is indicated by bit 20
(UAL) of the CPUCFG1 register. Update the help info to reflect the reality.
Cc: stable@vger.kernel.org
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a03db236ae upstream.
In the current implementation of longterm pin_user_pages(), we invoke
collect_longterm_unpinnable_folios(). This function iterates through the
list to check whether each folio belongs to the "longterm_unpinnable"
category. The folios in this list essentially correspond to a contiguous
region of userspace addresses, with each folio representing a physical
address in increments of PAGE_SIZE.
If this userspace address range is mapped with large folios, we can
optimize the performance of collect_longterm_unpinnable_folios()
by reducing the use of READ_ONCE() invoked in
pofs_get_folio()->page_folio()->_compound_head().
Also, we can simplify the logic of collect_longterm_unpinnable_folios().
Instead of comparing with prev_folio after calling pofs_get_folio(), we
can check whether the next page is within the same folio.
The performance test results, based on v6.15, obtained through the
gup_test tool from the kernel source tree are as follows. We achieve an
improvement of over 66% for large folios with pagesize=2M. For small
folios, we have only observed a very slight degradation in performance.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[lizhe.67@bytedance.com: whitespace fix, per David]
Link: https://lkml.kernel.org/r/20250606091917.91384-1-lizhe.67@bytedance.com
Link: https://lkml.kernel.org/r/20250606023742.58344-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a865562646 upstream.
These commands
modprobe brd rd_size=1048576
vgcreate vg /dev/ram*
lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.
If there's a chunk-less raid, such as raid1, io_min shouldn't be set to zero
because it would be raised to 512 and would trigger the warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c334d0384 upstream.
Since commit
commit f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
the console log of some devices with hdq enabled but no bq27000 battery
(like e.g. the Pandaboard) is flooded with messages like:
[ 34.247833] power_supply bq27000-battery: driver failed to report 'status' property: -1
as soon as user-space is finding a /sys entry and trying to read the
"status" property.
It turns out that the offending commit changes the logic to now return the
value of cache.flags if it is <0. This is likely under the assumption that
it is an error number. In normal errors from bq27xxx_read() this is indeed
the case.
But there is special code to detect if no bq27000 is installed or accessible
through hdq/1wire, and to report this. In that case, the cache.flags
are set historically by
commit 3dd843e1c2 ("bq27000: report missing device better.")
to constant -1 which did make reading properties return -ENODEV. So everything
appeared to be fine before the return value was passed upwards.
Now the -1 is returned as -EPERM instead of -ENODEV, triggering the error
condition in power_supply_format_property() which then floods the console log.
So we change the detection of missing bq27000 battery to simply set
cache.flags = -ENODEV
instead of -1.
Fixes: f16d9fb6cf ("power: supply: bq27xxx: Retrieve again when busy")
Cc: Jerry Lv <Jerry.Lv@axis.com>
Cc: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/692f79eb6fd541adb397038ea6e750d4de2deddf.1755945297.git.hns@goldelico.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b34cbbf4f upstream.
Issuing two writes to the same af_alg socket is bogus as the
data will be interleaved in an unpredictable fashion. Furthermore,
concurrent writes may create inconsistencies in the internal
socket state.
Disallow this by adding a new ctx->write field that indicates
exclusive ownership for writing.
Fixes: 8ff590903d ("crypto: algif_skcipher - User-space interface for skcipher operations")
Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 025e87f8ea upstream.
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:
CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
...
Call Trace:
<TASK>
sysfs_kf_seq_show+0x2a6/0x390
? __cfi_kobj_attr_show+0x10/0x10
kernfs_seq_show+0x104/0x15b
seq_read_iter+0x580/0xe2b
...
When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each file is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called.
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.
Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.
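Concretely, the second parameter changes so that the handler matches the
->show() member of 'struct kobj_attribute'; a before/after sketch (function
bodies omitted):

        /* before: mismatches kobj_attribute's ->show() and trips kCFI */
        static ssize_t nilfs_feature_revision_show(struct kobject *kobj,
                                                   struct attribute *attr, char *buf);

        /* after: matches ssize_t (*show)(struct kobject *,
         *                                struct kobj_attribute *, char *) */
        static ssize_t nilfs_feature_revision_show(struct kobject *kobj,
                                                   struct kobj_attribute *attr, char *buf);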
Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f684 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0823d5fba upstream.
The perf_fuzzer found a hard-lockup crash on a RaptorLake machine:
Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
CPU: 23 UID: 0 PID: 0 Comm: swapper/23
Tainted: [W]=WARN
Hardware name: Dell Inc. Precision 9660/0VJ762
RIP: 0010:native_read_pmc+0x7/0x40
Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
RSP: 000:fffb03100273de8 EFLAGS: 00010046
....
Call Trace:
<TASK>
icl_update_topdown_event+0x165/0x190
? ktime_get+0x38/0xd0
intel_pmu_read_event+0xf9/0x210
__perf_event_read+0xf9/0x210
CPUs 16-23 are E-core CPUs that don't support the perf metrics feature.
The icl_update_topdown_event() should not be invoked on these CPUs.
It's a regression of commit:
f9bdf1f953 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read")
The bug introduced by that commit is that the is_topdown_event() function
is mistakenly used to replace the is_topdown_count() call to check if the
topdown functions for the perf metrics feature should be invoked.
Fix it.
Fixes: f9bdf1f953 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read")
Closes: https://lore.kernel.org/lkml/352f0709-f026-cd45-e60c-60dfd97f73f3@maine.edu/
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v6.15+
Link: https://lore.kernel.org/r/20250612143818.2889040-1-kan.liang@linux.intel.com
[ omitted PEBS check ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Angel Adetula <angeladetula@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f8b4687151 ]
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, since the synctstamp_work is cyclic, the likelihood of triggering
the bug is non-negligible.
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
otx2_remove() |
otx2_ptp_destroy() | otx2_sync_tstamp()
cancel_delayed_work() |
kfree(ptp) |
| ptp = container_of(...); //UAF
| ptp-> //UAF
This is confirmed by a KASAN report:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcf/0x610
? __run_timer_base.part.0+0x7d7/0x8c0
kasan_report+0xb8/0xf0
? __run_timer_base.part.0+0x7d7/0x8c0
__run_timer_base.part.0+0x7d7/0x8c0
? __pfx___run_timer_base.part.0+0x10/0x10
? __pfx_read_tsc+0x10/0x10
? ktime_get+0x60/0x140
? lapic_next_event+0x11/0x20
? clockevents_program_event+0x1d4/0x2a0
run_timer_softirq+0xd1/0x190
handle_softirqs+0x16a/0x550
irq_exit_rcu+0xaf/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
</IRQ>
...
Allocated by task 1:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x7f/0x90
otx2_ptp_init+0xb1/0x860
otx2_probe+0x4eb/0xc30
local_pci_probe+0xdc/0x190
pci_device_probe+0x2fe/0x470
really_probe+0x1ca/0x5c0
__driver_probe_device+0x248/0x310
driver_probe_device+0x44/0x120
__driver_attach+0xd2/0x310
bus_for_each_dev+0xed/0x170
bus_add_driver+0x208/0x500
driver_register+0x132/0x460
do_one_initcall+0x89/0x300
kernel_init_freeable+0x40d/0x720
kernel_init+0x1a/0x150
ret_from_fork+0x10c/0x1a0
ret_from_fork_asm+0x1a/0x30
Freed by task 136:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3a/0x60
__kasan_slab_free+0x3f/0x50
kfree+0x137/0x370
otx2_ptp_destroy+0x38/0x80
otx2_remove+0x10d/0x4c0
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0xf8/0x210
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device_locked+0x15/0x30
remove_store+0xcc/0xe0
kernfs_fop_write_iter+0x2c3/0x440
vfs_write+0x871/0xd70
ksys_write+0xee/0x1c0
do_syscall_64+0xac/0x280
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
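The teardown then follows the usual pattern; a minimal sketch (illustrative,
not the exact upstream hunk):

        /* Wait for a possibly-running callback to finish before freeing
         * the object it dereferences. */
        cancel_delayed_work_sync(&ptp->synctstamp_work);
        kfree(ptp);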
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.
Fixes: 2958d17a89 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cfa7d9b1e3 ]
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempts to
dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup) | CPU 1 (delayed work callback)
cnic_netdev_event() |
cnic_stop_hw() | cnic_delete_task()
cnic_cm_stop_bnx2x_hw() | ...
cancel_delayed_work() | /* the queue_delayed_work()
flush_workqueue() | executes after flush_workqueue()*/
| queue_delayed_work()
cnic_free_dev(dev)//free | cnic_delete_task() //new instance
| dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.
Fixes: fdf24086f4 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cca7b1cfd7 ]
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.
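A small self-contained userspace illustration of why the cast matters (the
variable names mirror the description above but this is not the driver
code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        int instr_type = 64;
        unsigned int iq_no = 40;        /* can be up to 64 per the description */

        /* Without the cast, (instr_type == 64) is a 32-bit int, so shifting
         * it by iq_no >= 31 overflows. Widening to 64 bits first is safe: */
        uint64_t mask = (uint64_t)(instr_type == 64) << iq_no;

        printf("mask = 0x%llx\n", (unsigned long long)mask);
        return 0;
}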
Fixes: f21fb3ed36 ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0aeb54ac4c ]
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.
Reported-by: Lee Jones <lee@kernel.org>
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a72175c985 ]
Currently, VF MAC address info is not updated when the MAC address is
configured from VF, and it is not cleared when the VF is removed. This
leads to stale or missing MAC information in the PF, which may cause
incorrect state tracking or inconsistencies when VFs are hot-plugged
or reassigned.
Fix this by:
- storing the VF MAC address in the PF when it is set from VF
- clearing the stored VF MAC address when the VF is removed
This ensures that the PF always has correct VF MAC state.
Fixes: cde29af9e6 ("octeon_ep: add PF-VF mailbox communication")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8ba87f04c ]
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b4be64fd9 ]
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300
at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
Call Trace:
<TASK>
mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
esw_offloads_enable+0x593/0x910 [mlx5_core]
mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
do_syscall_64+0x53/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, preventing
the netdevice from being freed while it is in use.
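The access pattern then becomes roughly the following (a sketch; the
tracker handling and the error value returned on a missing uplink are
illustrative):

        struct net_device *netdev;
        netdevice_tracker tracker;

        netdev = mlx5_uplink_netdev_get(mdev);
        if (!netdev)
                return -ENODEV; /* uplink netdev currently unbound */

        netdev_hold(netdev, &tracker, GFP_KERNEL);
        /* ... use netdev safely ... */
        netdev_put(netdev, &tracker);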
Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e37084a260 ]
i40e has a feature (head write-back) which writes the last successfully sent
descriptor index to a memory location. The memory barrier in
i40e_clean_tx_irq() was used to avoid forward-reading descriptor fields in
case the DD bit was not set.
Having the mentioned feature in place implies that such a situation will not
happen, as we know in advance how many descriptors the HW has dealt with.
Besides, this barrier placement was wrong. The idea is to have this
protection *after* reading the DD bit from the HW descriptor, not before.
Digging through the git history showed that the barrier was indeed before
the DD bit check; in any case, the commit introducing i40e_get_head() should
have wiped it out altogether.
Also, there was one commit doing s/read_barrier_depends/smp_rmb/ when the
get-head feature was already in place, but it was only theoretical, based on
ixgbe experience; ixgbe is different in these terms as that driver has to
read the DD bit from the HW descriptor.
Fixes: 1943d8ba95 ("i40e/i40evf: enable hardware feature head write back")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84bf1ac85a ]
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.
If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't
call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.
The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented
properly, and that it always points to a buffer with a NULL page. Since
this function doesn't check, it will happily recycle a page over the top
of the next_to_alloc buffer, losing track of the old page.
Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware hands us descriptors with a size of 0 sometimes,
but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches to the next_to_clean element just
past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does
call ice_put_rx_buf() on the last buffer of the frame indicating the end of
packet.
For non-linear (multi-buffer) frames, we need to take care when adjusting
the pagecnt_bias. An XDP program might release fragments from the tail of
the frame, in which case that fragment page is already released. Only
update the pagecnt_bias for the first descriptor and fragments still
remaining post-XDP program. Take care to only access the shared info for
fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure it is updated *before*
all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic
requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no
longer need to track or cache the number of fragments in the rx_ring.
Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf ("ice: put Rx buffers after being done with current frame")
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Priya Singh <priyax.singh@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e61c89c60 ]
The max_frame and rx_buf_len fields of the VSI set the maximum frame size
for packets on the wire, and configure the size of the Rx buffer. In the
hardware, these are per-queue configuration. Most VSI types use a simple
method to determine the size of the buffers for all queues.
However, VFs may potentially configure different values for each queue.
While the Linux iAVF driver does not do this, it is allowed by the virtchnl
interface.
The current virtchnl code simply sets the per-VSI fields in between calls to
ice_vsi_cfg_single_rxq(). This technically works, as these fields are only
ever used when programming the Rx ring, and otherwise not checked again.
However, it is confusing to maintain.
The Rx ring also already has an rx_buf_len field in order to access the
buffer length in the hotpath. It also has extra unused bytes in the ring
structure which we can make use of to store the maximum frame size.
Drop the VSI max_frame and rx_buf_len fields. Add max_frame to the Rx ring,
and slightly re-order rx_buf_len to better fit into the gaps in the
structure layout.
Change the ice_vsi_cfg_frame_size function so that it writes to the ring
fields. Call this function once per ring in ice_vsi_cfg_rxqs(). This is
done instead of calling it inside ice_vsi_cfg_rxq(), because
ice_vsi_cfg_rxq() is called in the virtchnl flow where the max_frame and
rx_buf_len have already been configured.
Change the accesses for rx_buf_len and max_frame to all point to the ring
structure. This has the added benefit that ice_vsi_cfg_rxq() no longer has
the surprise side effect of updating ring->rx_buf_len based on the VSI
field.
Update the virtchnl ice_vc_cfg_qs_msg() function to set the ring values
directly, and drop references to the removed VSI fields.
This now makes the VF logic clear, as the ring fields are obviously
per-queue. This reduces the required cognitive load when reasoning about
this logic.
Note that removing the VSI fields does leave a 4 byte gap, but the ice_vsi
structure has many gaps, and its layout is not as critical in the hot path.
The structure may benefit from a more thorough repacking, but no attempt
was made in this change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 84bf1ac85a ("ice: fix Rx page leak on multi-buffer frames")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 93ab4881a4 ]
`netif_rx()` already increments `rx_dropped` core stat when it fails.
The driver was also updating `ndev->stats.rx_dropped` in the same path.
Since both are reported together via `ip -s -s` command, this resulted
in drops being counted twice in user-visible stats.
Keep the driver update on `if (unlikely(!skb))`, but skip it after
`netif_rx()` errors.
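In sketch form (illustrative, not the exact driver hunk; the label is
hypothetical):

        if (unlikely(!skb)) {
                ndev->stats.rx_dropped++;       /* driver-side drop: still counted here */
                goto next;                      /* hypothetical label */
        }
        ...
        /* On failure netif_rx() already bumps the core rx_dropped counter,
         * so the driver must not increment ndev->stats.rx_dropped again. */
        netif_rx(skb);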
Fixes: caf586e5f2 ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b86418bead ]
This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:
1. Fix diff not printed: The error messages for counter mismatches had
the actual difference ('diff') as argument, but it was missing in the
format string. Displaying it makes the debugging easier.
2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
used 'ret2' (sent bytes) for both the expected value and the difference
calculation. It now correctly uses 'ret' (received bytes), which is the
expected value for bytes_acked.
3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
value in the condition above it.
Fixes: 5dcff89e14 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24733e193a ]
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.
pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 96939cec99 ]
When a SYN containing the 'C' flag (deny join id0) was received, this
piece of information was not propagated to the path-manager.
Even if this flag is mainly set on the server side, a client can also
tell the server it cannot try to establish new subflows to the client's
initial IP address and port. The server's PM should then record such
info when received, and before sending events about the new connection.
Fixes: df377be387 ("mptcp: add deny_join_id0 in mptcp_options_received")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 35ae4e8629 ]
After commit 5c3bf6cba7 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.
Correct this by additionally testing for the bond being empty before
randomizing the MAC.
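A sketch of the added condition (illustrative; the surrounding logic in the
enslave path is more involved and the variable names are assumptions):

        /* Only randomize when the bond already has slaves; the first slave
         * legitimately shares the bond's MAC address. */
        if (bond_has_slaves(bond) &&
            ether_addr_equal(slave_dev->dev_addr, bond_dev->dev_addr))
                eth_hw_addr_random(slave_dev);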
Fixes: 5c3bf6cba7 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56c0a2a9dd ]
In the protection override dump path, the firmware can return far too
many GRC elements, resulting in attempting to write past the end of the
previously-kmalloc'ed dump buffer.
This will result in a kernel panic with reason:
BUG: unable to handle kernel paging request at ADDRESS
where "ADDRESS" is just past the end of the protection override dump
buffer. The start address of the buffer is:
p_hwfn->cdev->dbg_features[DBG_FEATURE_PROTECTION_OVERRIDE].dump_buf
and the size of the buffer is buf_size in the same data structure.
The panic can be arrived at from either the qede Ethernet driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc02662ed [qed]
qed_dbg_protection_override_dump at ffffffffc0267792 [qed]
qed_dbg_feature at ffffffffc026aa8f [qed]
qed_dbg_all_data at ffffffffc026b211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc027298a [qed]
devlink_health_do_dump at ffffffff82497f61
devlink_health_report at ffffffff8249cf29
qed_report_fatal_error at ffffffffc0272baf [qed]
qede_sp_task at ffffffffc045ed32 [qede]
process_one_work at ffffffff81d19783
or the qedf storage driver path:
[exception RIP: qed_grc_dump_addr_range+0x108]
qed_protection_override_dump at ffffffffc068b2ed [qed]
qed_dbg_protection_override_dump at ffffffffc068c792 [qed]
qed_dbg_feature at ffffffffc068fa8f [qed]
qed_dbg_all_data at ffffffffc0690211 [qed]
qed_fw_fatal_reporter_dump at ffffffffc069798a [qed]
devlink_health_do_dump at ffffffff8aa95e51
devlink_health_report at ffffffff8aa9ae19
qed_report_fatal_error at ffffffffc0697baf [qed]
qed_hw_err_notify at ffffffffc06d32d7 [qed]
qed_spq_post at ffffffffc06b1011 [qed]
qed_fcoe_destroy_conn at ffffffffc06b2e91 [qed]
qedf_cleanup_fcport at ffffffffc05e7597 [qedf]
qedf_rport_event_handler at ffffffffc05e7bf7 [qedf]
fc_rport_work at ffffffffc02da715 [libfc]
process_one_work at ffffffff8a319663
Resolve this by clamping the firmware's return value to the maximum
number of legal elements the firmware should return.
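As a rough userspace sketch of that clamping pattern (the buffer size, names
and firmware count below are all made up; the driver clamps against the
number of elements its dump buffer was allocated for):

#include <stdio.h>

#define MAX_ELEMS 16  /* illustrative cap, not the driver's real limit */

/* Never trust an externally reported element count: clamp it to the number
 * of slots the destination buffer was sized for before the copy loop runs. */
static size_t clamp_count(size_t reported, size_t max)
{
        return reported > max ? max : reported;
}

int main(void)
{
        unsigned int buf[MAX_ELEMS];
        size_t reported = 1000;  /* e.g. a bogus value reported by firmware */
        size_t n = clamp_count(reported, MAX_ELEMS);
        size_t i;

        for (i = 0; i < n; i++)  /* stays inside buf */
                buf[i] = (unsigned int)i;

        printf("wrote %zu of %zu reported elements\n", n, reported);
        return 0;
}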
Fixes: d52c89f120 ("qed*: Utilize FW 8.37.2.0")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/f8e1182934aa274c18d0682a12dbaf347595469c.1757485536.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e7bba0892 ]
A NULL pointer dereference can occur in tcp_ao_finish_connect() during a
connect() system call on a socket with a TCP-AO key added and TCP_REPAIR
enabled.
The function is called with skb being NULL and attempts to dereference it
on tcp_hdr(skb)->seq without a prior skb validation.
Fix this by checking if skb is NULL before dereferencing it.
The commentary is taken from bpf_skops_established(), which is also called
in the same flow. Unlike the function being patched,
bpf_skops_established() validates the skb before dereferencing it.
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <endian.h>
#include <linux/tcp.h>

int main(void)
{
        struct sockaddr_in sockaddr;
        struct tcp_ao_add tcp_ao;
        int sk;
        int one = 1;

        memset(&sockaddr, '\0', sizeof(sockaddr));
        memset(&tcp_ao, '\0', sizeof(tcp_ao));

        sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

        sockaddr.sin_family = AF_INET;
        memcpy(tcp_ao.alg_name, "cmac(aes128)", 12);
        memcpy(tcp_ao.key, "ABCDEFGHABCDEFGH", 16);
        tcp_ao.keylen = 16;
        memcpy(&tcp_ao.addr, &sockaddr, sizeof(sockaddr));

        setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao, sizeof(tcp_ao));
        setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one));

        sockaddr.sin_family = AF_INET;
        sockaddr.sin_port = htobe16(123);
        inet_aton("127.0.0.1", &sockaddr.sin_addr);

        connect(sk, (struct sockaddr *)&sockaddr, sizeof(sockaddr));

        return 0;
}
$ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall
$ unshare -Urn
BUG: kernel NULL pointer dereference, address: 00000000000000b6
PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182)
Fixes: 7c2ffaf21b ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2690cb0895 ]
Starting with commit c50e747596 ("dpaa2-switch: Fix error checking in
dpaa2_switch_seed_bp()"), the probing of a second DPSW object errors out
like below.
fsl_dpaa2_switch dpsw.1: fsl_mc_driver_probe failed: -12
fsl_dpaa2_switch dpsw.1: probe with driver fsl_dpaa2_switch failed with error -12
The aforementioned commit brought to the surface the fact that seeding
buffers into the buffer pool destined for control traffic is not
successful and an access violation recoverable error can be seen in the
MC firmware log:
[E, qbman_rec_isr:391, QBMAN] QBMAN recoverable event 0x1000000
This happens because the driver incorrectly used the ID of the DPBP
object instead of the hardware buffer pool ID when trying to release
buffers into it.
This is because any DPSW object uses two buffer pools, one managed by
the Linux driver and destined for control traffic packet buffers and the
other one managed by the MC firmware and destined only for offloaded
traffic. And since the buffer pool managed by the MC firmware does not
have an external facing DPBP equivalent, any subsequent DPBP objects
created after the first DPSW will have a DPBP id different to the
underlying hardware buffer ID.
The issue was not caught earlier because these two numbers can be
identical when all DPBP objects are created before the DPSW objects are.
This is the case when the DPL file is used to describe the entire DPAA2
object layout and objects are created at boot time and it's also true
for the first DPSW being created dynamically using ls-addsw.
Fix this by using the buffer pool ID instead of the DPBP id when
releasing buffers into the pool.
Fixes: 2877e4f7e1 ("staging: dpaa2-switch: setup buffer pool and RX path rings")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20250910144825.2416019-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ebf70cf18 ]
When register_virtio_device() fails in virtio_uml_probe(),
the code sets vu_dev->registered = 1 even though
the device was not successfully registered.
This can lead to use-after-free or other issues.
Fixes: 04e5b1fb01 ("um: virtio: Remove device on disconnect")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b62fd63ade ]
The offset for an extref item's key is not the object ID of the parent
dir, otherwise we would not need the extref item and would use plain ref
items. Instead the offset is the result of a hash computation that uses
the object ID of the parent dir and the name associated to the entry.
So fix this by setting the key offset at replay_one_name() to be the
result of calling btrfs_extref_hash().
Fixes: 725af92a62 ("btrfs: Open-code name_in_log_ref in replay_one_name")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79f919a89c ]
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.
Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio
Call Trace:
cgroup_lock_and_drain_offline+0x14c/0x1e8
cgroup_destroy_root+0x3c/0x2c0
css_free_rwork_fn+0x248/0x338
process_one_work+0x16c/0x3b8
worker_thread+0x22c/0x3b0
kthread+0xec/0x100
ret_from_fork+0x10/0x20
Root Cause:
CPU0 CPU1
mount perf_event umount net_prio
cgroup1_get_tree cgroup_kill_sb
rebind_subsystems // root destruction enqueues
// cgroup_destroy_wq
// kill all perf_event css
// one perf_event css A is dying
// css A offline enqueues cgroup_destroy_wq
// root destruction will be executed first
css_free_rwork_fn
cgroup_destroy_root
cgroup_lock_and_drain_offline
// some perf descendants are dying
// cgroup_destroy_wq max_active = 1
// waiting for css A to die
Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
blocked behind root destruction in cgroup_destroy_wq (max_active=1)
Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation
This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.
[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1dfcdd301 ]
As described in the added code comment, a reference to .exit.text is ok
for drivers registered via platform_driver_probe(). Make this explicit
to prevent the following section mismatch warning
WARNING: modpost: drivers/pcmcia/omap_cf: section mismatch in reference: omap_cf_driver+0x4 (section: .data) -> omap_cf_remove (section: .exit.text)
that triggers on an omap1_defconfig + CONFIG_OMAP_CF=m build.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aea3493246 ]
The ALSA HwDep character device of the firewire-motu driver incorrectly
returns EPOLLOUT in poll(2), even though the driver implements no operation
for write(2). This misleads userspace applications into believing write() is
allowed, potentially resulting in unnecessary wakeups.
This issue dates back to the driver's initial code added by commit
71c3797779 ("ALSA: firewire-motu: add hwdep interface"), and persisted
when POLLOUT was updated to EPOLLOUT by commit a9a08845e9 ("vfs: do
bulk POLL* -> EPOLL* replacement").
This commit fixes the bug.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://patch.msgid.link/20250829233749.366222-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ac3c2889b ]
I recently ran into an issue where the PI generated using the block layer
integrity code differs from the PI generated by a kernel using the PRACT
fallback (i.e. with the block layer integrity code disabled), and I tracked
this down to us using PRACT incorrectly.
The NVM Command Set Specification (section 5.33 in 1.2, similar in older
versions) specifies the PRACT insert behavior as:
Inserted protection information consists of the computed CRC for the
protection information format (refer to section 5.3.1) in the Guard
field, the LBAT field value in the Application Tag field, the LBST
field value in the Storage Tag field, if defined, and the computed
reference tag in the Logical Block Reference Tag.
Where the computed reference tag is defined as following for type 1 and
type 2 using the text below that is duplicated in the respective bullet
points:
the value of the computed reference tag for the first logical block of
the command is the value contained in the Initial Logical Block
Reference Tag (ILBRT) or Expected Initial Logical Block Reference Tag
(EILBRT) field in the command, and the computed reference tag is
incremented for each subsequent logical block.
So we need to set ILBRT field, but we currently don't. Interestingly
this works fine on my older type 1 formatted SSD, but Qemu trips up on
this. We already set ILBRT for Write Same since commit aeb7bb061be5
("nvme: set the PRACT bit when using Write Zeroes with T10 PI").
To ease this, move the PI type check into nvme_set_ref_tag.
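As a rough illustration of the reference tag rule quoted earlier (made-up
values, userspace only): the first block of the command carries ILBRT and
every subsequent block carries ILBRT plus its offset within the command.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t ilbrt = 0x1000;      /* example initial reference tag */
        unsigned int nr_blocks = 4;   /* example command length in blocks */
        unsigned int i;

        for (i = 0; i < nr_blocks; i++)
                printf("block %u -> computed reference tag 0x%x\n",
                       i, ilbrt + i);
        return 0;
}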
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Replace one-element array with flexible-array member to fix:
[ 11.725017] ------------[ cut here ]------------
[ 11.725038] memcpy: detected field-spanning write (size 4) of single field "hdr->body" at drivers/staging/vc04_services/vc-sm-cma/vc_sm_cma_vchi.c:130 (size 0)
[ 11.725113] WARNING: CPU: 3 PID: 455 at drivers/staging/vc04_services/vc-sm-cma/vc_sm_cma_vchi.c:130 vc_vchi_cmd_create+0x1a8/0x1d0 [vc_sm_cma]
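A minimal userspace sketch of the one-element-array vs flexible-array-member
change (struct and field names are illustrative, not the vc_sm_cma
definitions):

#include <stdlib.h>
#include <string.h>

/* Before: one-element array. A 4-byte memcpy() into "body" spans the field
 * and trips the field-spanning write check shown above. */
struct hdr_old {
        unsigned int cmd;
        unsigned char body[1];
};

/* After: flexible-array member. The field's size comes from the allocation,
 * so the same memcpy() stays within bounds. */
struct hdr_new {
        unsigned int cmd;
        unsigned char body[];
};

int main(void)
{
        const unsigned char payload[4] = { 1, 2, 3, 4 };
        struct hdr_new *hdr = malloc(sizeof(*hdr) + sizeof(payload));

        if (!hdr)
                return 1;
        memcpy(hdr->body, payload, sizeof(payload));
        free(hdr);
        return 0;
}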
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Extend the LED control features of the led-modes property by adding a
led-swap property. This allows the same led-modes values to be used
across designs where the LED assignments differ.
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Copy the firmware's handling of FSTROBE, but using module parameters
instead of config.txt entries.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
When the analog binning mode is used for high framerate operation, the
pixel rate is effectively doubled. Account for this when setting up the
pixel clock rate, and applying the vblank and exposure controls.
The previous logic only used analog binning for RAW8, but normal binning
limits the framerate on RAW10 480p [1]. So with this patch we switch to
using special binning (with 2x pixel rate) wherever possible.
Backport of commit f513997 upstream.
[1]: https://github.com/raspberrypi/linux/issues/5493
Co-developed-by: Naushir Patuck <naush@raspberrypi.com>
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Co-developed-by: Vinay Varma <varmavinaym@gmail.com>
Signed-off-by: Vinay Varma <varmavinaym@gmail.com>
Signed-off-by: Jai Luthra <jai.luthra@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
commit 30c1d25b98 upstream.
Blamed commit broke the check for a null scratch map:
- if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
+ if (unlikely(!raw_cpu_ptr(m->scratch)))
This should have been "if (!*raw_ ...)".
Use the pattern of the avx2 version which is more readable.
This can only be reproduced if avx2 support isn't available.
Fixes: d8d871a35c ("netfilter: nft_set_pipapo: merge pipapo_get/lookup")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ea25274eb upstream.
reset_gpio is claimed in mdiobus_register_device(), but it is not
released in mdiobus_unregister_device(). It is instead only
released when the whole MDIO bus is unregistered.
When a device uses the reset_gpio property, it becomes impossible
to unregister it and register it again, because the GPIO remains
claimed.
This patch resolves that issue.
Fixes: bafbdd527d ("phylib: Add device reset GPIO support") # see notes
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Csókás Bence <csokas.bence@prolan.hu>
[ csokas.bence: Resolve rebase conflict and clarify msg ]
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/20250807135449.254254-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ csokas.bence: Use the v1 patch on top of 6.12, as specified in notes ]
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cba4262a19 upstream.
Support for parsing the topology on AMD/Hygon processors using CPUID leaf 0xb
was added in
3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available").
In an effort to keep all the topology parsing bits in one place, this commit
also introduced a pseudo dependency on the TOPOEXT feature to parse the CPUID
leaf 0xb.
The TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises the support for
Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for
"Extended APIC ID" however support for 0xb was introduced alongside the x2APIC
support not only on AMD [1], but also historically on x86 [2].
Similar to 0xb, the support for extended CPU topology leaf 0x80000026 too does
not depend on the TOPOEXT feature.
The support for these leaves is expected to be confirmed by ensuring
leaf <= {extended_}cpuid_level
and then parsing the level 0 of the respective leaf to confirm EBX[15:0]
(LogProcAtThisLevel) is non-zero as stated in the definition of
"CPUID_Fn0000000B_EAX_x00 [Extended Topology Enumeration]
(Core::X86::Cpuid::ExtTopEnumEax0)" in Processor Programming Reference (PPR)
for AMD Family 19h Model 01h Rev B1 Vol1 [3] Sec. 2.1.15.1 "CPUID Instruction
Functions".
This has not been a problem on baremetal platforms since support for TOPOEXT
(Fam 0x15 and later) predates the support for CPUID leaf 0xb (Fam 0x17[Zen2]
and later), however, for AMD guests on QEMU, the "x2apic" feature can be
enabled independent of the "topoext" feature where QEMU expects topology and
the initial APICID to be parsed using the CPUID leaf 0xb (especially when
number of cores > 255) which is populated independent of the "topoext" feature
flag.
Unconditionally call cpu_parse_topology_ext() on AMD and Hygon processors to
first parse the topology using the XTOPOLOGY leaves (0x80000026 / 0xb) before
using the TOPOEXT leaf (0x8000001e).
While at it, break down the single large comment in parse_topology_amd() to
better highlight the purpose of each CPUID leaf.
Fixes: 3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available")
Suggested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # Only v6.9 and above; depends on x86 topology rewrite
Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthikulpanit@amd.com/ [1]
Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5068b52548 upstream.
When we don't have a clock specified in the device tree, we have no way to
ensure the BAM is on. This is often the case for remotely-controlled or
remotely-powered BAM instances. In this case, we need to read num-channels
from the DT to have all the necessary information to complete probing.
However, at the moment invalid device trees without clock and without
num-channels still continue probing, because the error handling is missing
return statements. The driver will then later try to read the number of
channels from the registers. This is unsafe, because it relies on boot
firmware and lucky timing to succeed. Unfortunately, the lack of proper
error handling here has been abused for several Qualcomm SoCs upstream,
causing early boot crashes in several situations [1, 2].
Avoid these early crashes by erroring out when any of the required DT
properties are missing. Note that this will break some of the existing DTs
upstream (mainly BAM instances related to the crypto engine). However,
clearly these DTs have never been tested properly, since the error in the
kernel log was just ignored. It's safer to disable the crypto engine for
these broken DTBs.
[1]: https://lore.kernel.org/r/CY01EKQVWE36.B9X5TDXAREPF@fairphone.com/
[2]: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro.org/
Cc: stable@vger.kernel.org
Fixes: 48d163b1aa ("dmaengine: qcom: bam_dma: get num-channels and num-ees from dt")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-8-f560889e65d8@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 116e79c679 upstream.
The EP-IN of MIDI2 (altset 1) wasn't initialized in
f_midi2_create_usb_configs() as it's an INT EP, unlike the other BULK
EPs. But this leaves the max packet size unchanged no matter which
speed is used, resulting in very slow access.
The wMaxPacketSize values set there look legit for INT EPs too, so
let's initialize the MIDI2 EP-IN there as well to achieve the
equivalent speed.
Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250905133240.20966-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21d8525d2e upstream.
The gadget card driver forgot to call snd_ump_update_group_attrs()
after adding FBs, and this leaves the UMP group attributes
uninitialized. As a result, an -ENODEV error is returned when opening a
legacy rawmidi device, because the group appears inactive.
This patch adds the missing call to address the behavior above.
Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250904153932.13589-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f34bfcc77b upstream.
tcpm_handle_vdm_request delivers messages to the partner altmode or the
cable altmode depending on the SVDM response type, which is incorrect.
The partner or cable should be chosen based on the received message type
instead.
Also add this filter to ADEV_NOTIFY_USB_AND_QUEUE_VDM, which is used when
the Enter Mode command is responded to by a NAK on SOP or SOP' and when
the Exit Mode command is responded to by an ACK on SOP.
Fixes: 7e7877c55e ("usb: typec: tcpm: add alt mode enter/exit/vdm support for sop'")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250821203759.1720841-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8d4f1e05ff ]
Without this I get a bunch of build errors like
In file included from ./include/linux/sched/task_stack.h:12,
from ./arch/riscv/include/asm/compat.h:12,
from ./arch/riscv/include/asm/pgtable.h:115,
from ./include/linux/pgtable.h:6,
from ./include/linux/mm.h:30,
from arch/riscv/kernel/asm-offsets.c:8:
./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’?
50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
| ^~~~~~~~~~~~~~~~
| PTRS_PER_PTE
./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’?
51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
| ^~~~~
| pgd_t
./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
| ^~~~~~~~~~~~~~~~
| PTRS_PER_PGD
./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’?
52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
| ^~~~~
| pgd_t
./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
| ^~~~~~~~~~~~~~~~
| PTRS_PER_PGD
./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’?
53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
| ^~~~~
| pgd_t
./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
| ^~~~~~~~~~~~~~~~
| PTRS_PER_PGD
Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e63419dbf2 ]
Fix a critical memory allocation bug in edma_setup_from_hw() where
queue_priority_map was allocated with insufficient memory. The code
declared queue_priority_map as s8 (*)[2] (pointer to array of 2 s8),
but allocated memory using sizeof(s8) instead of the correct size.
This caused out-of-bounds memory writes when accessing:
queue_priority_map[i][0] = i;
queue_priority_map[i][1] = i;
The bug manifested as kernel crashes with "Oops - undefined instruction"
on ARM platforms (BeagleBoard-X15) during EDMA driver probe, as the
memory corruption triggered kernel hardening features on Clang.
Change the allocation to use sizeof(*queue_priority_map) which
automatically gets the correct size for the 2D array structure.
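A minimal userspace sketch of the sizing difference (calloc() standing in
for the kernel allocation helper, with an assumed queue count):

#include <stdio.h>
#include <stdlib.h>

typedef signed char s8;

int main(void)
{
        int n = 8;  /* illustrative number of queues */
        s8 (*queue_priority_map)[2];

        /* Buggy shape: calloc(n, sizeof(s8)) reserves only n bytes, half of
         * what the n x 2 array needs, so [i][1] stores run past the end. */

        /* Fixed shape: sizeof(*queue_priority_map) is sizeof(s8[2]) == 2,
         * so every [i][0] / [i][1] pair fits in the allocation. */
        queue_priority_map = calloc(n, sizeof(*queue_priority_map));
        if (!queue_priority_map)
                return 1;

        for (int i = 0; i < n; i++) {
                queue_priority_map[i][0] = (s8)i;
                queue_priority_map[i][1] = (s8)i;
        }
        printf("allocated %zu bytes for %d pairs\n",
               n * sizeof(*queue_priority_map), n);
        free(queue_priority_map);
        return 0;
}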
Fixes: 2b6b3b7420 ("ARM/dmaengine: edma: Merge the two drivers under drivers/dma/")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20250830094953.3038012-1-anders.roxell@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39aaa33744 ]
The cleanup in idxd_setup_wqs() has had a couple of bugs because the error
handling is a bit subtle. It's simpler to just rewrite it in a cleaner
way. The issues here are:
1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when
"conf_dev" hasn't been initialized.
2) If kzalloc_node() fails then again "conf_dev" is invalid. It's
either uninitialized or it points to the "conf_dev" from the
previous iteration so it leads to a double free.
It's better to free partial loop iterations within the loop and then
the unwinding at the end can handle whole loop iterations. I also
renamed the labels to describe what the goto does and not where the goto
was located.
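A userspace sketch of that error-handling shape, with plain malloc()/free()
standing in for the driver's allocations and made-up names:

#include <stdlib.h>

struct wq {
        int id;
        char *name;
};

/* Anything partially built in the current iteration is freed before jumping
 * out, so the unwind label only undoes fully set-up iterations, and the
 * label describes what the goto does rather than where it sits. */
static struct wq **setup_wqs(int max)
{
        struct wq **wqs = calloc(max, sizeof(*wqs));
        int i;

        if (!wqs)
                return NULL;

        for (i = 0; i < max; i++) {
                struct wq *wq = calloc(1, sizeof(*wq));

                if (!wq)
                        goto err_free_previous;
                wq->name = malloc(32);
                if (!wq->name) {
                        free(wq);       /* partial iteration freed here */
                        goto err_free_previous;
                }
                wq->id = i;
                wqs[i] = wq;            /* only fully built entries are kept */
        }
        return wqs;

err_free_previous:
        while (--i >= 0) {
                free(wqs[i]->name);
                free(wqs[i]);
        }
        free(wqs);
        return NULL;
}

int main(void)
{
        struct wq **wqs = setup_wqs(4);
        int i;

        if (!wqs)
                return 1;
        for (i = 0; i < 4; i++) {
                free(wqs[i]->name);
                free(wqs[i]);
        }
        free(wqs);
        return 0;
}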
Fixes: 3fd2f4bc01 ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Closes: https://lore.kernel.org/all/20250811095836.1642093-1-colin.i.king@gmail.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/aJnJW3iYTDDCj9sk@stanley.mountain
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7cb9a0343 ]
A recent refactor introduced a misplaced put_device() call, resulting in a
reference count underflow during module unload.
There is no need to add additional put_device() calls for idxd groups,
engines, or workqueues. Although the commit claims: "Note, this also
fixes the missing put_device() for idxd groups, engines, and wqs."
It appears no such omission actually existed. The required cleanup is
already handled by the call chain:
idxd_unregister_devices() -> device_unregister() -> put_device()
Extend idxd_cleanup() to handle the remaining necessary cleanup and
remove idxd_cleanup_internals(), which duplicates deallocation logic
for idxd, engines, groups, and workqueues. Memory management is also
properly handled through the Linux device model.
Fixes: a409e919ca ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20250729150313.1934101-3-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f41c538881 ]
The call to idxd_free() introduces a duplicate put_device() leading to a
reference count underflow:
refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
<TASK>
idxd_remove+0xe4/0x120 [idxd]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x197/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x74/0xf0
pci_unregister_driver+0x2e/0xb0
idxd_exit_module+0x34/0x7a0 [idxd]
__do_sys_delete_module.constprop.0+0x183/0x280
do_syscall_64+0x54/0xd70
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The idxd_unregister_devices() which is invoked at the very beginning of
idxd_remove(), already takes care of the necessary put_device() through the
following call path:
idxd_unregister_devices() -> device_unregister() -> put_device()
In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may
trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is
called immediately after, it can result in a use-after-free.
Remove the improper idxd_free() to avoid both the refcount underflow and
potential memory corruption during module unload.
Fixes: d5449ff1b0 ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20250729150313.1934101-2-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 393c841fe4 ]
hsr_port_get_hsr() iterates over ports using hsr_for_each_port(),
but many of its callers do not hold the required RCU lock.
Switch to hsr_for_each_port_rtnl(), since most callers already hold
the rtnl lock. After review, all callers are covered by either the rtnl
lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an
RCU read lock there.
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8884c69399 ]
hsr_for_each_port is called in many places without holding the RCU read
lock; this may trigger warnings on debug kernels. Most of the callers
actually hold the rtnl lock. So add a new helper hsr_for_each_port_rtnl
to allow callers in suitable contexts to iterate ports safely without
explicit RCU locking.
This patch only fixes the callers that hold the rtnl lock. Other callers
will be fixed in later patches.
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1a8a63a530 ]
This patch adds support for VLAN ctag based filtering at slave devices.
The slave ethernet device may be capable of filtering ethernet packets
based on VLAN ID. This requires that when the VLAN interface is created
over an HSR/PRP interface, it passes the VID information to the
associated slave ethernet devices so that it updates the hardware
filters to filter ethernet frames based on VID. This patch adds the
required functions to propagate the vid information to the slave
devices.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241106091710.3308519-3-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 8884c69399 ("hsr: use rtnl lock when iterating over ports")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b2f742c846 ]
The hash, hash_fast, rhash and bitwise sets may indicate no result even
though a matching element exists during a short time window while other
cpu is finalizing the transaction.
This happens when the hash lookup/bitwise lookup function has picked up
the old genbit, right before it was toggled by nf_tables_commit(), but
then the same cpu managed to unlink the matching old element from the
hash table:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
A) observes old genbit
increments base_seq
I) increments the genbit
II) removes old element from the set
B) finds matching element
C) returns no match: found
element is not valid in old
generation
Next lookup observes new genbit and
finds matching e2.
Consider a packet matching element e1, e2.
cpu0 processes following transaction:
1. remove e1
2. adds e2, which has same key as e1.
P matches both e1 and e2. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 observed the old genbit. e2 will not be considered once it is found.
The element e1 is not found anymore if cpu0 managed to unlink it from the
hlist before cpu1 found it during list traversal.
The situation only occurs for a brief time period, lookups happening
after I) observe new genbit and return e2.
This problem exists in all set types except nft_set_pipapo, so fix it once
in nft_lookup rather than each set ops individually.
Sample the base sequence counter, which gets incremented right before the
genbit is changed.
Then, if no match is found, retry the lookup if the base sequence was
altered in between.
If the base sequence hasn't changed, one of the following holds:
- No update took place: a no-match result is expected. This is the
common case.
- nf_tables_commit() hasn't progressed to the genbit update yet. Old
elements were still visible and a no-match result is expected.
- nf_tables_commit() updated the genbit: we picked up the new base_seq,
so the lookup function also picked up the new genbit, and a no-match
result is expected.
If the old genbit was observed, then nft_lookup also picked up the old
base_seq: nft_lookup_should_retry() returns true and relookup is performed
in the new generation.
This problem was added when the unconditional synchronize_rcu() call
that followed the current/next generation bit toggle was removed.
Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
patchset, for suggesting re-use of existing base_seq and placement of
the restart loop in nft_set_do_lookup().
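A toy userspace model of that retry loop (seqcount-style, with made-up
names; as noted above, the real code reuses the existing base_seq and places
the restart loop in nft_set_do_lookup()):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Bumped by the committer right before the generation bit is flipped. */
static _Atomic unsigned int base_seq;

static bool lookup_once(int key)
{
        (void)key;
        return false;   /* stand-in for the per-set lookup */
}

static bool lookup_with_retry(int key)
{
        for (;;) {
                unsigned int seq = atomic_load(&base_seq);

                if (lookup_once(key))
                        return true;
                /* No match: only trust it if no commit ran concurrently. */
                if (atomic_load(&base_seq) == seq)
                        return false;
                /* base_seq moved: the genbit may have flipped mid-lookup,
                 * so repeat the lookup in the new generation. */
        }
}

int main(void)
{
        printf("match: %d\n", lookup_with_retry(42));
        return 0;
}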
Fixes: 0cbc06b3fa ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 11fe5a82e5 ]
This function was added for retpoline mitigation and is replaced by a
static inline helper if mitigations are not enabled.
Enable this helper function unconditionally so next patch can add a lookup
restart mechanism to fix possible false negatives while transactions are
in progress.
Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would
then need the same copypaste loop.
This patch is separate to ease review of the actual bug fix.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64102d9bbc ]
This will soon be read from the packet path around the same time as the gencursor.
Both gencursor and base_seq get incremented almost at the same time, so
it makes sense to place them in the same structure.
This doesn't increase struct net size on 64bit due to padding.
Signed-off-by: Florian Westphal <fw@strlen.de>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a1050dd071 ]
Restore commit 28339b21a3 ("netfilter: nf_tables: do not send complete
notification of deletions") and fix it:
- Avoid upfront modification of 'event' variable so the conditionals
become effective.
- Always include NFTA_OBJ_TYPE attribute in object notifications, user
space requires it for proper deserialisation.
- Catch DESTROY events, too.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a60f7bf4a1 ]
When the rbtree lookup function finds a match in the rbtree, it sets the
range start interval to a potentially inactive element.
Then, after tree lookup, if the matching element is inactive, it returns
NULL and suppresses a matching result.
This is wrong and leads to false negative matches when a transaction has
already entered the commit phase.
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
B) finds matching range
C) returns no match: found
range invalid in new generation
II) removes old elements from the tree
C New nft_lookup happening now
will find matching element,
because it is no longer
obscured by old, inactive one.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed. It does NOT test for further matches.
The situation persists for all lookups until after cpu0 hits II) after
which r1-r3 range start node is tested for the first time.
Move the "interval start is valid" check ahead so that tree traversal
continues if the starting interval is not valid in this generation.
Thanks to Stefan Hanreich for providing an initial reproducer for this
bug.
Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
Fixes: c1eda3c639 ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4eaca2e10 ]
The pipapo set type is special in that it has two copies of its
datastructure: one live copy containing only valid elements and one
on-demand clone used during transaction where adds/deletes happen.
This clone is not visible to the datapath.
This is unlike all other set types in nftables, those all link new
elements into their live hlist/tree.
For those sets, the lookup functions must skip the new elements while the
transaction is ongoing to ensure consistency.
As the clone is shallow, removal does have an effect on the packet path:
once the transaction enters the commit phase the 'gencursor' bit that
determines which elements are active and which elements should be ignored
(because they are no longer valid) is flipped.
This causes the datapath lookup to ignore these elements if they are found
during lookup.
This opens up a small race window where pipapo has an inconsistent view of
the dataset from when the transaction-cpu flipped the genbit until the
transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
removes elements from the clone so
they won't be found anymore
B) lookup in datastructure
can't see new elements yet,
but old elements are ignored
-> Only matches elements that
were not changed in the
transaction
II) calls nft_pipapo_commit(), clone
and live pointers are swapped.
C New nft_lookup happening now
will find matching elements.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed.
At the same time, r1-r3 is not visible yet, because it can only be found
in the clone.
The situation persists for all lookups until after cpu0 hits II).
The fix is easy: Don't check the genbit from pipapo lookup functions.
This is possible because unlike the other set types, the new elements are
not reachable from the live copy of the dataset.
The clone/live pointer swap is enough to avoid matching on old elements
while at the same time all new elements are exposed in one go.
After this change, step B above returns a match in r1-r2.
This is fine: r1-r2 only becomes truly invalid the moment they get freed.
This happens after a synchronize_rcu() call and rcu read lock is held
via netfilter hook traversal (nf_hook_slow()).
Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8a7c2c608 ]
Dan Carpenter says:
Commit 17a20e09f0 ("netfilter: nft_set: remove one argument from
lookup and update functions") [..] leads to the following Smatch
static checker warning:
net/netfilter/nft_set_pipapo_avx2.c:1269 nft_pipapo_avx2_lookup()
error: uninitialized symbol 'ext'.
Fix this by initializing ext to NULL and setting it only once we've
found a match.
Fixes: 17a20e09f0 ("netfilter: nft_set: remove one argument from lookup and update functions")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netfilter-devel/aJBzc3V5wk-yPOnH@stanley.mountain/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d8d871a35c ]
The matching algorithm has been implemented three times:
1. data path lookup, generic version
2. data path lookup, avx2 version
3. control plane lookup
Merge 1 and 3 by refactoring pipapo_get as a common helper, then make
nft_pipapo_lookup and nft_pipapo_get both call the common helper.
Aside from the code savings this has the benefit that we no longer allocate
temporary scratch maps for each control plane get and insertion operation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17a20e09f0 ]
Return the extension pointer instead of passing it as a function
argument to be filled in by the callee.
As-is, whenever false is returned, the extension pointer is not used.
For all set types, when true is returned, the extension pointer was set
to the matching element.
Only exception: nft_set_bitmap doesn't support extensions.
Return a pointer to a static const empty element extension container.
return false -> return NULL
return true -> return the elements' extension pointer.
This saves one function argument.
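In plain C terms the interface change looks roughly like this (illustrative
types only, not the nftables signatures):

#include <stddef.h>
#include <stdio.h>

struct elem_ext {
        int priv;
};

static struct elem_ext the_ext = { .priv = 7 };

/* Old shape: bool-style result plus an out-parameter. */
static int lookup_old(int key, const struct elem_ext **ext)
{
        if (key != 42)
                return 0;
        *ext = &the_ext;
        return 1;
}

/* New shape: one argument less, NULL means "no match". */
static const struct elem_ext *lookup_new(int key)
{
        return key == 42 ? &the_ext : NULL;
}

int main(void)
{
        const struct elem_ext *ext;

        if (lookup_old(42, &ext))
                printf("old style: %d\n", ext->priv);

        ext = lookup_new(42);
        if (ext)
                printf("new style: %d\n", ext->priv);
        return 0;
}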
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef79f00be7 ]
can_put_echo_skb() takes ownership of the SKB and it may be freed
during or after the call.
However, xilinx_can xcan_write_frame() keeps using SKB after the call.
Fix that by only calling can_put_echo_skb() after the code is done
touching the SKB.
The tx_lock is held for the entire xcan_write_frame() execution and
also on the can_get_echo_skb() side so the order of operations does not
matter.
An earlier fix commit 3d3c817c3a ("can: xilinx_can: Fix usage of skb
memory") did not move the can_put_echo_skb() call far enough.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: 1598efe57b ("can: xilinx_can: refactor code in preparation for CAN FD support")
Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi
[mkl: add "commit" in front of sha1 in patch description]
[mkl: fix indention]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 915470e1b4 ]
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration
later than the first, the error path wants to free the IRQs requested
so far. However, it uses the wrong dev_id argument for free_irq(), so
it does not free the IRQs correctly and instead triggers the warning:
Trying to free already-free IRQ 173
WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
Modules linked in: i40e(+) [...]
CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:__free_irq+0x192/0x2c0
[...]
Call Trace:
<TASK>
free_irq+0x32/0x70
i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
i40e_vsi_request_irq+0x79/0x80 [i40e]
i40e_vsi_open+0x21f/0x2f0 [i40e]
i40e_open+0x63/0x130 [i40e]
__dev_open+0xfc/0x210
__dev_change_flags+0x1fc/0x240
netif_change_flags+0x27/0x70
do_setlink.isra.0+0x341/0xc70
rtnl_newlink+0x468/0x860
rtnetlink_rcv_msg+0x375/0x450
netlink_rcv_skb+0x5c/0x110
netlink_unicast+0x288/0x3c0
netlink_sendmsg+0x20d/0x430
____sys_sendmsg+0x3a2/0x3d0
___sys_sendmsg+0x99/0xe0
__sys_sendmsg+0x8a/0xf0
do_syscall_64+0x82/0x2c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
</TASK>
---[ end trace 0000000000000000 ]---
Use the same dev_id for free_irq() as for request_irq().
I tested this with inserting code to fail intentionally.
Fixes: 493fb30011 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d709f178ab ]
The igb driver incorrectly skips the link test when the network
interface is admin down (if_running == false), causing the test to
always report PASS regardless of the actual physical link state.
This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe,
etc.) which correctly test the physical link state regardless of admin
state.
Remove the if_running check to ensure link test always reflects the
physical link state.
Fixes: 8d420a1b3e ("igb: correct link test not being run when link is down")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3c674db35 ]
If a GSO skb is sent through a Geneve tunnel and if Geneve options are
added, the split GSO skb might not fit in the MTU anymore and an ICMP
frag needed packet can be generated. In such a case the ICMP packet might
go through the segmentation logic (and be dropped) later if it reaches a
path where the GSO status is checked and segmentation is required.
This is especially true when an OvS bridge is used with a Geneve tunnel
attached to it. The following set of actions could lead to the ICMP
packet being wrongfully segmented:
1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4,
segs >= 2).
2. The skb hits the OvS bridge where Geneve options are added by an OvS
action before being sent through the tunnel.
3. When the skb is xmited in the tunnel, the split skb does not fit
anymore in the MTU and iptunnel_pmtud_build_icmp is called to
generate an ICMP fragmentation needed packet. This is done by reusing
the original (GSO!) skb. The GSO metadata is not cleared.
4. The ICMP packet being sent back hits the OvS bridge again and because
skb_is_gso returns true, it goes through queue_gso_packets...
5. ...where __skb_gso_segment is called. The skb is then dropped.
6. Note that in the above example on re-transmission the skb won't be a
GSO one as it would be segmented (len > MSS) and the ICMP packet
should go through.
Fix this by resetting the GSO information before reusing an skb in
iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1dbfb03632 ]
Per family bind/unbind callbacks were introduced to allow families
to track multicast group consumer presence, e.g. to start or stop
producing events depending on listeners.
However, in genl_bind() the bind() callback was invoked even if
capability checks failed and ret was set to -EPERM. This means that
callbacks could run on behalf of unauthorized callers while the
syscall still returned failure to user space.
Fix this by only invoking bind() after "if (ret) break;" check
i.e. after permission checks have succeeded.
Fixes: 3de21a8990 ("genetlink: Add per family bind/unbind callbacks")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250905135731.3026965-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f110a5e4f ]
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnings (i.e. 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims
to the contrary. As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.
The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself. And so anybody doing full build tests
didn't actually see this failure.
So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage. Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.
Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 535fd4c984 upstream.
When trying to set MCR[2], XON1 is incorrectly accessed instead. And when
writing to the TCR register to configure flow control levels, we are
incorrectly writing to the MSR register. The default value of $00 is then
used for TCR, which means that selectable trigger levels in FCR are used
in place of TCR.
TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2]
to be set. EFR[4] is already set in probe().
MCR access requires LCR[7] to be zero.
Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly
accessed instead because MCR shares the same address space as XON1.
Since MCR[2] is unmodified and still zero, when writing to TCR we are in
fact writing to MSR because TCR/TLR registers share the same address space
as MSR/SPR.
Fix by first removing useless reconfiguration of EFR[4] (enable enhanced
functions), as it is already enabled in sc16is7xx_probe() since commit
43c51bb573 ("sc16is7xx: make sure device is in suspend once probed").
Now LCR is $00, which means that MCR access is enabled.
Also remove regcache_cache_bypass() calls since we no longer access the
enhanced registers set, and TCR is already declared as volatile (in fact
by declaring MSR as volatile, which shares the same address).
Finally disable access to TCR/TLR registers after modifying them by
clearing MCR[2].
Note: the comment about "... and internal clock div" is wrong and can be
ignored/removed as access to internal clock div registers (DLL/DLH)
is permitted only when LCR[7] is logic 1, not when enhanced features
are enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
Fixes: dfeae619d7 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://lore.kernel.org/r/20250731124451.1108864-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e895f8e291 upstream.
When testing softirq based hrtimers on an ARM32 board, with high resolution
mode and NOHZ inactive, softirq based hrtimers fail to expire after being
moved away from an offline CPU:
CPU0 CPU1
hrtimer_start(..., HRTIMER_MODE_SOFT);
cpu_down(CPU1) ...
hrtimers_cpu_dying()
// Migrate timers to CPU0
smp_call_function_single(CPU0, retrigger_next_event);
retrigger_next_event()
if (!highres && !nohz)
return;
As retrigger_next_event() is a NOOP when both high resolution timers and
NOHZ are inactive, CPU0's hrtimer_cpu_base::softirq_expires_next is not
updated and the migrated softirq timers never expire unless there is a
softirq based hrtimer queued on CPU0 later.
Fix this by removing the hrtimer_hres_active() and tick_nohz_active() check
in retrigger_next_event(), which enforces a full update of the CPU base.
As this is not a fast path the extra cost does not matter.
[ tglx: Massaged change log ]
Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250805081025.54235-1-wangxiongfeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9786531399 ]
[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs
block size, the following workload can easily lead to a corrupted read:
mkfs.btrfs -f -s 4k $dev > /dev/null
mount -o compress $dev $mnt
xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
echo "correct result:"
od -Ad -t x1 $mnt/base
xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
-c "reflink $mnt/base 0 32k 32k" \
-c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
echo "incorrect result:"
od -Ad -t x1 $mnt/new
umount $mnt
This shows the following result:
correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
Notice the zero in the range [32K, 60K), which is incorrect.
[CAUSE]
With extra trace printk, it shows the following events during od:
(some unrelated info removed like CPU and context)
od-3457 btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000
The "r/i" is indicating the root and inode number. In our case the file
"new" is using ino 258 from fs tree (root 5).
Here notice the @prev_em_start pointer is NULL. This means the
btrfs_do_readpage() is called from btrfs_read_folio(), not from
btrfs_readahead().
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768
These above 32K blocks will be read from the first half of the
compressed data extent.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768
Note that there is no btrfs_submit_compressed_read() call here, which is
incorrect at this point.
Although both extent maps at 0 and 32K are pointing to the same compressed
data, their offsets are different thus can not be merged into the same
read.
So this means the compressed data read merge check is doing something
wrong.
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
od-3457 btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
od-3457 btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440
The function btrfs_submit_compressed_read() is only called at the end of
folio read. The compressed bio will only have an extent map of range [0,
32K), but the original bio passed in is for the whole 64K folio.
This will cause the decompression part to only fill the first 32K,
leaving the rest untouched (aka, filled with zero).
This incorrect compressed read merge leads to the above data corruption.
There were similar problems that happened in the past, commit 808f80b467
("Btrfs: update fix for read corruption of compressed and shared
extents") is doing pretty much the same fix for readahead.
But that was back in 2015, when btrfs still only supported the bs (block
size) == ps (page size) case.
This means btrfs_do_readpage() only needs to handle a folio which
contains exactly one block.
Only btrfs_readahead() can lead to a read covering multiple blocks.
Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.
With the v5.15 kernel, btrfs introduced bs < ps support. This breaks the
above assumption that a folio can only contain one block.
Now btrfs_read_folio() can also read multiple blocks in one go.
But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the
existing bio force submission check will never be triggered.
In theory, this can also happen for btrfs with large folios, but since
large folio support is still experimental, we don't need to bother with
it; thus only the bs < ps case is affected for now.
[FIX]
Instead of passing @prev_em_start to do the proper compressed extent
check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that
the existing bio force submission logic will always be triggered.
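A minimal sketch of the idea, assuming the new field is consulted where the
extent map is picked up in btrfs_do_readpage() (variable names are
illustrative; the exact upstream change may differ):
  /* after looking up the extent map 'em' covering 'cur' */
  if (compressed &&
      bio_ctrl->last_em_start != (u64)-1 &&
      bio_ctrl->last_em_start != em->start)
          force_bio_submit = true;  /* different compressed extent */
  bio_ctrl->last_em_start = em->start;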
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9e9ff875e4 ]
We recently received a report of poor performance doing sequential
buffered reads of a file with compressed extents. With bs=128k, a naive
sequential dd ran as fast on a compressed file as on an uncompressed
(1.2GB/s on my reproducing system) while with bs<32k, this performance
tanked down to ~300MB/s.
i.e., slow:
dd if=some-compressed-file of=/dev/null bs=4k count=X
vs fast:
dd if=some-compressed-file of=/dev/null bs=128k count=Y
The cause of this slowness is overhead to do with looking up extent_maps
to enable readahead pre-caching on compressed extents
(add_ra_bio_pages()), as well as some overhead in the generic VFS
readahead code we hit more in the slow case. Notably, the main
difference between the two read sizes is that in the large sized request
case, we call btrfs_readahead() relatively rarely while in the smaller
request we call it for every compressed extent. So the fast case stays
in the btrfs readahead loop:
while ((folio = readahead_folio(rac)) != NULL)
btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
where the slower one breaks out of that loop every time. This results in
calling add_ra_bio_pages a lot, doing lots of extent_map lookups,
extent_map locking, etc.
This happens because although add_ra_bio_pages() does add the
appropriate un-compressed file pages to the cache, it does not
communicate back to the ractl in any way. To solve this, we should be
using readahead_expand() to signal to readahead to expand the readahead
window.
This change passes the readahead_control into the btrfs_bio_ctrl and in
the case of compressed reads sets the expansion to the size of the
extent_map we already looked up anyway. It skips the subpage case as
that one already doesn't do add_ra_bio_pages().
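A rough sketch of the idea; readahead_expand() is the existing core helper,
while the ractl member on btrfs_bio_ctrl is the new plumbing described above
(names illustrative):
  /* in btrfs_do_readpage(), when a compressed extent map 'em' is found */
  if (bio_ctrl->ractl && compressed)
          /* widen the readahead window to cover the whole compressed extent */
          readahead_expand(bio_ctrl->ractl, em->start, em->len);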
With this change, whether we use bs=4k or bs=128k, btrfs expands the
readahead window up to the largest compressed extent we have seen so far
(in the trivial example: 128k) and the call stacks of the two modes look
identical. Notably, we barely call add_ra_bio_pages at all. And the
performance becomes identical as well. So this change certainly "fixes"
this performance problem.
Of course, it does seem to beg a few questions:
1. Will this waste too much page cache with a too large ra window?
2. Will this somehow cause bugs prevented by the more thoughtful
checking in add_ra_bio_pages?
3. Should we delete add_ra_bio_pages?
My stabs at some answers:
1. Hard to say. See attempts at generic performance testing below. Is
there a "readahead_shrink" we should be using? Should we expand more
slowly, by half the remaining em size each time?
2. I don't think so. Since the new behavior is indistinguishable from
reading the file with a larger read size passed in, I don't see why
one would be safe but not the other.
3. Probably! I tested that and it was fine in fstests, and it seems like
the pages would get re-used just as well in the readahead case.
However, it is possible some reads that use page cache but not
btrfs_readahead() could suffer. I will investigate this further as a
follow up.
I tested the performance implications of this change in 3 ways (using
compress-force=zstd:3 for compression):
Directly test the affected workload of small sequential reads on a
compressed file (improved from ~250MB/s to ~1.2GB/s)
==========for-next==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.02983 s, 712 MB/s
dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.92403 s, 725 MB/s
dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.8832 s, 240 MB/s
dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.71001 s, 1.2 GB/s
==========ra-expand==========
dd /mnt/lol/non-cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.09001 s, 705 MB/s
dd /mnt/lol/non-cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.07664 s, 707 MB/s
dd /mnt/lol/cmpr 4k
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.79531 s, 1.1 GB/s
dd /mnt/lol/cmpr 128k
32768+0 records in
32768+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.69533 s, 1.2 GB/s
Built the linux kernel from clean (no change)
Ran fsperf. Mostly neutral results with some improvements and
regressions here and there.
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/34601559-6c16-6ccc-1793-20a97ca0dbba@gmx.net/
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Assert doesn't take a format string ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 21cc2b5c50 ]
When restoring a reservation for an anonymous page, we need to check
whether we are freeing a surplus page. However, __unmap_hugepage_range()
causes a data race because it reads h->surplus_huge_pages without the
protection of hugetlb_lock.
And adjust_reservation is a boolean variable that indicates whether
reservations for anonymous pages in each folio should be restored.
Therefore, it should be initialized to false for each round of the loop.
However, this variable is only initialized to false once, where it is
declared, rather than on each round of the loop.
This means that once adjust_reservation is set to true even once within
the loop, reservations for anonymous pages will be restored
unconditionally in all subsequent rounds, regardless of the folio's state.
To fix this, we need to add the missing hugetlb_lock, unlock the
page_table_lock earlier so that we don't lock the hugetlb_lock inside the
page_table_lock lock, and initialize adjust_reservation to false on each
round within the loop.
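A condensed sketch of the resulting loop structure in
__unmap_hugepage_range(), simplified and with names approximated:
  /* inside the per-page loop */
  bool adjust_reservation = false;        /* re-initialized every iteration */
  ...
  spin_unlock(ptl);                       /* drop page_table_lock first ... */
  spin_lock_irq(&hugetlb_lock);           /* ... then take hugetlb_lock */
  if (!h->surplus_huge_pages && PageAnon(page))
          adjust_reservation = true;      /* surplus count now read safely */
  spin_unlock_irq(&hugetlb_lock);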
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f64 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Page vs folio differences ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3260a3f082 upstream.
state_show() reads kdamond->damon_ctx without holding damon_sysfs_lock.
This allows a use-after-free race:
CPU 0 CPU 1
----- -----
state_show() damon_sysfs_turn_damon_on()
ctx = kdamond->damon_ctx; mutex_lock(&damon_sysfs_lock);
damon_destroy_ctx(kdamond->damon_ctx);
kdamond->damon_ctx = NULL;
mutex_unlock(&damon_sysfs_lock);
damon_is_running(ctx); /* ctx is freed */
mutex_lock(&ctx->kdamond_lock); /* UAF */
(The race can also occur with damon_sysfs_kdamonds_rm_dirs() and
damon_sysfs_kdamond_release(), which free or replace the context under
damon_sysfs_lock.)
Fix by taking damon_sysfs_lock before dereferencing the context, mirroring
the locking used in pid_show().
The bug has existed since state_show() first accessed kdamond->damon_ctx.
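A minimal sketch of the fixed reader, following the names used in the race
diagram above (simplified; the "on"/"off" strings stand in for the actual
command strings):
  static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
                  char *buf)
  {
          struct damon_sysfs_kdamond *kdamond = container_of(kobj,
                          struct damon_sysfs_kdamond, kobj);
          struct damon_ctx *ctx;
          bool running;

          if (!mutex_trylock(&damon_sysfs_lock))
                  return -EBUSY;
          ctx = kdamond->damon_ctx;       /* now stable under the lock */
          running = ctx && damon_is_running(ctx);
          mutex_unlock(&damon_sysfs_lock);
          return sysfs_emit(buf, "%s\n", running ? "on" : "off");
  }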
Link: https://lkml.kernel.org/r/20250905101046.2288-1-disclosure@aisle.com
Fixes: a61ea561c8 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring")
Signed-off-by: Stanislav Fort <disclosure@aisle.com>
Reported-by: Stanislav Fort <disclosure@aisle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bec324f33d upstream.
When the parent directory's i_rwsem is not locked, req->r_parent may become
stale due to concurrent operations (e.g. rename) between dentry lookup and
message creation. Validate that r_parent matches the encoded parent inode
and update to the correct inode if a mismatch is detected.
[ idryomov: folded a follow-up fix from Alex to drop extra reference
from ceph_get_reply_dir() in ceph_fill_trace():
ceph_get_reply_dir() may return a different, referenced inode when
r_parent is stale and the parent directory lock is not held.
ceph_fill_trace() used that inode but failed to drop the reference
when it differed from req->r_parent, leaking an inode reference.
Keep the directory inode in a local variable and iput() it at
function end if it does not match req->r_parent. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15f519e9f8 upstream.
Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to
move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become
stale and is updated to point to the correct inode. However, the
associated CEPH_CAP_PIN reference was not being adjusted. The
CEPH_CAP_PIN is a reference on an inode that is tracked for
accounting purposes. Moving this pin is important to keep the
accounting balanced. When the pin was not moved from the old parent
to the new one, it created two problems: The reference on the old,
stale parent was never released, causing a reference leak.
A reference for the new parent was never acquired, creating the risk
of a reference underflow later in ceph_mdsc_release_request(). This
patch corrects the logic by releasing the pin from the old parent and
acquiring it for the new parent when r_parent is switched. This
ensures reference accounting stays balanced. ]
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cdbc9836c7 upstream.
There is a place where generic code in messenger.c is reading and
another place where it is writing to con->v1 union member without
checking that the union member is active (i.e. msgr1 is in use).
On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter,
so such a read is almost guaranteed to return a bogus value instead of
0 when msgr2 is in use. This ends up being fairly benign because the
side effect is just the invalidation of the authorizer and successive
fetching of new tickets.
con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that
it's being written to can cause more serious consequences, but luckily
it's not something that happens often.
Cc: stable@vger.kernel.org
Fixes: cd1a677cad ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c87fee3c9 upstream.
VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.
This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.
If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.
This flaw has probably been present since the xe module was
upstreamed but use a Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-
code the fallback algorithm in the driver.
v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit cb3d7b3b46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 711f19dfd7 upstream.
Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters
application".
DAMON's RECLAIM and LRU_SORT modules perform no validation on
user-configured parameters during application, which may lead to
division-by-zero errors.
Avoid the divide-by-zero by adding validation checks when DAMON modules
attempt to apply the parameters.
This patch (of 2):
During the calculation of 'hot_thres' and 'cold_thres', either
'sample_interval' or 'aggr_interval' is used as the divisor, which may
lead to division-by-zero errors. Fix it by directly returning -EINVAL
when such a case occurs. Additionally, since 'aggr_interval' is already
required to be set no smaller than 'sample_interval' in damon_set_attrs(),
only the case where 'sample_interval' is zero needs to be checked.
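A minimal sketch of the kind of check described, using the module's attrs
structure name (details of the actual patch may differ):
  static int damon_lru_sort_apply_parameters(void)
  {
          ...
          /* hot_thres/cold_thres computation divides by sample_interval */
          if (!damon_lru_sort_mon_attrs.sample_interval)
                  return -EINVAL;
          ...
  }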
Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com
Fixes: 40e983cca9 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org> [6.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce652aac9c upstream.
The kernel initializes the "jiffies" timer to 5 minutes below zero, as
shown in include/linux/jiffies.h:
/*
* Have the 32 bit jiffies value wrap 5 minutes after boot
* so jiffies wrap bugs show up earlier.
*/
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
And the jiffies comparison helper functions cast the unsigned values to
signed to cover wraparound:
#define time_after_eq(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)((a) - (b)) >= 0))
When quota->charged_from is initialized to 0, time_after_eq() can
incorrectly return FALSE even after reset_interval has elapsed. This
occurs when (jiffies - reset_interval) produces a value with MSB=1, which
is interpreted as negative in signed arithmetic.
This issue primarily affects 32-bit systems. On 64-bit systems, MSB=1
values occur only after ~292 million years of uptime (assuming HZ=1000),
which is practically impossible. On 32-bit systems, MSB=1 values occur
during the first 5 minutes after boot and during the second half of every
jiffies wraparound cycle, starting from day 25 (assuming HZ=1000).
When the above unexpected FALSE return from time_after_eq() occurs, the
charging window does not reset. The user impact depends on the esz value
at that time.
If esz is 0, the scheme ignores the configured quotas and runs without any
limits.
If esz is not 0, the scheme stops working once the quota is exhausted and
remains stopped until the charging window finally resets.
So, set quota->charged_from to jiffies in damos_adjust_quota() when it is
considered to be the first charge window. With this change, we avoid the
unexpected FALSE return from time_after_eq().
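A minimal sketch of the fix in damos_adjust_quota(), next to the existing
window-reset check:
  /* first charge window: charged_from still holds its zero initializer */
  if (!quota->charged_from)
          quota->charged_from = jiffies;

  if (time_after_eq(jiffies, quota->charged_from +
                            msecs_to_jiffies(quota->reset_interval))) {
          /* start a new charging window */
          quota->charged_from = jiffies;
          ...
  }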
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com
Fixes: 2b8a248d58 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e08938c36 upstream.
The FUSE protocol uses struct fuse_write_out to convey the return value of
copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE
interface supports 64-bit sized copies.
Currently the number of bytes copied is silently truncated to 32-bit, which
may result in poor performance or even failure to copy in case of
truncation to zero.
Reported-by: Florian Weimer <fweimer@redhat.com>
Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 811c0da454 upstream.
In case OOB write is requested during a data write, ECC is currently
lost. Avoid this issue by only writing in the free spare area.
This issue has been seen with a YAFFS2 file system.
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Cc: stable@vger.kernel.org
Fixes: 2cd457f328 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5537a46794 upstream.
Drop phylink_{suspend,resume}() from ax88772 PM callbacks.
MDIO bus accesses have their own runtime-PM handling and will try to
wake the device if it is suspended. Such wake attempts must not happen
from PM callbacks while the device PM lock is held. Since phylink
{sus|re}sume may trigger MDIO, it must not be called in PM context.
No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.
Fixes: e0bffe3e68 ("net: asix: ax88772: migrate to phylink")
Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6a6c28005 upstream.
There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:
Thread 1 Thread 2
---------------------------------------------------------------
evict()
remove_inode_hash()
btrfs_iget_path()
btrfs_iget_locked()
btrfs_read_locked_inode()
btrfs_add_inode_to_root()
destroy_inode()
btrfs_destroy_inode()
btrfs_del_inode_from_root()
__xa_erase
In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.
Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()).
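A minimal sketch of the approach in btrfs_del_inode_from_root(), simplified:
  xa_lock(&root->inodes);
  /* erase the slot only if it still points at *this* inode */
  entry = __xa_cmpxchg(&root->inodes, btrfs_ino(inode), inode, NULL,
                       GFP_ATOMIC);
  if (entry == inode)
          empty = xa_empty(&root->inodes);
  xa_unlock(&root->inodes);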
Fixes: 310b2f5d5a ("btrfs: use an xarray to track open inodes in a root")
Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-authored-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de134cb54c upstream.
The following workload on a squota enabled fs:
btrfs subvol create mnt/subvol
# ensure subvol extents get accounted
sync
btrfs qgroup create 1/1 mnt
btrfs qgroup assign mnt/subvol 1/1 mnt
btrfs qgroup delete mnt/subvol
# make the cleaner thread run
btrfs filesystem sync mnt
sleep 1
btrfs filesystem sync mnt
btrfs qgroup destroy 1/1 mnt
will fail with EBUSY. The reason is that 1/1 does the quick accounting
when we assign subvol to it, gaining its exclusive usage as excl and
excl_cmpr. But then when we delete subvol, the decrement happens via
record_squota_delta(), which does not update excl_cmpr, as squota does
not make any distinction between compressed and normal extents. Thus,
we increment excl_cmpr but never decrement it, and are unable to delete
1/1. The two possible fixes are to make squota always mirror excl and
excl_cmpr or to make the fast accounting separately track the plain and
cmpr numbers. The latter felt cleaner to me so that is what I opted for.
Fixes: 1e0e9d5771 ("btrfs: add helper for recording simple quota deltas")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 648de37416 upstream.
Users reported a scenario where MPTCP connections that were configured
with SO_KEEPALIVE prior to connect would fail to enable their keepalives
if MPTCP fell back to TCP mode.
After investigating, this affects keepalives for any connection where
sync_socket_options is called on a socket that is in the closed or
listening state. Joins are handled properly. For connects,
sync_socket_options is called when the socket is still in the closed
state. The tcp_set_keepalive() function does not act on sockets that
are closed or listening, hence keepalive is not immediately enabled.
Since the SO_KEEPOPEN flag is absent, it is not enabled later in the
connect sequence via tcp_finish_connect. Setting the keepalive via
sockopt after connect does work, but would not address any subsequently
created flows.
Fortunately, the fix here is straight-forward: set SOCK_KEEPOPEN on the
subflow when calling sync_socket_options.
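A minimal sketch of the change in sync_socket_options() (simplified):
  if (sock_flag(sk, SOCK_KEEPOPEN)) {
          /* tcp_set_keepalive() is a no-op on CLOSE/LISTEN sockets, so also
           * record the flag for tcp_finish_connect() to act on later.
           */
          tcp_set_keepalive(ssk, 1);
          sock_set_flag(ssk, SOCK_KEEPOPEN);
  } else {
          tcp_set_keepalive(ssk, 0);
          sock_reset_flag(ssk, SOCK_KEEPOPEN);
  }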
The fix was validated both by using tcpdump to observe keepalive
packets not being sent before the fix and being sent after it. It
was also possible to observe via ss that the keepalive timer was not
enabled on these sockets before the fix, but was enabled afterwards.
Fixes: 1b3e7ede13 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Cc: stable@vger.kernel.org
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fac212fe4 upstream.
Clang 22 recently added support for defining __SANITIZE__ macros similar
to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e)
with the existing defines that the kernel creates to emulate this behavior
with existing clang versions.
In file included from <built-in>:3:
In file included from include/linux/compiler_types.h:171:
include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined]
37 | #define __SANITIZE_THREAD__
| ^
<built-in>:352:9: note: previous definition is here
352 | #define __SANITIZE_THREAD__ 1
| ^
Refactor compiler-clang.h to only define the sanitizer macros when they
are undefined and adjust the rest of the code to use these macros for
checking if the sanitizers are enabled, clearing up the warnings and
allowing the kernel to easily drop these defines when the minimum
supported version of LLVM for building the kernel becomes 22.0.0 or newer.
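A minimal sketch of the guarded form in include/linux/compiler-clang.h (one
sanitizer shown; the others follow the same pattern):
  /* emulate GCC's macro only when clang (>= 22) has not defined it itself */
  #if __has_feature(thread_sanitizer) && !defined(__SANITIZE_THREAD__)
  #define __SANITIZE_THREAD__
  #endif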
Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702ca3d92@kernel.org
Link: 568c23bbd3 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3967baad4 ]
syzbot reported the splat below. [0]
The repro does the following:
1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes)
2. Attach the prog to a SOCKMAP
3. Add a socket to the SOCKMAP
4. Activate fault injection
5. Send data less than cork_bytes
At 5., the data is carried over to the next sendmsg() as it is
smaller than the cork_bytes specified by bpf_msg_cork_bytes().
Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold
the data, but this fails silently due to fault injection + __GFP_NOWARN.
If the allocation fails, we need to revert the sk->sk_forward_alloc
change done by sk_msg_alloc().
Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate
psock->cork.
The "*copied" also needs to be updated such that a proper error can
be returned to the caller, sendmsg. It fails to allocate psock->cork.
Nothing has been corked so far, so this patch simply sets "*copied"
to 0.
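A minimal sketch of the resulting error path in tcp_bpf_send_verdict()
(simplified from the surrounding cork handling):
  if (!psock->cork) {
          psock->cork = kzalloc(sizeof(*psock->cork),
                                GFP_ATOMIC | __GFP_NOWARN);
          if (!psock->cork) {
                  sk_msg_free(sk, msg);   /* undo the sk_forward_alloc charge */
                  *copied = 0;            /* nothing has been corked yet */
                  return -ENOMEM;
          }
  }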
[0]:
WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983
Modules linked in:
CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156
Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc
RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246
RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80
RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000
RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4
R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380
R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872
FS: 00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0
Call Trace:
<IRQ>
__sk_destruct+0x86/0x660 net/core/sock.c:2339
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861
handle_softirqs+0x286/0x870 kernel/softirq.c:579
__do_softirq kernel/softirq.c:613 [inline]
invoke_softirq kernel/softirq.c:453 [inline]
__irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
</IRQ>
Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6d78b4473c ]
Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:
...
[10.011566] do_raw_spin_lock.cold
[10.011570] try_to_wake_up (5) double-acquiring the same
[10.011575] kick_pool rq_lock, causing a hardlockup
[10.011579] __queue_work
[10.011582] queue_work_on
[10.011585] kernfs_notify
[10.011589] cgroup_file_notify
[10.011593] try_charge_memcg (4) memcg accounting raises an
[10.011597] obj_cgroup_charge_pages MEMCG_MAX event
[10.011599] obj_cgroup_charge_account
[10.011600] __memcg_slab_post_alloc_hook
[10.011603] __kmalloc_node_noprof
...
[10.011611] bpf_map_kmalloc_node
[10.011612] __bpf_async_init
[10.011615] bpf_timer_init (3) BPF calls bpf_timer_init()
[10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
[10.011619] bpf__sched_ext_ops_runnable
[10.011620] enqueue_task_scx (2) BPF runs with rq_lock held
[10.011622] enqueue_task
[10.011626] ttwu_do_activate
[10.011629] sched_ttwu_pending (1) grabs rq_lock
...
The above was reproduced on bpf-next (b338cf849e) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.
We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.
As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
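A minimal sketch of the allocation change in __bpf_async_init() (simplified):
  /* was GFP_ATOMIC, i.e. __GFP_HIGH | __GFP_KSWAPD_RECLAIM; dropping the
   * reclaim bit keeps the allocation atomic but marks it as unable to spin,
   * so the memcg charge path will not call cgroup_file_notify() from here.
   */
  cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);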
Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/
v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/
Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df0cb5cb50 ]
OpenWRT users reported regression on ARMv6 devices after updating to latest
HEAD, where tcpdump filter:
tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
and not ether host 5c497967208b and not ether host 706655784d5b"
fails with warning: "Kernel filter failed: No error information"
when using config:
# CONFIG_BPF_JIT_ALWAYS_ON is not set
CONFIG_BPF_JIT_DEFAULT_ON=y
The issue arises because commits:
1. "bpf: Fix array bounds error with may_goto" changed default runtime to
__bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when
jit_requested = 1 but jit fails
This change restores interpreter fallback capability for BPF programs with
stack size <= 512 bytes when jit fails.
Reported-by: Felix Fietkau <nbd@nbd.name>
Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/
Fixes: 6ebc5030e0 ("bpf: Fix array bounds error with may_goto")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f9bb6ffa7f ]
Stanislav reported that in bpf_crypto_crypt() the destination dynptr's
size is not validated to be at least as large as the source dynptr's
size before calling into the crypto backend with 'len = src_len'. This
can result in an OOB write when the destination is smaller than the
source.
Concretely, in mentioned function, psrc and pdst are both linear
buffers fetched from each dynptr:
psrc = __bpf_dynptr_data(src, src_len);
[...]
pdst = __bpf_dynptr_data_rw(dst, dst_len);
[...]
err = decrypt ?
ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv) :
ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
The crypto backend expects pdst to be large enough with a src_len length
that can be written. Add an additional src_len > dst_len check and bail
out if it's the case. Note that these kfuncs are accessible under root
privileges only.
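A minimal sketch of the added bounds check in bpf_crypto_crypt():
  src_len = __bpf_dynptr_size(src);
  dst_len = __bpf_dynptr_size(dst);
  if (!src_len || !dst_len)
          return -EINVAL;
  /* the backend writes src_len bytes into pdst, so dst must be big enough */
  if (src_len > dst_len)
          return -EINVAL;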
Fixes: 3e1c6f3540 ("bpf: make common crypto API for TC/XDP programs")
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce97123324 ]
Deny all sampling events in the CPUMF counter facility device driver
and return -ENOENT. This return value is used to try other PMUs.
Up to now, events of type PERF_TYPE_HARDWARE were not tested for
sampling and later returned -EOPNOTSUPP, which ends the search for
alternative PMUs. Change that behavior and try other PMUs instead.
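A minimal sketch of the event-init change (simplified):
  /* in the counter facility's event init path */
  if (is_sampling_event(event))
          return -ENOENT; /* not ours; let the core try other PMUs */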
Fixes: 613a41b0d1 ("s390/cpum_cf: Reject request for sampling in event initialization")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85941afd2c ]
Each PAI PMU device driver returns -EINVAL when an event is out of
its accepted range. This return value aborts the search for an
alternative PMU device driver to handle this event.
Change the return value to -ENOENT. This return value is used to
try other PMUs instead. This makes the PMUs more robust when
the sequence of PMU device driver initialization changes (at boot time)
or by using modules.
Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd2fa82473 ]
A typo in ff_lseg_match_mirrors makes the mirror comparison ineffective.
This results in merging happening all the time. Merging happening all the
time is problematic because it marks lsegs invalid. Marking lsegs invalid
causes all outstanding IO to get restarted with EAGAIN and connections
to get closed.
Closing connections constantly triggers race conditions in the RDMA
implementation...
Fixes: 660d1eb223 ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match")
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7b8574225 ]
If we're truncating part of the folio, then we need to write out the
data on the part that is not covered by the cancellation.
Fixes: d47992f86b ("mm: change invalidatepage prototype to accept length")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ca247c8990 ]
Ensure that all O_DIRECT reads and writes complete before copying a file
range, so that the destination is up to date.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c80ebeba11 ]
Ensure that all O_DIRECT reads and writes complete before cloning a file
range, so that both the source and destination are up to date.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b93128f297 ]
Ensure that all O_DIRECT reads and writes complete before calling
fallocate so that we don't race w.r.t. attribute updates.
Fixes: 99f2378322 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9eb90f4354 ]
Ensure that all O_DIRECT reads and writes are complete, and prevent the
initiation of new i/o until the setattr operation that will truncate the
file is complete.
Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38a125b315 ]
This allows killing processes that wait for a lock when one process is
stuck waiting for the NFS server. This aims to complete the coverage
of NFS operations being killable, like nfs_direct_wait() does, for
example.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable-dep-of: 9eb90f4354 ("NFS: Serialise O_DIRECT i/o and truncate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3feec68563 ]
This commit simply adds the required O_DIRECT plumbing. It doesn't
address the fact that NFS doesn't ensure all writes are page aligned
(nor device logical block size aligned as required by O_DIRECT).
Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
will not use O_DIRECT semantics by default if/when an application
requests the use of O_DIRECT. Allow the use of O_DIRECT semantics by:
1: Adding a flag to the nfs_pgio_header struct to allow the NFS
O_DIRECT layer to signal that O_DIRECT was used by the application
2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
when enabled will cause LOCALIO to use O_DIRECT semantics (this may
cause IO to fail if applications do not properly align their IO).
This commit is derived from code developed by Weston Andros Adamson.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stable-dep-of: 992203a1fb ("nfs/localio: restore creds before releasing pageio data")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0978e5b85f ]
Push the read_iter and write_iter availability checks down to
nfs_do_local_read and nfs_do_local_write respectively.
This eliminates a redundant nfs_to->nfsd_file_file() call.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable-dep-of: 992203a1fb ("nfs/localio: restore creds before releasing pageio data")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3d62ab32df ]
Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic during preempt_disable. But in some case,
__copy_from_user_inatomic may trigger page fault, and will call schedule()
subtly. And if a task is migrated to other cpu, the following warning will
be trigger:
if (RB_WARN_ON(cpu_buffer,
!local_read(&cpu_buffer->committing)))
An example can illustrate this issue:
process flow CPU
---------------------------------------------------------------------
tracing_mark_raw_write(): cpu:0
...
ring_buffer_lock_reserve(): cpu:0
...
cpu = raw_smp_processor_id() cpu:0
cpu_buffer = buffer->buffers[cpu] cpu:0
...
...
__copy_from_user_inatomic(): cpu:0
...
# page fault
do_mem_abort(): cpu:0
...
# Call schedule
schedule() cpu:0
...
# the task schedule to cpu1
__buffer_unlock_commit(): cpu:1
...
ring_buffer_unlock_commit(): cpu:1
...
cpu = raw_smp_processor_id() cpu:1
cpu_buffer = buffer->buffers[cpu] cpu:1
As shown above, the process acquires the CPU id twice and the return
values are not the same.
To fix this problem, use copy_from_user_nofault instead of
__copy_from_user_inatomic, as the former performs 'access_ok' checking
before copying.
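A minimal sketch of the replacement in tracing_mark_write() (simplified):
  /* copy_from_user_nofault() checks access_ok() and copies with page faults
   * disabled, so it cannot schedule while the ring buffer reservation pins
   * this context to the current CPU.
   */
  len = copy_from_user_nofault(&entry->buf, ubuf, cnt);
  if (len) {
          memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
          cnt = FAULTED_SIZE;
          written = -EFAULT;
  } else {
          written = cnt;
  }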
Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com
Fixes: 656c7f0d2d ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fb2b677fc ]
nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset
on entry to the function.
Fixes: b78ef845c3 ("NFSv4.2: query the server for extended attribute support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b3ac334360 ]
_nfs4_server_capabilities() should clear capabilities that are not
supported by the server.
Fixes: d2a00cceb9 ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd5a8621b8 ]
_nfs4_server_capabilities() is expected to clear any flags that are not
supported by the server.
Fixes: 8a59bb93b7 ("NFSv4 store server support for fs_location attribute")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31f1a960ad ]
Don't clear the capabilities that are not going to get reset by the call
to _nfs4_server_capabilities().
Reported-by: Scott Haiden <scott.b.haiden@gmail.com>
Fixes: b01f21cacd ("NFS: Fix the setting of capabilities when automounting a new filesystem")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9559d2fffd ]
xs_sock_recv_cmsg was failing to call xs_sock_process_cmsg for any cmsg
type other than TLS_RECORD_TYPE_ALERT (TLS_RECORD_TYPE_DATA and other
values were not handled). Based on my reading of the previous commit
(cc5d5908: sunrpc: fix client side handling of tls alerts), it looks
like only iov_iter_revert should be conditional on TLS_RECORD_TYPE_ALERT
(but that other cmsg types should still call xs_sock_process_cmsg). On
my machine, I was unable to connect (over mtls) to an NFS share hosted
on FreeBSD. With this patch applied, I am able to mount the share again.
Fixes: cc5d59081f ("sunrpc: fix client side handling of tls alerts")
Signed-off-by: Justin Worrell <jworrell@gmail.com>
Reviewed-and-tested-by: Scott Mayhew <smayhew@redhat.com>
Link: https://lore.kernel.org/r/20250904211038.12874-3-jworrell@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a46d2339a ]
Recent commit f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS
errors") has changed the error return type of ff_layout_choose_ds_for_read() from
NULL to an error pointer. However, not all code paths have been updated
to match the change. Thus, some non-NULL checks will accept error pointers
as a valid return value.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors")
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 829ee558f3 ]
ASUS VivoBook X515UA with PCI SSID 1043:106f had a default quirk picked
up via the pin table that applies ALC256_FIXUP_ASUS_MIC, but this leaves
a bogus built-in mic pin 0x13 enabled. This was no big problem because
pin 0x13 was assigned as the secondary mic, but the recent fix sorted
the entries, hence this bogus pin now appeared as the primary input and
broke things.
For fixing the bug, put the right quirk entry for this device pointing
to ALC256_FIXUP_ASUS_MIC_NO_PRESENCE.
Fixes: 3b4309546b ("ALSA: hda: Fix headset detection failure due to unstable sort")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219897
Link: https://patch.msgid.link/20250324153233.21195-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a5d258a00b ]
This reverts commit 88c7c56d07c108ed4de319c8dba44aa4b8a38dd1.
SW and HW state do not always match, in some cases causing the cursor to
be disabled.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit da29abe71e ]
The function amdgpu_dm_crtc_mem_type_changed was dereferencing pointers
returned by drm_atomic_get_plane_state without checking for errors. This
could lead to undefined behavior if the function returns an error pointer.
This commit adds checks using IS_ERR to ensure that new_plane_state and
old_plane_state are valid before dereferencing them.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:11486 amdgpu_dm_crtc_mem_type_changed()
error: 'new_plane_state' dereferencing possible ERR_PTR()
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
11475 static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
11476 struct drm_atomic_state *state,
11477 struct drm_crtc_state *crtc_state)
11478 {
11479 struct drm_plane *plane;
11480 struct drm_plane_state *new_plane_state, *old_plane_state;
11481
11482 drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
11483 new_plane_state = drm_atomic_get_plane_state(state, plane);
11484 old_plane_state = drm_atomic_get_plane_state(state, plane);
^^^^^^^^^^^^^^^^^^^^^^^^^^ These functions can fail.
11485
--> 11486 if (old_plane_state->fb && new_plane_state->fb &&
11487 get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
11488 return true;
11489 }
11490
11491 return false;
11492 }
Fixes: 4caacd1671 ("drm/amd/display: Do not elevate mem_type change to full update")
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb5fab2afd ]
When running igt@gem_exec_balancer@individual for multiple iterations,
it is seen that the delta busyness returned by PMU is 0. The issue stems
from a combination of 2 implementation specific details:
1) gt_park is throttling __update_guc_busyness_stats() so that it does
not hog PCI bandwidth for some use cases. (Ref: 59bcdb564b)
2) busyness implementation always returns monotonically increasing
counters. (Ref: cf907f6d29)
If an application queried an engine while it was active,
engine->stats.guc.running is set to true. Following that, if all PM
wakeref's are released, then gt is parked. At this time the throttling
of __update_guc_busyness_stats() may result in a missed update to the
running state of the engine (due to (1) above). This means subsequent
calls to guc_engine_busyness() will think that the engine is still
running and they will keep updating the cached counter (stats->total).
This results in an inflated cached counter.
Later when the application runs a workload and queries for busyness, we
return the cached value since it is larger than the actual value (due to
(2) above)
All subsequent queries will return the same large (inflated) value, so
the application sees a delta busyness of zero.
Fix the issue by resetting the running state of engines each time
intel_guc_busyness_park() is called.
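A minimal sketch of the fix (iteration and locking details omitted):
  void intel_guc_busyness_park(struct intel_gt *gt)
  {
          struct intel_engine_cs *engine;
          enum intel_engine_id id;
          ...
          /* The throttled stats update may not run before the GT parks, so
           * the cached running state can go stale; clear it so that
           * guc_engine_busyness() stops accumulating time for idle engines.
           */
          for_each_engine(engine, gt, id)
                  engine->stats.guc.running = false;
  }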
v2: (Rodrigo)
- Use the correct tag in commit message
- Drop the redundant wakeref check in guc_engine_busyness() and update
commit message
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13366
Fixes: cf907f6d29 ("i915/guc: Ensure busyness counter increases motonically")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123193839.2394694-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 431b742e2b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e28059d56 ]
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel
function utf8_casefold() generates case-folded strings, introduced by
commit 5c26d2f1d3 ("unicode: Don't special case ignorable code
points").
There are good reasons why this change should be made; it's actually
quite stupid that Unicode seems to think that the characters ❤ and ❤️
should be casefolded. Unfortunately, because of the backwards
compatibility issue, this commit was reverted in 231825b2e1.
This problem is addressed by instituting a brute-force linear fallback
if a lookup fails on a case-folded directory, which does result in a
performance hit when looking up files affected by the change in how
the kernel treats ignorable Unicode characters, or when attempting to
look up non-existent file names. So this fallback can be disabled by
setting an encoding flag if in the future, the system administrator or
the manufacturer of a mobile handset or tablet can be sure that there
was no opportunity for a kernel to insert file names with incompatible
encodings.
Fixes: 5c26d2f1d3 ("unicode: Don't special case ignorable code points")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 288ff0d10b ]
nvme_submit_cmds() should check the rqlist before calling
nvme_write_sq_db(); if the list is empty, it must return immediately.
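A minimal sketch of the guard (the exact rqlist type and rq_list_empty()
form vary between kernel versions):
  /* at the top of nvme_submit_cmds(), before taking sq_lock */
  if (rq_list_empty(rqlist))
          return; /* empty batch: don't ring the SQ doorbell */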
Fixes: beadf00885 ("nvme-pci: reverse request order in nvme_queue_rqs")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aef7ee7649 ]
The offset into the page should also be considered when calculating the
physical address for struct dma_debug_entry. page_to_phys() just shifts
the page frame number PAGE_SHIFT bits to the left, so the offset part is
zero-filled.
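A minimal sketch of the corrected calculation (the helper is illustrative;
the paddr field comes from the Fixes commit):
  /* page_to_phys() only converts the page frame; add the in-page offset
   * back, otherwise every recorded address is page aligned and distinct
   * mappings in the same page falsely appear to overlap.
   */
  static phys_addr_t phys_addr(struct page *page, size_t offset)
  {
          return page_to_phys(page) + offset;
  }
  ...
  entry->paddr = phys_addr(page, offset);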
An example (wrong) debug assertion failure with CONFIG_DMA_API_DEBUG
enabled which is observed during systemd boot process after recent
dma-debug changes:
DMA-API: e1000 0000:00:03.0: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 4 PID: 941 at kernel/dma/debug.c:596 add_dma_entry
CPU: 4 UID: 0 PID: 941 Comm: ip Not tainted 6.12.0+ #288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:add_dma_entry kernel/dma/debug.c:596
Call Trace:
<TASK>
debug_dma_map_page kernel/dma/debug.c:1236
dma_map_page_attrs kernel/dma/mapping.c:179
e1000_alloc_rx_buffers drivers/net/ethernet/intel/e1000/e1000_main.c:4616
...
Found by Linux Verification Center (linuxtesting.org).
Fixes: 9d4f645a1f ("dma-debug: store a phys_addr_t in struct dma_debug_entry")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
[hch: added a little helper to clean up the code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f2d2f9598e upstream.
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space. These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner. For
example, see commit 9b861528a8 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").
However, this approach has proven fragile for following reasons:
1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4e ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
For example, commit 8d400913c2 ("x86/vmemmap: handle unpopulated
sub-pmd ranges")) caused crashes by accessing the vmemmap area
before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables. These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.
They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().
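A minimal sketch of the PGD-level helper described above (the P4D variant
is analogous):
  /* include/linux/pgalloc.h */
  static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                         p4d_t *p4d)
  {
          pgd_populate(&init_mm, pgd, p4d);
          /* let the arch propagate the new kernel PGD entry to all page
           * tables, e.g. sync_global_pgds() on x86-64
           */
          if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
                  arch_sync_kernel_mappings(addr, addr);
  }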
This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
In theory, PUD and PMD level helpers can be added later if needed by other
architectures. For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.
[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Adjust context ]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ce840c7a6 upstream.
Currently, when a firmware failure occurs during the matcher disconnect flow,
the error flow of the function reconnects the matcher and returns
an error; the calling function then continues running and eventually
frees the matcher that is being disconnected.
This leads to a case where we have a freed matcher on the matchers list,
which in turn leads to use-after-free and eventual crash.
This patch fixes that by not trying to reconnect the matcher back when
some FW command fails during disconnect.
Note that we're dealing here with FW error. We can't overcome this
problem. This might lead to bad steering state (e.g. wrong connection
between matchers), and will also lead to resource leakage, as it is
the case with any other error handling during resource destruction.
However, the goal here is to allow the driver to continue and not crash
the machine with use-after-free error.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jan Alexander Preissler <akendo@akendo.eu>
Signed-off-by: Sujana Subramaniam <sujana.subramaniam@sap.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a19afee6f upstream.
Similar to commit 09c6304e38 ("kasan: test: fix compatibility with
FORTIFY_SOURCE") the kernel is panicing in kasan_string().
This is due to the `src` and `ptr` not being hidden from the optimizer
which would disable the runtime fortify string checker.
Call trace:
__fortify_panic+0x10/0x20 (P)
kasan_strings+0x980/0x9b0
kunit_try_run_case+0x68/0x190
kunit_generic_run_threadfn_adapter+0x34/0x68
kthread+0x1c4/0x228
ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 68b6dbf1f4 ]
It can be surprising to the user if DMA functions are only traced on
success. On failure, it can be unclear what the source of the problem
is. Fix this by tracing all functions even when they fail. Cases where
we BUG/WARN are skipped, since those should be sufficiently noisy
already.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4484ab86e ]
In some cases, we use trace_dma_map to trace dma_alloc* functions. This
generally follows dma_debug. However, this does not record all of the
relevant information for allocations, such as GFP flags. Create new
dma_alloc tracepoints for these functions. Note that while
dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's
point of view), the device will only see one contiguous mapping.
Therefore, we just need to trace dma_addr and size.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3afff779a7 ]
In preparation for using these tracepoints in a few more places, trace
the DMA direction as well. For coherent allocations this is always
bidirectional.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9d4f645a1f ]
dma-debug goes to great lengths to split incoming physical addresses into
a PFN and offset to store them in struct dma_debug_entry, just to
recombine those for all meaningful uses. Just store a phys_addr_t
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb585591eb ]
Commit 620c266f39 ("fhandle: relax open_by_handle_at() permission
checks") relaxed the coditions for decoding a file handle from non init
userns.
The conditions are that that decoded dentry is accessible from the user
provided mountfd (or to fs root) and that all the ancestors along the
path have a valid id mapping in the userns.
These conditions are intentionally more strict than the condition that
the decoded dentry should be "lookable" by path from the mountfd.
For example, the path /home/amir/dir/subdir is lookable by path from
the unpriv userns of user amir, because the permissions on /home are 755, but
the owner of /home does not have a valid id mapping in the unpriv userns of user amir.
The current code did not check that the decoded dentry itself has a
valid id mapping in the userns. There is no security risk in that,
because that final open still performs the needed permission checks,
but this is inconsistent with the checks performed on the ancestors,
so the behavior can be a bit confusing.
Add the check for the decoded dentry itself, so that the entire path,
including the last component has a valid id mapping in the userns.
Fixes: 620c266f39 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
To ensure we don't get a regression on handling reflect_[xy]
in the middle of the command line string, add a test for it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The command line parser was looking for an "=" before the ","
separator.
If the command line had reflect_[xy] before another option
which took a parameter, it would find the "=" from the second
option and assume it was associated with the reflect option,
generally leading to a parsing failure.
Handle this case by looking for both "," and "=", and taking
the first instance.
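A sketch of the separator handling, with schematic variable names:
	char *comma = strchr(opt, ',');
	char *equals = strchr(opt, '=');
	char *end = comma;
	/* whichever of ',' or '=' appears first ends this option's name */
	if (!end || (equals && equals < end))
		end = equals;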
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
All the matrix entries for the YUV to RGB conversion matrices were
being filled with the same coefficients.
Compute the values for the BT601, BT709, and BT2020 matrices in
both full and limited range, and program those into the hardware.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Upstream bcm2712 support added/split out the inbound window for MIP1 into
a separate range. For the pcie-32bit-dma overlay to work, both the MIP
and RC ranges need to agree.
Shift the MIP window to the top 4K page below 4GB.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Use native arm64 runners to speed up the build process. Cross compiling is
still used for arm targets, but it also benefits from the arm64 runner
architecture. This reduces the overall build time by 25 to 30 minutes.
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Commit b7cc988723 upstream.
Cross-thread attacks are generally harder as they require the victim to be
co-located on a core. However, with VMSCAPE the adversary and its targets
belong to the same guest execution, so they are more likely to get co-located.
In particular, a thread that is currently executing the userspace hypervisor
(after the IBPB) may still be targeted by a guest execution from a sibling
thread.
Issue a warning about the potential risk, except when:
- SMT is disabled
- STIBP is enabled system-wide
- Intel eIBRS is enabled (which implies STIBP protection)
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6449f5baf9 upstream.
cpu_bugs_smt_update() uses global variables from different mitigations. For
SMT updates it can't currently use vmscape_mitigation, which is defined after
it.
Since cpu_bugs_smt_update() depends on many other mitigations, move it
after all mitigations are defined. With that, it can use vmscape_mitigation
in a moment.
No functional change.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 2f8f173413 upstream.
VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.
Since it is the userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.
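Schematically, the flag and helper names below are assumptions for
illustration, not the exact upstream identifiers:
	/* after VMexit: note that guest-controlled predictor state is live */
	this_cpu_write(ibpb_pending_for_user, true);
	/* before returning to userspace: flush once, then clear the note */
	if (this_cpu_read(ibpb_pending_for_user)) {
		indirect_branch_prediction_barrier();
		this_cpu_write(ibpb_pending_for_user, false);
	}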
This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.
The intent is to integrate and optimize these cases post-embargo.
[ dhansen: elaborate on suboptimal IBPB solution ]
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit a508cec6e5 upstream.
The VMSCAPE vulnerability may allow a guest to cause Branch Target
Injection (BTI) in userspace hypervisors.
Kernels (both host and guest) have existing defenses against direct BTI
attacks from guests. There are also inter-process BTI mitigations which
prevent processes from attacking each other. However, the threat in this
case is to a userspace hypervisor within the same process as the attacker.
Userspace hypervisors have access to their own sensitive data like disk
encryption keys and also typically have access to all guest data. This
means guest userspace may use the hypervisor as a confused deputy to attack
sensitive guest kernel data. There are no existing mitigations for these
attacks.
Introduce X86_BUG_VMSCAPE for this vulnerability and set it on affected
Intel and AMD CPUs.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41f9049cff upstream.
When building with CONFIG_CMODEL_MEDLOW and CONFIG_LTO_CLANG, there is a
series of errors due to some files being unconditionally compiled with
'-mcmodel=medany', mismatching with the rest of the kernel built with
'-mcmodel=medlow':
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values: 'i32 3' from vmlinux.a(init.o at 899908), and 'i32 1' from vmlinux.a(net-traces.o at 1014628)
Only allow LTO to be performed when CONFIG_CMODEL_MEDANY is enabled to
ensure there will be no code model mismatch errors. An alternative
solution would be disabling LTO for the files built with a different code
model than the main kernel, like some specialized areas of the kernel do,
but doing that for individual files is not as sustainable as
forbidding the combination altogether.
Cc: stable@vger.kernel.org
Fixes: 021d23428b ("RISC-V: build: Allow LTO to be selected")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506290255.KBVM83vZ-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250710-riscv-restrict-lto-to-medany-v1-1-b1dac9871ecf@kernel.org
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7202082b7b ]
In md_do_sync(), when md_sync_action() returns ACTION_FROZEN, a subsequent
call to md_sync_position() will return MaxSector. This causes
'curr_resync' (and later 'recovery_offset') to be set to MaxSector too,
which incorrectly signals that recovery/resync has completed, even though
disk data has not actually been updated.
To fix this issue, skip updating any offset values when the sync action
is FROZEN. The same holds true for IDLE.
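Schematically, in md_do_sync() (helper names as used above, shown as a sketch):
	action = md_sync_action(mddev);
	if (action == ACTION_FROZEN || action == ACTION_IDLE)
		goto skip;	/* leave curr_resync/recovery_offset untouched */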
Fixes: 7d9f107a4e ("md: use new helpers in md_do_sync()")
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20250904073452.3408516-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 467e00b30d ]
Currently the kzalloc failure check just reports the failure
and sets the variable ret to -ENOMEM, which is not checked later
for this specific error. Fix this by returning -ENOMEM directly rather
than setting ret.
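Shape of the fix, with a hypothetical buffer name:
	cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);
	if (!cmd) {
		dev_err(adev->dev, "failed to allocate psp cmd buffer\n");
		return -ENOMEM;	/* return directly instead of only setting ret */
	}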
Fixes: 4fb9307154 ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1ee9d1a096)
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 01be43f2a0 ]
In get_bpf_prog_info_linear two calls to bpf_obj_get_info_by_fd are
made, the first to compute memory requirements for a struct perf_bpil
and the second to fill it in. Previously the code would warn when the
second call didn't match the first. Such races can be commonplace in
things like perf test, whose perf trace tests will frequently load BPF
programs. Rather than a debug message, return actual errors for this
case. Out of paranoia also validate the read bpf_prog_info array
value. Change the type of ptr to avoid mismatched pointer type
compiler warnings. Add some additional debug printouts and sanity
asserts.
Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/
Fixes: 6ac22d036f ("perf bpf: Pull in bpf_program__get_prog_info_linear()")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250902181713.309797-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d7b67dd6f9 ]
Calls to perf_env__insert_bpf_prog_info may fail as a sideband thread
may already have inserted the bpf_prog_info. Such failures may yield
info_linear being freed which then causes use-after-free issues with
the internal bpf_prog_info info struct. Make it so that
perf_env__insert_bpf_prog_info triggers early non-error paths and fix
the use-after-free in perf_event__synthesize_one_bpf_prog. Add proper
return error handling to perf_env__add_bpf_info (that calls
perf_env__insert_bpf_prog_info) and propagate the return value in its
callers.
Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/
Fixes: 03edb7020b ("perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info()")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250902181713.309797-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bdd5a14e66 ]
The bridge has three bootstrap pins which are sampled to determine the
frequency of the external reference clock. The driver will also
(over)write that setting. But it seems this is racy after the bridge is
enabled. It was observed that although the driver writes the correct
value (verified by sniffing the I2C bus), the register has the wrong value.
The datasheet states that the GPIO lines have to be stable for at least
5us after asserting the EN signal. Thus, there seems to be some logic
which samples the GPIO lines and this logic appears to overwrite the
register value which was set by the driver. Waiting 20us after
asserting the EN line resolves this issue.
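A sketch of the resulting enable sequence (the GPIO field name is assumed):
	gpiod_set_value_cansleep(pdata->enable_gpio, 1);
	/* bootstrap GPIOs must be stable >= 5 us after EN; wait longer for margin */
	usleep_range(20, 30);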
Fixes: a095f15c00 ("drm/bridge: add support for sn65dsi86 bridge driver")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250821122341.1257286-1-mwalle@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dedf9c93de ]
Clear the error flags after disabling the module to avoid the case when
a flag is set again between flag clear and module disable. And use
SR_CLEAR_MASK to replace hardcoded value for improved readability.
Although fsl_lpspi_reset() was only introduced in commit a15dc3d657
("spi: lpspi: Fix CLK pin becomes low before one transfer"), the
original driver only reset SR in the interrupt handler, making it
vulnerable to the same issue. Therefore the Fixes tag points at the
introduction of the driver.
Fixes: 5314987de5 ("spi: imx: add lpspi bus driver")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: Ciprian Marian Costea <ciprianmarian.costea@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://patch.msgid.link/20250828-james-nxp-lpspi-v2-4-6262b9aa9be4@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 782a7c7307 ]
Commit 6a13044849 ("spi: lpspi: Fix wrong transmission when don't use
CONT") breaks transmissions when CONT is used. The TDIE interrupt should
not be disabled in all cases. If CONT is used and the TX transfer is not
completed yet, but the interrupt handler is called because there are
characters to be received, TDIE is replaced with FCIE. When the transfer
is finally completed, SR_TDF is set but the interrupt handler isn't
called again.
Fixes: 6a13044849 ("spi: lpspi: Fix wrong transmission when don't use CONT")
Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250828-james-nxp-lpspi-v2-1-6262b9aa9be4@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1180c79fbf ]
The fans controlled by the driver can get stuck at 0 RPM if they are
configured below a 20% duty cycle. The driver tries to avoid this by
enforcing a minimum duty cycle of 20%, but this is done after the fans
are registered with the thermal subsystem. This is too late as the
thermal subsystem can set their current state before the driver is able
to enforce the minimum duty cycle.
Fix by setting the minimum duty cycle before registering the fans with
the thermal subsystem.
Fixes: d7efb2ebc7 ("hwmon: (mlxreg-fan) Extend driver to support multiply cooling devices")
Reported-by: Nikolay Aleksandrov <razor@blackwall.org>
Tested-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250730201715.1111133-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit aa28991fd5 ]
Currently, tpmi_get_logical_id() calls topology_physical_package_id()
to set the pkg_id of the info structure. Since some VM hosts assign
non-contiguous package IDs, topology_physical_package_id() can return a
larger value than topology_max_packages(). This will result in an
invalid reference into tpmi_power_domain_mask[] as that is allocated
based on topology_max_packages() as the maximum package ID.
Fixes: 17ca278045 ("platform/x86/intel: TPMI domain id and CPU mapping")
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250829113859.1772827-1-darcari@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cf3940ac73 ]
Currently, the ignore_key_wlan quirk applies to keycodes 0x5D, 0x5E, and
0x5F. However, the relevant code for the Asus Zenbook Duo is only 0x5F.
Since this code is emitted by other Asus devices, such as from the Z13
for its ROG button, remove the extra codes before expanding the quirk.
For the Duo devices, which are the only ones that use this quirk, there
should be no effect.
Fixes: 9286dfd573 ("platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA")
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://lore.kernel.org/r/20250808154710.8981-1-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a81f78caa ]
In do_validate_mem(), the call to add_interval() does not
handle errors. If kmalloc() fails in add_interval(), it could
result in a null pointer being inserted into the linked list,
leading to illegal memory access when sub_interval() is called
next.
This patch adds error handling for add_interval(). If
add_interval() returns an error, the function will return early
with the error code.
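Schematically (the resource map field names here are approximate):
	ret = add_interval(&s_data->mem_db_valid, base, size);
	if (ret)
		return ret;	/* bail out instead of continuing with a bad list */
	sub_interval(&s_data->mem_db, base, size);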
Fixes: 7b4884ca88 ("pcmcia: validate late-added resources")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 71598a5a77 which is
commit 1f02f2044b upstream.
This commit introduced a regression, however the fix for the regression:
aa5fc4362f ("drm/amdgpu: fix task hang from failed job submission
during process kill") depends on things not yet present in 6.12.y and
older kernels. Since this commit is more of an optimization, just
revert it for 6.12.y and older stable kernels.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x - 6.12.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 157ae5ffd7 ]
Fix a potential deadlock bug. Observe that in the mtk-cqdma.c
file, functions like mtk_cqdma_issue_pending() and
mtk_cqdma_free_active_desc() properly acquire the pc lock before the vc
lock when handling pc and vc fields. However, mtk_cqdma_tx_status()
violates this order by first acquiring the vc lock before invoking
mtk_cqdma_find_active_desc(), which subsequently takes the pc lock. This
reversed locking sequence (vc → pc) contradicts the established
pc → vc order and creates deadlock risks.
Fix the issue by moving the vc lock acquisition code from
mtk_cqdma_find_active_desc() to mtk_cqdma_tx_status(). Ensure the pc lock
is acquired before the vc lock in the calling function to maintain correct
locking hierarchy. Note that since mtk_cqdma_find_active_desc() is a
static function with only one caller (mtk_cqdma_tx_status()), this
modification safely eliminates the deadlock possibility without affecting
other components.
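Schematically, the status query now takes the locks in the established order
(field names approximate):
	spin_lock_irqsave(&cvc->pc->lock, flags);
	spin_lock(&cvc->vc.lock);
	cvd = mtk_cqdma_find_active_desc(c, cookie);
	spin_unlock(&cvc->vc.lock);
	spin_unlock_irqrestore(&cvc->pc->lock, flags);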
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency bugs
including deadlocks, data races and atomicity violations.
Fixes: b1f01e48df ("dmaengine: mediatek: Add MediaTek Command-Queue DMA controller for MT6765 SoC")
Cc: stable@vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20250508073634.3719-1-chenqiuji666@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to commit 09c6304e38 ("kasan: test: fix compatibility with
FORTIFY_SOURCE") the kernel is panicing in kasan_string().
This is due to the `src` and `ptr` not being hidden from the optimizer
which would disable the runtime fortify string checker.
Call trace:
__fortify_panic+0x10/0x20 (P)
kasan_strings+0x980/0x9b0
kunit_try_run_case+0x68/0x190
kunit_generic_run_threadfn_adapter+0x34/0x68
kthread+0x1c4/0x228
ret_from_fork+0x10/0x20
Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
---[ end trace 0000000000000000 ]---
note: kunit_try_catch[128] exited with irqs disabled
note: kunit_try_catch[128] exited with preempt_count 1
# kasan_strings: try faulted: last
** replaying previous printk message **
# kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
# kasan_strings: internal error occurred preventing test case from running: -4
Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
[ One less test in older trees ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0ef5c4e4db ]
Nouveau has code that disables an IRQ when one arrives with no allowed
handler, in order to avoid interrupt storms.
However with nonstall interrupts, we often disable them from
the drm driver, but still request their emission via the push submission.
Just don't disable nonstall irqs ever in normal operation, the
event handling code will filter them out, and the driver will
just enable/disable them at load time.
This fixes timeouts we've been seeing on/off for a long time,
but they became a lot more noticeable on Blackwell.
This doesn't fix all of them; there is a subsequent fence emission
fix for the last few.
Fixes: 3ebd64aa3c ("drm/nouveau/intr: support multiple trees, and explicit interfaces")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20250829021633.1674524-1-airlied@gmail.com
[ Fix a typo and a minor checkpatch.pl warning; remove "v2" from commit
subject. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
[ Apply to drivers/gpu/drm/nouveau/nvkm/engine/fifo/r535.c ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b4efccec8d ]
object_err() reports details of an object for further debugging, such as
the freelist pointer, redzone, etc. However, if the pointer is invalid,
attempting to access object metadata can lead to a crash since it does
not point to a valid object.
One known path to the crash is when alloc_consistency_checks()
determines the pointer to the allocated object is invalid because of a
freelist corruption, and calls object_err() to report it. The debug code
should report and handle the corruption gracefully and not crash in the
process.
In case the pointer is NULL or check_valid_pointer() returns false for
the pointer, only print the pointer value and skip accessing metadata.
Fixes: 81819f0fc8 ("SLUB core")
Cc: <stable@vger.kernel.org>
Signed-off-by: Li Qiong <liqiong@nfschina.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b183dd935 ]
slab_err() has variadic printf arguments but instead of passing them to
slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
To allow passing them directly, turn slab_bug() into __slab_bug() with a
va_list parameter, and make slab_bug() a wrapper with fmt, ... parameters.
Then slab_err() can call __slab_bug() without the intermediate buffer.
Also constify fmt everywhere, which also simplifies object_err()'s
call to slab_bug().
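The wrapper pattern looks roughly like this:
	static void __slab_bug(struct kmem_cache *s, const char *fmt, va_list args)
	{
		/* format and print using the caller's va_list directly */
	}
	static void slab_bug(struct kmem_cache *s, const char *fmt, ...)
	{
		va_list args;
		va_start(args, fmt);
		__slab_bug(s, fmt, args);
		va_end(args);
	}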
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Stable-dep-of: b4efccec8d ("mm/slub: avoid accessing metadata when pointer is invalid in object_err()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f6f32b14a ]
If a slab object is corrupted or an error occurs in its internal
validation, continuing after restoration may cause other side effects.
At this point, it is difficult to debug because the problem occurred in
the past. It is useful to use WARN() to catch errors at the point of
issue because WARN() could trigger panic for system debugging when
panic_on_warn is enabled. WARN() is added where the error is detected, in
slab_err() and object_err().
It makes sense to only do the WARN() after printing the logs. slab_err()
is split into __slab_err(), which calls the WARN() and is called after
printing the logs.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Stable-dep-of: b4efccec8d ("mm/slub: avoid accessing metadata when pointer is invalid in object_err()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ed5ec2e952 ]
Previously, the restore occurred after printing the object in slub.
After commit 47d911b02c ("slab: make check_object() more consistent"),
the bytes are printed after the restore. This information about the bytes
before the restore is highly valuable for debugging purposes.
For instance, in the event of a cache issue, it displays byte patterns
by breaking them down into 64-byte units. Without this information,
we can only speculate on how it was broken. Hence the corrupted regions
should be printed prior to the restoration process. However, if an object
breaks in multiple places, the same log may be output multiple times.
Therefore the slub log is reported only once to prevent redundant printing,
by passing a parameter indicating whether an error has occurred previously.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Stable-dep-of: b4efccec8d ("mm/slub: avoid accessing metadata when pointer is invalid in object_err()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6130825f34 upstream.
In clustermd, separate write-intent-bitmaps are used for each cluster
node:
 0                   4k                    8k                    12k
 -------------------------------------------------------------------
 | idle              | md super            | bm super [0] + bits   |
 | bm bits[0, contd] | bm super[1] + bits  | bm bits[1, contd]     |
 | bm super[2] + bits| bm bits [2, contd]  | bm super[3] + bits    |
 | bm bits [3, contd]|                     |                       |
So in node 1, pg_index in __write_sb_page() could be equal to
bitmap->storage.file_pages. Then bitmap_limit will be calculated as
0, and md_super_write() will be called with a size of 0.
That means the first 4k sb area of node 1 will never be updated
through filemap_write_page().
This bug causes a hang of mdadm/clustermd_tests/01r1_Grow_resize.
Here, use (pg_index % bitmap->storage.file_pages) to make the calculation
of bitmap_limit correct.
Fixes: ab99a87542 ("md/md-bitmap: fix writing non bitmap pages")
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Link: https://lore.kernel.org/linux-raid/20250303033918.32136-1-glass.su@suse.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0032c99e83 upstream.
When delete l3s ipvlan:
ip link del link eth0 ipvlan1 type ipvlan mode l3s
This may cause a null pointer dereference:
Call trace:
ip_rcv_finish+0x48/0xd0
ip_rcv+0x5c/0x100
__netif_receive_skb_one_core+0x64/0xb0
__netif_receive_skb+0x20/0x80
process_backlog+0xb4/0x204
napi_poll+0xe8/0x294
net_rx_action+0xd8/0x22c
__do_softirq+0x12c/0x354
This is because l3mdev_l3_rcv() visits dev->l3mdev_ops after
ipvlan_l3s_unregister() assigns NULL to dev->l3mdev_ops. The sequence
looks like this:
    (CPU1)                      | (CPU2)
l3mdev_l3_rcv()                 |
  check dev->priv_flags:        |
    master = skb->dev;          |
                                |
                                | ipvlan_l3s_unregister()
                                |   set dev->priv_flags
                                |   dev->l3mdev_ops = NULL;
                                |
  visit master->l3mdev_ops      |
Avoid this by not setting dev->l3mdev_ops to NULL when unregistering the
l3s ipvlan.
Suggested-by: David Ahern <dsahern@kernel.org>
Fixes: c675e06a98 ("ipvlan: decouple l3s mode dependencies from other modes")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250321090353.1170545-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02aae8e2f9 upstream.
With previous patch "wifi: ath11k: move update channel list from update
reg worker to reg notifier", ath11k_reg_update_chan_list() will be
called during reg_process_self_managed_hint().
reg_process_self_managed_hint() will hold rtnl_lock all the time.
But ath11k_reg_update_chan_list() may increase the occupation time of
rtnl_lock, because when the wait flag is set, wait_for_completion_timeout()
will be called during the 11d/hw scan.
We should minimize the occupation time of rtnl_lock as much as possible
to avoid interfering with the rest of the system. So move the update channel
list operation to a new worker, so that wait_for_completion_timeout()
won't be called and will not increase the occupation time of rtnl_lock.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Co-developed-by: Kang Yang <quic_kangyang@quicinc.com>
Signed-off-by: Kang Yang <quic_kangyang@quicinc.com>
Reviewed-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://patch.msgid.link/20250117061737.1921-3-quic_kangyang@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 933ab187e6 upstream.
Currently when ath11k gets a new channel list, it will be processed
according to the following steps:
1. update new channel list to cfg80211 and queue reg_work.
2. cfg80211 handles new channel list during reg_work.
3. update cfg80211's handled channel list to firmware by
ath11k_reg_update_chan_list().
But ath11k will immediately execute step 3 after reg_work is just
queued. Since step 2 is asynchronous, cfg80211 may not have completed
handling the new channel list, which may lead to an out-of-bounds
write error:
BUG: KASAN: slab-out-of-bounds in ath11k_reg_update_chan_list
Call Trace:
ath11k_reg_update_chan_list+0xbfe/0xfe0 [ath11k]
kfree+0x109/0x3a0
ath11k_regd_update+0x1cf/0x350 [ath11k]
ath11k_regd_update_work+0x14/0x20 [ath11k]
process_one_work+0xe35/0x14c0
We should ensure step 2 is completely done before executing step 3. Thus
Wen raised patch [1]: when the flag NL80211_REGDOM_SET_BY_DRIVER is set,
cfg80211 will notify ath11k after step 2 is done.
So enable the flag NL80211_REGDOM_SET_BY_DRIVER so that cfg80211 will
notify ath11k after step 2 is done. At that point, there will be no
KASAN bug during the execution of step 3.
[1] https://patchwork.kernel.org/project/linux-wireless/patch/20230201065313.27203-1-quic_wgong@quicinc.com/
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
Fixes: f45cb6b29c ("wifi: ath11k: avoid deadlock during regulatory update in ath11k_regd_update()")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kang Yang <quic_kangyang@quicinc.com>
Reviewed-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://patch.msgid.link/20250117061737.1921-2-quic_kangyang@quicinc.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce2f26e737 upstream.
Presently we always BUG_ON if trying to start a transaction on a journal marked
with JBD2_UNMOUNT, since this should never happen. However, while ltp running
stress tests, it was observed that in case of some error handling paths, it is
possible for update_super_work to start a transaction after the journal is
destroyed eg:
(umount)
ext4_kill_sb
  kill_block_super
    generic_shutdown_super
      sync_filesystem /* commits all txns */
      evict_inodes
        /* might start a new txn */
      ext4_put_super
        flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
        jbd2_journal_destroy
          journal_kill_thread
            journal->j_flags |= JBD2_UNMOUNT;
          jbd2_journal_commit_transaction
            jbd2_journal_get_descriptor_buffer
              jbd2_journal_bmap
                ext4_journal_bmap
                  ext4_map_blocks
                    ...
                    ext4_inode_error
                      ext4_handle_error
                        schedule_work(&sbi->s_sb_upd_work)
                          /* work queue kicks in */
                          update_super_work
                            jbd2_journal_start
                              start_this_handle
                                BUG_ON(journal->j_flags & JBD2_UNMOUNT)
Hence, introduce a new mount flag to indicate the journal is being destroyed and only do
a journaled (and deferred) update of the sb if this flag is not set. Otherwise, just
fall back to an un-journaled commit.
Further, in the journal destroy path, we have the following sequence:
1. Set mount flag indicating journal is destroying
2. force a commit and wait for it
3. flush pending sb updates
This sequence is important as it ensures that, after this point, there is no sb
update that might be journaled so it is safe to update the sb outside the
journal. (To avoid race discussed in 2d01ddc866)
Also, we don't need a similar check in ext4_grp_locked_error since it is only
called from mballoc and AFAICT it would be always valid to schedule work here.
Fixes: 2d01ddc866 ("ext4: save error info to sb through journal if available")
Reported-by: Mahesh Kumar <maheshkumar657g@gmail.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/9613c465d6ff00cd315602f99283d5f24018c3f7.1742279837.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e879a0d9cb upstream.
If blk-wbt is enabled by default, raid write performance is found to be
quite bad because all IO is throttled by the wbt of the underlying disks,
since the flag REQ_IDLE is ignored. It turns out this behaviour has existed
since blk-wbt was introduced.
Other than REQ_IDLE, other flags should not be ignored either: for
example, REQ_META can be set by filesystems, and clearing it can cause priority
inversion problems; and REQ_NOWAIT should not be cleared as well, because
io will wait instead of failing directly in the underlying disks.
Fix those problems by keeping the IO flags from the master bio.
Fixes: f51d46d0e7 ("md: add support for REQ_NOWAIT")
Fixes: e34cbd3074 ("blk-wbt: add general throttling mechanism")
Fixes: 5404bc7a87 ("[PATCH] Allow file systems to differentiate between data and meta reads")
Link: https://lore.kernel.org/linux-raid/20250227121657.832356-1-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
[ Harshit: Resolve conflicts due to missing commit: f2a38abf5f
("md/raid1: Atomic write support") and commit: a1d9b4fd42
("md/raid10: Atomic write support") in 6.12.y, we don't have Atomic
writes feature in 6.12.y ]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f282cdc1d upstream.
may_decode_fh() is calling has_locked_children() while holding no locks.
That's an oopsable race...
The rest of the callers are safe since they are holding namespace_sem and
are guaranteed a positive refcount on the mount in question.
Rename the current has_locked_children() to __has_locked_children(), make
it static and switch the fs/namespace.c users to it.
Make has_locked_children() a wrapper for __has_locked_children(), calling
the latter under read_seqlock_excl(&mount_lock).
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 620c266f39 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Harshit: Resolved conflicts due to missing commit:
db04662e2f ("fs: allow detached mounts in clone_private_mount()") in
linux-6.12.y ]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90fb7db49c upstream.
Fix a possible heap overflow in e1000_set_eeprom function by adding
input validation for the requested length of the change in the EEPROM.
In addition, change the variable type from int to size_t for better
code practices, and rearrange declarations to RCT (reverse Christmas tree) order.
Cc: stable@vger.kernel.org
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Co-developed-by: Mikael Wessel <post@mikaelkw.online>
Signed-off-by: Mikael Wessel <post@mikaelkw.online>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70bccd9855 upstream.
There can be a NULL pointer dereference bug here. NULL is passed to
__cifs_sfu_make_node without checks, which passes it unchecked to
cifs_strndup_to_utf16, which in turn passes it to
cifs_local_to_utf16_bytes where '*from' is dereferenced, causing a crash.
This patch adds a check for NULL 'src' in cifs_strndup_to_utf16 and
returns NULL early to prevent dereferencing NULL pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Makar Semyonov <m.semenov@tssltd.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d77b6ff0ce upstream.
batadv_nc_skb_decode_packet() trusts coded_len and checks only against
skb->len. XOR starts at sizeof(struct batadv_unicast_packet), reducing
payload headroom, and the source skb length is not verified, allowing an
out-of-bounds read and a small out-of-bounds write.
Validate that coded_len fits within the payload area of both destination
and source sk_buffs before XORing.
Fixes: 2df5278b02 ("batman-adv: network coding - receive coded packets and decode them")
Cc: stable@vger.kernel.org
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Stanislav Fort <stanislav.fort@aisle.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dba9a45c3 upstream.
Fix a use-after-free window by correcting the buffer release sequence in
the deferred receive path. The code freed the RQ buffer first and only
then cleared the context pointer under the lock. Concurrent paths (e.g.,
ABTS and the repost path) also inspect and release the same pointer under
the lock, so the old order could lead to double-free/UAF.
Note that the repost path already uses the correct pattern: detach the
pointer under the lock, then free it after dropping the lock. The
deferred path should do the same.
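The corrected sequence in the deferred path, schematically (field names
approximate; free_rq_buffer() stands in for the driver's actual release helper):
	spin_lock_irqsave(&ctxp->ctxlock, iflag);
	nvmebuf = ctxp->rqb_buffer;
	ctxp->rqb_buffer = NULL;	/* detach under the lock */
	spin_unlock_irqrestore(&ctxp->ctxlock, iflag);
	if (nvmebuf)
		free_rq_buffer(phba, nvmebuf);	/* free only after dropping the lock */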
Fixes: 472e146d1c ("scsi: lpfc: Correct upcalling nvmet_fc transport during io done downcall")
Cc: stable@vger.kernel.org
Signed-off-by: John Evans <evans1210144@gmail.com>
Link: https://lore.kernel.org/r/20250828044008.743-1-evans1210144@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 87b07a1fbc upstream.
A new warning in clang [1] points out a couple of places where a hdr
variable is not initialized then passed along to skb_put_data().
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:1894:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
1894 | skb_put_data(skb, &hdr, sizeof(hdr));
| ^~~
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c:3386:21: warning: variable 'hdr' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
3386 | skb_put_data(skb, &hdr, sizeof(hdr));
| ^~~
Zero initialize these headers as done in other places in the driver when
there is nothing stored in the header.
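That is, roughly (the header type name here is only a placeholder):
	struct tlv_hdr hdr = {};	/* zero-initialise so no stack garbage is copied */
	skb_put_data(skb, &hdr, sizeof(hdr));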
Cc: stable@vger.kernel.org
Fixes: 98686cd216 ("wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices")
Link: 00dacf8c22 [1]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2104
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250715-mt7996-fix-uninit-const-pointer-v1-1-b5d8d11d7b78@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e20450829 upstream.
The adapter->chan_stats[] array is initialized in
mwifiex_init_channel_scan_gap() with vmalloc(), which doesn't zero out
memory. The array is filled in mwifiex_update_chan_statistics()
and then the user can query the data in mwifiex_cfg80211_dump_survey().
There are two potential issues here. What if the user calls
mwifiex_cfg80211_dump_survey() before the data has been filled in?
Also, the mwifiex_update_chan_statistics() function doesn't necessarily
initialize the whole array. Since the array was not initialized at
the start, that could result in an information leak.
Also, this array is pretty small. It's a maximum of 900 bytes, so it's
more appropriate to use kcalloc() instead of vmalloc().
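A sketch of the allocation change (field names as used in the driver, per the
description above):
	adapter->chan_stats = kcalloc(adapter->num_in_chan_stats,
				      sizeof(*adapter->chan_stats), GFP_KERNEL);
	if (!adapter->chan_stats)
		return -ENOMEM;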
Cc: stable@vger.kernel.org
Fixes: bf35443314 ("mwifiex: channel statistics support for mwifiex")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250815023055.477719-1-rongqianfeng@vivo.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51337a9a3a upstream.
GCC doesn't support "hwasan-kernel-mem-intrinsic-prefix", only
"asan-kernel-mem-intrinsic-prefix"[0], while LLVM supports both. This is
already taken into account when checking
"CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX", but not in the KASAN Makefile
adding those parameters when "CONFIG_KASAN_SW_TAGS" is enabled.
Replace the version check with "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX",
which already validates that mem-intrinsic prefix parameter can be used,
and choose the correct name depending on compiler.
GCC 13 and above trigger "CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX" which
prevents `mem{cpy,move,set}()` being redefined in "mm/kasan/shadow.c"
since commit 36be5cba99 ("kasan: treat meminstrinsic as builtins in
uninstrumented files"), as we expect the compiler to prefix those calls
with `__(hw)asan_` instead. But as the option passed to GCC has been
incorrect, the compiler has not been emitting those prefixes, effectively
never calling the instrumented versions of `mem{cpy,move,set}()` with
"CONFIG_KASAN_SW_TAGS" enabled.
If "CONFIG_FORTIFY_SOURCES" is enabled, this issue would be mitigated as
it redefines `mem{cpy,move,set}()` and properly aliases the
`__underlying_mem*()` that will be called to the instrumented versions.
Link: https://lkml.kernel.org/r/20250821120735.156244-1-ada.coupriediaz@arm.com
Link: https://gcc.gnu.org/onlinedocs/gcc-13.4.0/gcc/Optimize-Options.html [0]
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Fixes: 36be5cba99 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 850470a841 upstream.
set_track_prepare() can incur lock recursion.
The issue is that it is called from hrtimer_start_range_ns
holding the per_cpu(hrtimer_bases)[n].lock, but when
CONFIG_DEBUG_OBJECTS_TIMERS is enabled, it may wake up kswapd in set_track_prepare
and try to take the per_cpu(hrtimer_bases)[n].lock again.
Avoid deadlock caused by implicitly waking up kswapd by passing in
allocation flags, which do not contain __GFP_KSWAPD_RECLAIM in the
debug_objects_fill_pool() case. Inside stack depot they are processed by
gfp_nested_mask().
Since ___slab_alloc() has preemption disabled, we mask out
__GFP_DIRECT_RECLAIM from the flags there.
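Sketched out (abbreviated; the stack depot call and flag masking as described
above, variable names illustrative):
	/* set_track_prepare(gfp_flags) passes the caller's flags to stack depot */
	handle = stack_depot_save_flags(entries, nr_entries, gfp_flags,
					STACK_DEPOT_FLAG_CAN_ALLOC);
	/* ___slab_alloc() runs with preemption disabled, so it also strips direct reclaim */
	trackp = set_track_prepare(gfpflags & ~__GFP_DIRECT_RECLAIM);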
The oops looks something like:
BUG: spinlock recursion on CPU#3, swapper/3/0
lock: 0xffffff8a4bf29c80, .magic: dead4ead, .owner: swapper/3/0, .owner_cpu: 3
Hardware name: Qualcomm Technologies, Inc. Popsicle based on SM8850 (DT)
Call trace:
spin_bug+0x0
_raw_spin_lock_irqsave+0x80
hrtimer_try_to_cancel+0x94
task_contending+0x10c
enqueue_dl_entity+0x2a4
dl_server_start+0x74
enqueue_task_fair+0x568
enqueue_task+0xac
do_activate_task+0x14c
ttwu_do_activate+0xcc
try_to_wake_up+0x6c8
default_wake_function+0x20
autoremove_wake_function+0x1c
__wake_up+0xac
wakeup_kswapd+0x19c
wake_all_kswapds+0x78
__alloc_pages_slowpath+0x1ac
__alloc_pages_noprof+0x298
stack_depot_save_flags+0x6b0
stack_depot_save+0x14
set_track_prepare+0x5c
___slab_alloc+0xccc
__kmalloc_cache_noprof+0x470
__set_page_owner+0x2bc
post_alloc_hook[jt]+0x1b8
prep_new_page+0x28
get_page_from_freelist+0x1edc
__alloc_pages_noprof+0x13c
alloc_slab_page+0x244
allocate_slab+0x7c
___slab_alloc+0x8e8
kmem_cache_alloc_noprof+0x450
debug_objects_fill_pool+0x22c
debug_object_activate+0x40
enqueue_hrtimer[jt]+0xdc
hrtimer_start_range_ns+0x5f8
...
Signed-off-by: yangshiguang <yangshiguang@xiaomi.com>
Fixes: 5cf909c553 ("mm/slub: use stackdepot to save stack trace in objects")
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7cc183f2e6 upstream.
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
	if (vmemmap_can_optimize(altmap, pgmap))
		// does not sync top level page tables
		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
	else
		// sync top level page tables in x86
		r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a8 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that the current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync code more robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating kernel portion of the page tables
and allow each architecture to explicitly perform synchronization when
installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
    static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                           p4d_t *p4d)
    {
        pgd_populate(&init_mm, pgd, p4d);
        if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
            arch_sync_kernel_mappings(addr, addr);
    }
It is worth noting that vmalloc() and apply_to_range() carefully
synchronize page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by this patch
series.
This series was hugely inspired by Dave Hansen's suggestion and hence
added Suggested-by: Dave Hansen.
Cc stable because lack of this series opens the door to intermittent
boot failures.
This patch (of 3):
Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
linux/pgtable.h so that they can be used outside of vmalloc and ioremap.
Link: https://lkml.kernel.org/r/20250818020206.4517-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-2-harry.yoo@oracle.com
Link: https://lore.kernel.org/linux-mm/20250220064105.808339-1-gwan-gyeong.mun@intel.com [1]
Link: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@intel.com [2]
Link: https://lore.kernel.org/linux-mm/d1da214c-53d3-45ac-a8b6-51821c5416e4@intel.com [3]
Link: https://lore.kernel.org/linux-mm/4d800744-7b88-41aa-9979-b245e8bf794b@intel.com [4]
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9614d8bee6 upstream.
With CONFIG_HIGHPTE on 32-bit ARM, move_pages_pte() maps PTE pages using
kmap_local_page(), which requires unmapping in Last-In-First-Out order.
The current code maps dst_pte first, then src_pte, but unmaps them in the
same order (dst_pte, src_pte), violating the LIFO requirement. This
causes the warning in kunmap_local_indexed():
WARNING: CPU: 0 PID: 604 at mm/highmem.c:622 kunmap_local_indexed+0x178/0x17c
addr != __fix_to_virt(FIX_KMAP_BEGIN + idx)
Fix this by reversing the unmap order to respect LIFO ordering.
This issue follows the same pattern as similar fixes:
- commit eca6828403 ("crypto: skcipher - fix mismatch between mapping and unmapping order")
- commit 8cf57c6df8 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
Both of which addressed the same fundamental requirement that kmap_local
operations must follow LIFO ordering.
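As a minimal illustration of that LIFO rule (generic pointer names, not the
exact move_pages_pte() code), the kunmap_local() calls must mirror the
kmap_local_page() calls in reverse order:
    void *dst, *src;

    dst = kmap_local_page(dst_page);    /* mapped first */
    src = kmap_local_page(src_page);    /* mapped second */

    /* ... copy/update the PTE pages ... */

    kunmap_local(src);    /* unmap the most recently mapped page first */
    kunmap_local(dst);    /* then the one mapped before it */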
Link: https://lkml.kernel.org/r/20250731144431.773923-1-sashal@kernel.org
Fixes: adef440691 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc582cd26e upstream.
syzbot reports that defer/local task_work adding via msg_ring can hit
a request that has been freed:
CPU: 1 UID: 0 PID: 19356 Comm: iou-wrk-19354 Not tainted 6.16.0-rc4-syzkaller-00108-g17bbde2e1716 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xd2/0x2b0 mm/kasan/report.c:521
kasan_report+0x118/0x150 mm/kasan/report.c:634
io_req_local_work_add io_uring/io_uring.c:1184 [inline]
__io_req_task_work_add+0x589/0x950 io_uring/io_uring.c:1252
io_msg_remote_post io_uring/msg_ring.c:103 [inline]
io_msg_data_remote io_uring/msg_ring.c:133 [inline]
__io_msg_ring_data+0x820/0xaa0 io_uring/msg_ring.c:151
io_msg_ring_data io_uring/msg_ring.c:173 [inline]
io_msg_ring+0x134/0xa00 io_uring/msg_ring.c:314
__io_issue_sqe+0x17e/0x4b0 io_uring/io_uring.c:1739
io_issue_sqe+0x165/0xfd0 io_uring/io_uring.c:1762
io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1874
io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:642
io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:696
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
which is supposed to be safe with how requests are allocated. But msg
ring requests are allocated and freed on their own, and hence freeing must
be deferred to a sane time.
Add an rcu_head and use kfree_rcu() in both spots where requests are
freed. Only the one in io_msg_tw_complete() is strictly required as it
has been visible on the other ring, but use it consistently in the other
spot as well.
This should not cause any other issues outside of KASAN rightfully
complaining about it.
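A minimal sketch of the deferred-free pattern described above (the field
placement is illustrative, not the exact io_uring layout):
    struct io_kiocb {
        /* ... existing fields ... */
        struct rcu_head rcu_head;    /* added for deferred freeing */
    };

    /* was: kfree(req); - a concurrent local task_work add could still
     * dereference the request; defer the free past a grace period instead */
    kfree_rcu(req, rcu_head);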
Link: https://lore.kernel.org/io-uring/686cd2ea.a00a0220.338033.0007.GAE@google.com/
Reported-by: syzbot+54cbbfb4db9145d26fc2@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Fixes: 0617bb500b ("io_uring/msg_ring: improve handling of target CQE posting")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit fc582cd26e)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44822df89e upstream.
In __iodyn_find_io_region(), pcmcia_make_resource() is assigned to
res and used in pci_bus_alloc_resource(). There is a dereference of res
in pci_bus_alloc_resource(), which could lead to a NULL pointer
dereference on failure of pcmcia_make_resource().
Fix this bug by adding a check of res.
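The fix amounts to a standard allocation check before the resource is handed
on (a sketch; the call's arguments are illustrative, not the exact driver
code):
    res = pcmcia_make_resource(0, num, IORESOURCE_IO, dev_name(&s->dev));
    if (!res)
        return NULL;    /* bail out instead of letting
                         * pci_bus_alloc_resource() dereference NULL */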
Cc: stable@vger.kernel.org
Fixes: 49b1153adf ("pcmcia: move all pcmcia_resource_ops providers into one module")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7ed7b9d0e upstream.
On arm64, it has been possible for a module's sections to be placed more
than 128M away from each other since commit:
commit 3e35d303ab ("arm64: module: rework module VA range selection")
Due to this, an ftrace callsite in a module's .init.text section can be
out of branch range for the module's ftrace PLT entry (in the module's
.text section). Any attempt to enable tracing of that callsite will
result in a BRK being patched into the callsite, resulting in a fatal
exception when the callsite is later executed.
Fix this by adding an additional trampoline for .init.text, which will
be within range.
No additional trampolines are necessary due to the way a given
module's executable sections are packed together. Any executable
section beginning with ".init" will be placed in MOD_INIT_TEXT,
and any other executable section, including those beginning with ".exit",
will be placed in MOD_TEXT.
Fixes: 3e35d303ab ("arm64: module: rework module VA range selection")
Cc: <stable@vger.kernel.org> # 6.5.x
Signed-off-by: panfan <panfan@qti.qualcomm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250905032236.3220885-1-panfan@qti.qualcomm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9b2bfdbf43 ]
When transmitting a PTP frame which is timestamped using 2-step, the
following warning appears if CONFIG_PROVE_LOCKING is enabled:
=============================
[ BUG: Invalid wait context ]
6.17.0-rc1-00326-ge6160462704e #427 Not tainted
-----------------------------
ptp4l/119 is trying to lock:
c2a44ed4 (&vsc8531->ts_lock){+.+.}-{3:3}, at: vsc85xx_txtstamp+0x50/0xac
other info that might help us debug this:
context-{4:4}
4 locks held by ptp4l/119:
#0: c145f068 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x58/0x1440
#1: c29df974 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x5c4/0x1440
#2: c2aaaad0 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x108/0x350
#3: c2aac170 (&lan966x->tx_lock){+.-.}-{2:2}, at: lan966x_port_xmit+0xd0/0x350
stack backtrace:
CPU: 0 UID: 0 PID: 119 Comm: ptp4l Not tainted 6.17.0-rc1-00326-ge6160462704e #427 NONE
Hardware name: Generic DT based system
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x7c/0xac
dump_stack_lvl from __lock_acquire+0x8e8/0x29dc
__lock_acquire from lock_acquire+0x108/0x38c
lock_acquire from __mutex_lock+0xb0/0xe78
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from vsc85xx_txtstamp+0x50/0xac
vsc85xx_txtstamp from lan966x_fdma_xmit+0xd8/0x3a8
lan966x_fdma_xmit from lan966x_port_xmit+0x1bc/0x350
lan966x_port_xmit from dev_hard_start_xmit+0xc8/0x2c0
dev_hard_start_xmit from sch_direct_xmit+0x8c/0x350
sch_direct_xmit from __dev_queue_xmit+0x680/0x1440
__dev_queue_xmit from packet_sendmsg+0xfa4/0x1568
packet_sendmsg from __sys_sendto+0x110/0x19c
__sys_sendto from sys_send+0x18/0x20
sys_send from ret_fast_syscall+0x0/0x1c
Exception stack(0xf0b05fa8 to 0xf0b05ff0)
5fa0: 00000001 0000000e 0000000e 0004b47a 0000003a 00000000
5fc0: 00000001 0000000e 00000000 00000121 0004af58 00044874 00000000 00000000
5fe0: 00000001 bee9d420 00025a10 b6e75c7c
So, instead of using the ts_lock for tx_queue, use the spinlock that
sk_buff_head already has.
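Roughly, that means queueing the timestamp skb with the helper that takes the
queue's own spinlock instead of grabbing the driver mutex in transmit context
(a sketch; the structure and field names are illustrative):
    /* was: a sleeping lock taken from xmit/softirq context */
    mutex_lock(&priv->ts_lock);
    __skb_queue_tail(&priv->tx_queue, skb);
    mutex_unlock(&priv->ts_lock);

    /* now: skb_queue_tail() uses the sk_buff_head's internal spinlock */
    skb_queue_tail(&priv->tx_queue, skb);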
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Fixes: 7d272e63e0 ("net: phy: mscc: timestamping and PHC support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://patch.msgid.link/20250902121259.3257536-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd2004d82d ]
bind_bhash.c passes (SO_REUSEADDR | SO_REUSEPORT) to setsockopt().
In the asm-generic definition, the value happens to match the bare
SO_REUSEPORT, (2 | 15) == 15, but it does not on some architectures:
arch/alpha/include/uapi/asm/socket.h:18:#define SO_REUSEADDR 0x0004
arch/alpha/include/uapi/asm/socket.h:24:#define SO_REUSEPORT 0x0200
arch/mips/include/uapi/asm/socket.h:24:#define SO_REUSEADDR 0x0004 /* Allow reuse of local addresses. */
arch/mips/include/uapi/asm/socket.h:33:#define SO_REUSEPORT 0x0200 /* Allow local address and port reuse. */
arch/parisc/include/uapi/asm/socket.h:12:#define SO_REUSEADDR 0x0004
arch/parisc/include/uapi/asm/socket.h:18:#define SO_REUSEPORT 0x0200
arch/sparc/include/uapi/asm/socket.h:13:#define SO_REUSEADDR 0x0004
arch/sparc/include/uapi/asm/socket.h:20:#define SO_REUSEPORT 0x0200
include/uapi/asm-generic/socket.h:12:#define SO_REUSEADDR 2
include/uapi/asm-generic/socket.h:27:#define SO_REUSEPORT 15
Let's pass SO_REUSEPORT only.
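That is, set the option on its own (a sketch of the setsockopt() call; the fd
variable is illustrative):
    int one = 1;

    /* was: SO_REUSEADDR | SO_REUSEPORT, which is not a valid option value
     * on alpha/mips/parisc/sparc */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0)
        perror("setsockopt(SO_REUSEPORT)");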
Fixes: c35ecb95c4 ("selftests/net: Add test for timing a bind request to a port with a populated bhash entry")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250903222938.2601522-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4844123fe0 ]
If alloc_skb() fails in pad_compress_skb(), it returns NULL without
releasing the old skb. The caller does:
    skb = pad_compress_skb(ppp, skb);
    if (!skb)
        goto drop;
drop:
    kfree_skb(skb);
When pad_compress_skb() returns NULL, the reference to the old skb is
lost and kfree_skb(skb) ends up doing nothing, leading to a memory leak.
Align pad_compress_skb() semantics with realloc(): only free the old
skb if allocation and compression succeed. At the call site, use the
new_skb variable so the original skb is not lost when pad_compress_skb()
fails.
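A sketch of the adjusted call site (variable names follow the description
above, not the literal diff):
    struct sk_buff *new_skb;

    new_skb = pad_compress_skb(ppp, skb);
    if (!new_skb)
        goto drop;    /* the old skb is still ours; freed at drop: below */
    skb = new_skb;    /* on success the old skb has been consumed */

    /* ... continue down the transmit path ... */
drop:
    kfree_skb(skb);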
Fixes: b3f9b92a6e ("[PPP]: add PPP MPPE encryption module")
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250903100726.269839-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bbceba7dc ]
Add proper error checking for dmaengine_desc_get_metadata_ptr() which
can return an error pointer and lead to potential crashes or undefined
behaviour if the pointer retrieval fails.
Properly handle the error by unmapping DMA buffer, freeing the skb and
returning early to prevent further processing with invalid data.
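The added check follows the usual error-pointer pattern (a sketch; the
unmap/free calls just illustrate the unwind described above):
    app_metadata = dmaengine_desc_get_metadata_ptr(desc, &meta_len, &meta_max);
    if (IS_ERR(app_metadata)) {
        /* unwind: release the DMA mapping and the skb, stop processing */
        dma_unmap_single(lp->dev, phys, lp->max_frm_size, DMA_FROM_DEVICE);
        dev_kfree_skb_any(skb);
        return;
    }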
Fixes: 6a91b846af ("net: axienet: Introduce dmaengine support")
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20250903025213.3120181-1-abin.joseph@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a228624bc ]
When device_register() return error in atm_register_sysfs(), which can be
triggered by kzalloc fail in device_private_init() or other reasons,
kmemleak reports the following memory leaks:
unreferenced object 0xffff88810182fb80 (size 8):
comm "insmod", pid 504, jiffies 4294852464
hex dump (first 8 bytes):
61 64 75 6d 6d 79 30 00 adummy0.
backtrace (crc 14dfadaf):
__kmalloc_node_track_caller_noprof+0x335/0x450
kvasprintf+0xb3/0x130
kobject_set_name_vargs+0x45/0x120
dev_set_name+0xa9/0xe0
atm_register_sysfs+0xf3/0x220
atm_dev_register+0x40b/0x780
0xffffffffa000b089
do_one_initcall+0x89/0x300
do_init_module+0x27b/0x7d0
load_module+0x54cd/0x5ff0
init_module_from_file+0xe4/0x150
idempotent_init_module+0x32c/0x610
__x64_sys_finit_module+0xbd/0x120
do_syscall_64+0xa8/0x270
entry_SYSCALL_64_after_hwframe+0x77/0x7f
When device_create_file() return error in atm_register_sysfs(), the same
issue also can be triggered.
Function put_device() should be called to release kobj->name memory and
other device resources, instead of kfree().
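The corrected error path therefore drops the reference instead of freeing the
structure directly (a sketch; the member name is illustrative):
    err = device_register(&adev->class_dev);
    if (err) {
        /* device_register() initialized the kobject: put_device() lets the
         * core release kobj->name and the other device resources */
        put_device(&adev->class_dev);
        return err;
    }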
Fixes: 1fa5ae857b ("driver core: get rid of struct device's bus_id string array")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250901063537.1472221-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a125c8fb9d ]
In mctp_getsockopt(), unrecognized options currently return -EINVAL.
In contrast, mctp_setsockopt() returns -ENOPROTOOPT for unknown
options.
Update mctp_getsockopt() to also return -ENOPROTOOPT for unknown
options. This aligns the behavior of getsockopt() and setsockopt(),
and matches the standard kernel socket API convention for handling
unsupported options.
Fixes: 99ce45d5e7 ("mctp: Implement extended addressing")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250902102059.1370008-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f5d2fd1ca ]
When the "proxy" option is enabled on a VXLAN device, the device will
suppress ARP requests and IPv6 Neighbor Solicitation messages if it is
able to reply on behalf of the remote host. That is, if a matching and
valid neighbor entry is configured on the VXLAN device whose MAC address
is not behind the "any" remote (0.0.0.0 / ::).
The code currently assumes that the FDB entry for the neighbor's MAC
address points to a valid remote destination, but this is incorrect if
the entry is associated with an FDB nexthop group. This can result in a
NPD [1][3] which can be reproduced using [2][4].
Fix by checking that the remote destination exists before dereferencing
it.
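Conceptually the added check looks like this (a hedged sketch based on the
description above, not the literal diff):
    f = vxlan_find_mac_tx(vxlan, n->ha, vni);
    if (f) {
        struct vxlan_rdst *rdst = first_remote_rcu(f);

        if (!rdst)        /* FDB entry backed by a nexthop group */
            goto out;     /* no remote to proxy-reply for */
        /* ... existing checks on rdst->remote_ip follow ... */
    }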
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 4 UID: 0 PID: 365 Comm: arping Not tainted 6.17.0-rc2-virtme-g2a89cb21162c #2 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_xmit+0xb58/0x15f0
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
packet_sendmsg+0x113a/0x1850
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
#!/bin/bash
ip address add 192.0.2.1/32 dev lo
ip nexthop add id 1 via 192.0.2.2 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 4789 proxy
ip neigh add 192.0.2.3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
arping -b -c 1 -s 192.0.2.1 -I vx0 192.0.2.3
[3]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 13 UID: 0 PID: 372 Comm: ndisc6 Not tainted 6.17.0-rc2-virtme-g6ee90cb26014 #3 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_xmit+0x803/0x1600
[...]
Call Trace:
<TASK>
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
ip6_finish_output2+0x210/0x6c0
ip6_finish_output+0x1af/0x2b0
ip6_mr_output+0x92/0x3e0
ip6_send_skb+0x30/0x90
rawv6_sendmsg+0xe6e/0x12e0
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f383422ec77
[4]
#!/bin/bash
ip address add 2001:db8:1::1/128 dev lo
ip nexthop add id 1 via 2001:db8:1::1 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 2001:db8:1::1 dstport 4789 proxy
ip neigh add 2001:db8:1::3 lladdr 00:11:22:33:44:55 nud perm dev vx0
bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 10
ndisc6 -r 1 -s 2001:db8:1::1 -w 1 2001:db8:1::3 vx0
Fixes: 1274e1cc42 ("vxlan: ecmp support for mac fdb entries")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5cde39ea38 ]
vxlan_find_mac() is only expected to be called from the Tx path as it
updates the 'used' timestamp. Rename it to vxlan_find_mac_tx() to
reflect that and to avoid incorrect updates of this timestamp like those
addressed by commit 9722f834fe ("vxlan: Avoid unnecessary updates to
FDB 'used' time").
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-12-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 804b09be09 ]
The Tx path does not run from an RCU read-side critical section which
makes the current lockless accesses to FDB entries invalid. As far as I
am aware, this has not been a problem in practice, but traces will be
generated once we transition the FDB lookup to rhashtable_lookup().
Add rcu_read_{lock,unlock}() around the handling of FDB entries in the
Tx path. Remove the RCU read-side critical section from vxlan_xmit_nh()
as now the function is always called from an RCU read-side critical
section.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-2-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 40a9994f2f ]
The 'NTF_USE' flag can be used by user space to refresh FDB entries so
that they will not age out. Currently, the VXLAN driver implements it by
refreshing the 'used' field in the FDB entry as this is the field
according to which FDB entries are aged out.
Subsequent patches will switch the VXLAN driver to age out entries based
on the 'updated' field. Prepare for this change by refreshing the
'updated' field upon 'NTF_USE'. This is consistent with the bridge
driver's FDB:
# ip link add name br1 up type bridge
# ip link add name swp1 master br1 up type dummy
# bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic vlan 1
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev swp1 master dynamic vlan 1
# bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
10
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev swp1 master use dynamic vlan 1
# bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]'
0
Before:
# ip link add name vx1 up type vxlan id 10010 dstport 4789
# bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
# bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
10
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1
# bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
20
After:
# ip link add name vx1 up type vxlan id 10010 dstport 4789
# bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1
# bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
10
# sleep 10
# bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1
# bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]'
0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46e0ccfb88 ]
The SKB_DROP_REASON_VXLAN_NO_REMOTE skb drop reason was introduced in
the specific context of vxlan. As it turns out, there are similar cases
when a packet needs to be dropped in other parts of the network stack,
such as the bridge module.
Rename SKB_DROP_REASON_VXLAN_NO_REMOTE and give it a more generic name,
so that it can be used in other parts of the network stack. This is not
a functional change, and the numeric value of the drop reason even
remains unchanged.
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241219163606.717758-2-rrendec@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 03483dbde8 ]
Replace kfree_skb() with kfree_skb_reason() in vxlan_mdb_xmit. No drop
reasons are introduced in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b71a576e45 ]
Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following
new skb drop reasons are introduced for vxlan:
/* no remote found for xmit */
SKB_DROP_REASON_VXLAN_NO_REMOTE
/* packet without necessary metadata reached a device which is
* in "external" mode
*/
SKB_DROP_REASON_TUNNEL_TXINFO
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d209706f56 ]
Change the return type of vxlan_set_mac() from bool to enum
skb_drop_reason. In this commit, the drop reason
"SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source
mac of the packet is a local mac.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 1f5d2fd1ca ("vxlan: Fix NPD in {arp,neigh}_reduce() when using nexthop objects")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6ead38147e ]
VXLAN FDB entries can point to either a remote destination or an FDB
nexthop group. The latter is usually used in EVPN deployments where
learning is disabled.
However, when learning is enabled, an incoming packet might try to
refresh an FDB entry that points to an FDB nexthop group and therefore
does not have a remote. Such packets should be dropped, but they are
only dropped after dereferencing the non-existent remote, resulting in a
NPD [1] which can be reproduced using [2].
Fix by dropping such packets earlier. Remove the misleading comment from
first_remote_rcu().
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
CPU: 13 UID: 0 PID: 361 Comm: mausezahn Not tainted 6.17.0-rc1-virtme-g9f6b606b6b37 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:vxlan_snoop+0x98/0x1e0
[...]
Call Trace:
<TASK>
vxlan_encap_bypass+0x209/0x240
encap_bypass_if_local+0xb1/0x100
vxlan_xmit_one+0x1375/0x17e0
vxlan_xmit+0x6b4/0x15f0
dev_hard_start_xmit+0x5d/0x1c0
__dev_queue_xmit+0x246/0xfd0
packet_sendmsg+0x113a/0x1850
__sock_sendmsg+0x38/0x70
__sys_sendto+0x126/0x180
__x64_sys_sendto+0x24/0x30
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x4b/0x53
[2]
#!/bin/bash
ip address add 192.0.2.1/32 dev lo
ip address add 192.0.2.2/32 dev lo
ip nexthop add id 1 via 192.0.2.3 fdb
ip nexthop add id 10 group 1 fdb
ip link add name vx0 up type vxlan id 10010 local 192.0.2.1 dstport 12345 localbypass
ip link add name vx1 up type vxlan id 10020 local 192.0.2.2 dstport 54321 learning
bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 192.0.2.2 port 54321 vni 10020
bridge fdb add 00:aa:bb:cc:dd:ee dev vx1 self static nhid 10
mausezahn vx0 -a 00:aa:bb:cc:dd:ee -b 00:11:22:33:44:55 -c 1 -q
Fixes: 1274e1cc42 ("vxlan: ecmp support for mac fdb entries")
Reported-by: Marlin Cremers <mcremers@cloudbear.nl>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250901065035.159644-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 289fd4e752 ]
Change the return type of vxlan_snoop() from bool to enum
skb_drop_reason. In this commit, two drop reasons are introduced:
SKB_DROP_REASON_MAC_INVALID_SOURCE
SKB_DROP_REASON_VXLAN_ENTRY_EXISTS
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 6ead38147e ("vxlan: Fix NPD when refreshing an FDB entry with a nexthop object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4c06d9daf8 ]
Introduce skb drop reasons to the function vxlan_rcv(). Following new
drop reasons are added:
SKB_DROP_REASON_VXLAN_INVALID_HDR
SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
SKB_DROP_REASON_IP_TUNNEL_ECN
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 6ead38147e ("vxlan: Fix NPD when refreshing an FDB entry with a nexthop object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f20dbd7de ]
Introduce the function pskb_inet_may_pull_reason() and make
pskb_inet_may_pull a simple inline call to it. The drop reasons of it just
come from pskb_may_pull_reason().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 6ead38147e ("vxlan: Fix NPD when refreshing an FDB entry with a nexthop object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 454bbde8f0 ]
Introduce the function pskb_network_may_pull_reason() and make
pskb_network_may_pull() a simple inline call to it. Its drop reasons simply
come from pskb_may_pull_reason().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 6ead38147e ("vxlan: Fix NPD when refreshing an FDB entry with a nexthop object")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa390321ab ]
When tcp_ao_copy_all_matching() fails in tcp_v6_syn_recv_sock() it just
exits the function. This ends up causing a memory-leak:
unreferenced object 0xffff0000281a8200 (size 2496):
comm "softirq", pid 0, jiffies 4295174684
hex dump (first 32 bytes):
7f 00 00 06 7f 00 00 06 00 00 00 00 cb a8 88 13 ................
0a 00 03 61 00 00 00 00 00 00 00 00 00 00 00 00 ...a............
backtrace (crc 5ebdbe15):
kmemleak_alloc+0x44/0xe0
kmem_cache_alloc_noprof+0x248/0x470
sk_prot_alloc+0x48/0x120
sk_clone_lock+0x38/0x3b0
inet_csk_clone_lock+0x34/0x150
tcp_create_openreq_child+0x3c/0x4a8
tcp_v6_syn_recv_sock+0x1c0/0x620
tcp_check_req+0x588/0x790
tcp_v6_rcv+0x5d0/0xc18
ip6_protocol_deliver_rcu+0x2d8/0x4c0
ip6_input_finish+0x74/0x148
ip6_input+0x50/0x118
ip6_sublist_rcv+0x2fc/0x3b0
ipv6_list_rcv+0x114/0x170
__netif_receive_skb_list_core+0x16c/0x200
netif_receive_skb_list_internal+0x1f0/0x2d0
This is because in tcp_v6_syn_recv_sock (and the IPv4 counterpart), when
exiting upon error, inet_csk_prepare_forced_close() and tcp_done() need
to be called. They make sure the newsk will end up being correctly
freed.
tcp_v4_syn_recv_sock() makes this very clear by having the put_and_exit
label that takes care of things. So, this patch here makes sure
tcp_v4_syn_recv_sock and tcp_v6_syn_recv_sock have similar
error-handling and thus fixes the leak for TCP-AO.
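Mirroring the put_and_exit handling in tcp_v4_syn_recv_sock(), the IPv6 error
path conceptually becomes (a sketch; the argument list of the TCP-AO call is
illustrative):
    if (tcp_ao_copy_all_matching(sk, newsk, req, skb, AF_INET6) < 0)
        goto put_and_exit;
    /* ... */
put_and_exit:
    /* tear down and release the half-created socket */
    inet_csk_prepare_forced_close(newsk);
    tcp_done(newsk);
    goto exit;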
Fixes: 06b22ef295 ("net/tcp: Wire TCP-AO to request sockets")
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250830-tcpao_leak-v1-1-e5878c2c3173@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97acb0259c ]
During GTK rekey, mac80211 issues a clear key (if the old key exists)
followed by an install key operation in the same context. This causes
ath11k to send two WMI commands in quick succession: one to clear the
old key and another to install the new key in the same slot.
Under certain conditions, especially under high load or in time-sensitive
scenarios, firmware may process these commands asynchronously in a way
that leaves the firmware assuming the key is cleared while the hardware
still holds a valid key.
This inconsistency between hardware and firmware leads to group addressed
packet drops. Only setting the same key again can restore a valid key in
firmware and allow packets to be transmitted.
This issue remained latent because the host's clear key commands were
not effective in firmware until commit 436a4e8865 ("ath11k: clear the
keys properly via DISABLE_KEY"). That commit enabled the host to
explicitly clear group keys, which inadvertently exposed the race.
To mitigate this, restrict group key clearing across all modes (AP, STA,
MESH). During rekey, the new key can simply be set on top of the previous
one, avoiding the need for a clear followed by a set.
However, in AP mode specifically, permit group key clearing when no
stations are associated. This exception supports transitions from secure
modes (e.g., WPA2/WPA3) to open mode, during which all associated peers
are removed and the group key is cleared as part of the transition.
Add a per-BSS station counter to track the presence of stations during
set key operations. Also add a reset_group_keys flag to track the key
re-installation state and avoid repeated installation of the same key
when the number of connected stations transitions to non-zero within a
rekey period.
Additionally, for AP and Mesh modes, when the first station associates,
reinstall the same group key that was last set. This ensures that the
firmware recovers from any race that may have occurred during a previous
key clear when no stations were associated.
This change ensures that key clearing is permitted only when no clients
are connected, avoiding packet loss while enabling dynamic security mode
transitions.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Reported-by: Steffen Moser <lists@steffen-moser.de>
Closes: https://lore.kernel.org/linux-wireless/c6366409-9928-4dd7-bf7b-ba7fcf20eabf@steffen-moser.de
Fixes: 436a4e8865 ("ath11k: clear the keys properly via DISABLE_KEY")
Signed-off-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Tested-by: Nicolas Escande <nico.escande@gmail.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250810170018.1124014-1-rameshkumar.sundaram@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7e5c3e3bf ]
The driver incorrectly used ixgbe_lp_map in loops intended to populate the
supported and advertised EEE linkmode bitmaps based on ixgbe_ls_map.
This results in incorrect bit setting and potential out-of-bounds
access, since ixgbe_lp_map and ixgbe_ls_map have different sizes
and purposes.
ixgbe_lp_map[i] -> ixgbe_ls_map[i]
Use ixgbe_ls_map for supported and advertised linkmodes, and keep
ixgbe_lp_map usage only for link partner (lp_advertised) mapping.
Fixes: 9356b6db9d ("net: ethernet: ixgbe: Convert EEE to use linkmodes")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a556f06338 ]
list_first_entry() never returns NULL - if the list is empty, it still
returns a pointer to an invalid object, leading to potential invalid
memory access when dereferenced.
Fix this by using list_first_entry_or_null instead of list_first_entry.
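The safe pattern, in general form (hypothetical list and struct names):
    struct foo *entry;

    entry = list_first_entry_or_null(&some_list, struct foo, list);
    if (!entry)
        return;        /* empty list: nothing to do */

    /* only now is it safe to dereference entry */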
Fixes: e3219ce6a7 ("i40e: Add support for client interface for IWARP driver")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fcdb1c3c4 ]
The 'command' and 'netdev_ops' debugfs files are a legacy debugging
interface supported by the i40e driver since its early days by commit
02e9c29081 ("i40e: debugfs interface").
Both of these debugfs files provide a read handler which is mostly useless,
and which is implemented with questionable logic. They both use a static
256 byte buffer which is initialized to the empty string. In the case of
the 'command' file this buffer is literally never used and simply wastes
space. In the case of the 'netdev_ops' file, the last command written is
saved here.
On read, the files' contents are presented as the name of the device
followed by a colon and then the contents of their respective static
buffer. For 'command' this will always be "<device>: ". For 'netdev_ops',
this will be "<device>: <last command written>". But note the buffer is
shared between all devices operated by this module. At best, it is mostly
meaningless information, and at worst it could be accessed simultaneously
as there doesn't appear to be any locking mechanism.
We have also recently received multiple reports for both read functions
about their use of snprintf and potential overflow that could result in
reading arbitrary kernel memory. For the 'command' file, this is definitely
impossible, since the static buffer is always zero and never written to.
For the 'netdev_ops' file, it does appear to be possible: if the user
carefully crafts the command input, it will be copied into the buffer,
which could be large enough to cause snprintf to truncate, which then
causes the copy_to_user to read beyond the length of the buffer allocated
by kzalloc.
A minimal fix would be to replace snprintf() with scnprintf() which would
cap the return to the number of bytes written, preventing an overflow. A
more involved fix would be to drop the mostly useless static buffers,
saving 512 bytes and modifying the read functions to stop needing those as
input.
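For reference, the difference the minimal fix relies on (a generic sketch, not
the driver code):
    char buf[256];
    int len;

    /* snprintf() returns the length that would have been written, which can
     * exceed sizeof(buf); handing that to copy_to_user() reads past the end */
    len = snprintf(buf, sizeof(buf), "%s: %s\n", name, cmd);

    /* scnprintf() returns the bytes actually written (capped at the buffer
     * size), so the length is always safe to pass to copy_to_user() */
    len = scnprintf(buf, sizeof(buf), "%s: %s\n", name, cmd);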
Instead, let's just completely drop the read access to these files. These
are debug interfaces exposed as part of debugfs, and I don't believe that
dropping read access will break any script, as the provided output is
pretty useless. You can find the netdev name through other more standard
interfaces, and the 'netdev_ops' interface can easily result in garbage if
you issue simultaneous writes to multiple devices at once.
In order to properly remove the i40e_dbg_netdev_ops_buf, we need to
refactor its write function to avoid using the static buffer. Instead, use
the same logic as the i40e_dbg_command_write, with an allocated buffer.
Update the code to use this instead of the static buffer, and ensure we
free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on
multiple devices, and allows us to remove the now unused static buffer
along with removing the read access.
Fixes: 02e9c29081 ("i40e: debugfs interface")
Reported-by: Kunwu Chan <chentao@kylinos.cn>
Closes: https://lore.kernel.org/intel-wired-lan/20231208031950.47410-1-chentao@kylinos.cn/
Reported-by: Wang Haoran <haoranwangsec@gmail.com>
Closes: https://lore.kernel.org/all/CANZ3JQRRiOdtfQJoP9QM=6LS1Jto8PGBGw6y7-TL=BcnzHQn1Q@mail.gmail.com/
Reported-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com>
Closes: https://lore.kernel.org/all/20250722115017.206969-1-a.jahangirzad@gmail.com/
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acf3a5c8be ]
On control planes that allow changing the MAC address of the interface,
the driver must provide a MAC type to avoid errors such as:
idpf 0000:0a:00.0: Transaction failed (op 535)
idpf 0000:0a:00.0: Received invalid MAC filter payload (op 535) (len 0)
idpf 0000:0a:00.0: Transaction failed (op 536)
These errors occur during driver load or when changing the MAC via:
ip link set <iface> address <mac>
Add logic to set the MAC type when sending ADD/DEL (opcodes 535/536) to
the control plane. Since only one primary MAC is supported per vport, the
driver only needs to send an ADD opcode when setting it. Remove the old
address by calling __idpf_del_mac_filter(), which skips the message and
just clears the entry from the internal list. This avoids an error on DEL
as it attempts to remove an address already cleared by the preceding ADD
opcode.
Fixes: ce1b75d063 ("idpf: add ptypes and MAC filter support")
Reported-by: Jian Liu <jianliu@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f6486338fd ]
Recent versions of the E810 firmware have support for an extra interrupt to
handle report of the "low latency" Tx timestamps coming from the
specialized low latency firmware interface. Instead of polling the
registers, software can wait until the low latency interrupt is fired.
This logic makes use of the Tx timestamp tracking structure, ice_ptp_tx, as
it uses the same "ready" bitmap to track which Tx timestamps complete.
Unfortunately, the ice_ll_ts_intr() function does not check if the
tracker is initialized before its first access. This results in NULL
dereference or use-after-free bugs similar to the issues fixed in the
ice_ptp_ts_irq() function.
Fix this by only checking the in_use bitmap (and other fields) if the
tracker is marked as initialized. The reset flow will clear the init field
under lock before it tears the tracker down, thus preventing any
use-after-free or NULL access.
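The guard is essentially (a sketch; field names follow the description above):
    struct ice_ptp_tx *tx = &pf->ptp.port.tx;

    if (!tx->init)              /* tracker torn down or not yet set up */
        return IRQ_HANDLED;     /* do not touch tx->in_use */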
Fixes: 82e71b226e ("ice: Enable SW interrupt from FW for LL TS")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 030e1c4566 ]
The code currently reads both U32 attributes and U64 attributes as
U64, so when a U32 attribute is provided by userspace (ie, when not
using XPN), on big endian systems, we'll load that value into the
upper 32bits of the next_pn field instead of the lower 32bits. This
means that the value that userspace provided is ignored (we only care
about the lower 32bits for non-XPN), and we'll start using PNs from 0.
Switch to nla_get_uint, which will read the value correctly on all
arches, whether it's 32b or 64b.
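Conceptually (a generic sketch; 'attr' and 'pn' are illustrative names):
    u64 pn;

    /* was: pn = nla_get_u64(attr); - a 4-byte attribute lands in the upper
     * 32 bits on big endian and the configured PN is effectively lost */
    pn = nla_get_uint(attr);    /* honours the attribute's actual length */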
Fixes: 48ef50fa86 ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1c1df1661b89238caf5beefb84a10ebfd56c66ea.1756459839.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c6dd1aa2cb ]
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by SNAT.
However, the original implementation only considers the
IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
source address with that of the original-direction tuple. This causes
two problems:
1. For SNAT:
Reply-direction packets were incorrectly translated using the source
address of the CT original direction, even though no translation is
required.
2. For DNAT:
Reply-direction packets were not handled at all. In DNAT, the original
direction's destination is translated. Therefore, in the reply
direction the source address must be set to the reply-direction
source, so rate limiting works as intended.
Fix this by using the connection direction to select the correct tuple
for source address translation, and adjust the pre-checks to handle
reply-direction packets in case of DNAT.
Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
possible KCSAN reports about concurrent updates to `ct->status`.
Fixes: 0b41713b60 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0704a3da7c ]
dsp_hwec_enable() allocates dup pointer by kstrdup(arg),
but then it updates dup variable by strsep(&dup, ",").
As a result when it calls kfree(dup), the dup variable may be
a modified pointer that no longer points to the original allocated
memory, causing a memory leak.
The issue is the same pattern as fixed in commit c6a502c229
("mISDN: Fix memory leak in dsp_pipeline_build()").
Fixes: 9a43816182 ("mISDN: Remove VLAs")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250828081457.36061-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d6a367ec6c ]
Jakub says:
nft_flowtable.sh is one of the most flake-atious test for netdev CI currently :(
The root cause is two-fold:
1. the failing part of the test is supposed to make sure that ip
fragments are forwarded for offloaded flows.
(flowtable has to pass them to classic forward path).
path mtu discovery for these subtests is disabled.
2. nft_flowtable.sh has two passes. One with fixed mtus/file size and
one where link mtus and file sizes are random.
The CI failures all have same pattern:
re-run with random mtus and file size: -o 27663 -l 4117 -r 10089 -s 54384840
[..]
PASS: dscp_egress: dscp packet counters match
FAIL: file mismatch for ns1 -> ns2
In some cases this error triggers a bit earlier, sometimes in a later
subtest:
re-run with random mtus and file size: -o 20201 -l 4555 -r 12657 -s 9405856
[..]
PASS: dscp_egress: dscp packet counters match
PASS: dscp_fwd: dscp packet counters match
2025/08/17 20:37:52 socat[18954] E write(7, 0x560716b96000, 8192): Broken pipe
FAIL: file mismatch for ns1 -> ns2
-rw------- 1 root root 9405856 Aug 17 20:36 /tmp/tmp.2n63vlTrQe
But all logs I saw show same scenario:
1. Failing tests have pmtu discovery off (i.e., ip fragmentation)
2. The test file is much larger than first-pass default (2M Byte)
3. peers have much larger MTUs compared to the 'network'.
These errors are very reproducible when re-running the test with
the same commandline arguments.
The timeout became much more prominent with
1d2fbaad7c ("tcp: stronger sk_rcvbuf checks"): reassembled packets
typically have a skb->truesize more than double the skb length.
As that commit is intentional and pmtud-off with
large-tcp-packets-as-fragments is not normal, adjust the test to use a
smaller file for the pmtu-off subtests.
While at it, add more information to pass/fail messages and
also run the dscp alteration subtest with pmtu discovery enabled.
Link: https://netdev.bots.linux.dev/contest.html?test=nft-flowtable-sh
Fixes: f84ab63490 ("selftests: netfilter: nft_flowtable.sh: re-run with random mtu sizes")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250822071330.4168f0db@kernel.org/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250828214918.3385-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2801079119 ]
Move the creation of debugfs files into a dedicated function, and ensure
they are explicitly removed during vhci_release(), before associated
data structures are freed.
Previously, debugfs files such as "force_suspend", "force_wakeup", and
others were created under hdev->debugfs but not removed in
vhci_release(). Since vhci_release() frees the backing vhci_data
structure, any access to these files after release would result in
use-after-free errors.
Although hdev->debugfs is later freed in hci_release_dev(), a user can
access the files after vhci_data is freed but before hdev->debugfs is
released.
Fixes: ab4e4380d4 ("Bluetooth: Add vhci devcoredump support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54416fd767 ]
The helper registration return value is passed-through by module_init
callbacks which modprobe confuses with the harmless -EEXIST returned
when trying to load an already loaded module.
Make sure modprobe fails so users notice their helper has not been
registered and won't work.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 479a54ab92 ]
When send a broadcast packet to a tap device, which was added to a bridge,
br_nf_local_in() is called to confirm the conntrack. If another conntrack
with the same hash value is added to the hash table, which can be
triggered by a normal packet to a non-bridge device, the below warning
may happen.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 96 at net/bridge/br_netfilter_hooks.c:632 br_nf_local_in+0x168/0x200
CPU: 1 UID: 0 PID: 96 Comm: tap_send Not tainted 6.17.0-rc2-dirty #44 PREEMPT(voluntary)
RIP: 0010:br_nf_local_in+0x168/0x200
Call Trace:
<TASK>
nf_hook_slow+0x3e/0xf0
br_pass_frame_up+0x103/0x180
br_handle_frame_finish+0x2de/0x5b0
br_nf_hook_thresh+0xc0/0x120
br_nf_pre_routing_finish+0x168/0x3a0
br_nf_pre_routing+0x237/0x5e0
br_handle_frame+0x1ec/0x3c0
__netif_receive_skb_core+0x225/0x1210
__netif_receive_skb_one_core+0x37/0xa0
netif_receive_skb+0x36/0x160
tun_get_user+0xa54/0x10c0
tun_chr_write_iter+0x65/0xb0
vfs_write+0x305/0x410
ksys_write+0x60/0xd0
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
---[ end trace 0000000000000000 ]---
To solve the hash conflict, nf_ct_resolve_clash() try to merge the
conntracks, and update skb->_nfct. However, br_nf_local_in() still uses the
old ct from local variable 'nfct' after confirm(), which leads to this
warning.
If confirm() does not insert the conntrack entry and returns NF_DROP, the
warning may also occur. There is no need to preserve the WARN_ON_ONCE(),
just remove it.
Link: https://lore.kernel.org/netdev/20250820043329.2902014-1-wangliang74@huawei.com/
Fixes: 62e7151ae3 ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9cb83d4be0 ]
The brcmf_btcoex_detach() only shuts down the btcoex timer if the
flag timer_on is true. However, the brcmf_btcoex_timerfunc(), which
runs as timer handler, sets timer_on to false. This creates critical
race conditions:
1.If brcmf_btcoex_detach() is called while brcmf_btcoex_timerfunc()
is executing, it may observe timer_on as false and skip the call to
timer_shutdown_sync().
2.The brcmf_btcoex_timerfunc() may then reschedule the brcmf_btcoex_info
worker after the cancel_work_sync() has been executed, resulting in
use-after-free bugs.
The use-after-free bugs occur in two distinct scenarios, depending on
the timing of when the brcmf_btcoex_info struct is freed relative to
the execution of its worker thread.
Scenario 1: Freed before the worker is scheduled
The brcmf_btcoex_info is deallocated before the worker is scheduled.
A race condition can occur when schedule_work(&bt_local->work) is
called after the target memory has been freed. The sequence of events
is detailed below:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... |
kfree(cfg->btcoex); // FREE |
| schedule_work(&bt_local->work); // USE
Scenario 2: Freed after the worker is scheduled
The brcmf_btcoex_info is freed after the worker has been scheduled
but before or during its execution. In this case, statements within
brcmf_btcoex_handler(), such as the container_of() macro and subsequent
dereferences of the brcmf_btcoex_info object, will cause a use-after-free
access. The following timeline illustrates this
scenario:
CPU0 | CPU1
brcmf_btcoex_detach | brcmf_btcoex_timerfunc
| bt_local->timer_on = false;
if (cfg->btcoex->timer_on) |
... |
cancel_work_sync(); |
... | schedule_work(); // Reschedule
|
kfree(cfg->btcoex); // FREE | brcmf_btcoex_handler() // Worker
/* | btci = container_of(....); // USE
The kfree() above could | ...
also occur at any point | btci-> // USE
during the worker's execution|
*/ |
To resolve the race conditions, drop the conditional check and call
timer_shutdown_sync() directly. It can deactivate the timer reliably,
regardless of its current state. Once stopped, the timer_on state is
then set to false.
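The detach path then becomes unconditional (a sketch; field names follow the
description above):
    /* always stop the timer, regardless of the timer_on flag */
    timer_shutdown_sync(&cfg->btcoex->timer);
    cfg->btcoex->timer_on = false;

    /* the timer can no longer reschedule the worker behind our back */
    cancel_work_sync(&cfg->btcoex->work);

    kfree(cfg->btcoex);
    cfg->btcoex = NULL;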
Fixes: 61730d4dff ("brcmfmac: support critical protocol API for DHCP")
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://patch.msgid.link/20250822050839.4413-1-duoming@zju.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e251709aad ]
During SD suspend/resume without a full card rescan (when using
non-removable SD cards for rootfs), the SD card initialization may fail
after resume. This occurs because, after a host controller reset, the
card detect logic may take time to stabilize due to debounce logic.
Without waiting for stabilization, the host may attempt powering up the
card prematurely, leading to command timeouts during resume flow.
Add sdhci_arasan_set_power_and_bus_voltage() to wait for the card detect
stable bit before powering up the card. Since the stabilization time
is not fixed, a maximum timeout of one second is used to ensure
sufficient wait time for the card detect signal to stabilize.
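The wait can be expressed as a bounded poll on the card-detect-stable bit of
the present-state register (a sketch using the generic SDHCI definitions; the
exact helper layout in sdhci-of-arasan may differ):
    u32 reg;

    /* poll up to 1s for the debounced card-detect signal to settle */
    if (read_poll_timeout(sdhci_readl, reg, reg & SDHCI_CD_STABLE,
                          10000, 1000000, false,
                          host, SDHCI_PRESENT_STATE))
        dev_warn(mmc_dev(host->mmc), "card detect not stable\n");

    sdhci_set_power_and_bus_voltage(host, mode, vdd);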
Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250730060543.1735971-1-sai.krishna.potthuri@amd.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d35ec48fa6 ]
Since $(LD) is used directly, -nostdlib is unneeded. MIPS has already
removed it, so we should remove it too:
bdbf2038fb ("MIPS: VDSO: remove -nostdlib compiler flag").
In fact, other architectures also use $(LD) now.
fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
2ff906994b ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
Cc: stable@vger.kernel.org
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c271c86a4c ]
Glibc added support for .gnu.hash in 2006, and .hash had been obsoleted
far before the first LoongArch CPU was taped out. Using --hash-style=sysv
might imply unaddressed issues and confuse readers.
Some architectures use an explicit --hash-style=both for vDSO here, but
DT_GNU_HASH has already been supported by Glibc and Musl and become the
de-facto standard of the distros when the first LoongArch CPU was taped.
So DT_HASH seems just wasting storage space for LoongArch.
Just drop the option and rely on the linker default, which is likely
"gnu" (Arch, Debian, Gentoo, LFS) on all LoongArch distros (confirmed on
Arch, Debian, Gentoo, and LFS; AOSC now defaults to "both" but it seems
just an oversight).
Following the logic of commit 48f6430505 ("arm64/vdso: Remove
--hash-style=sysv").
Link: https://github.com/AOSC-Dev/aosc-os-abbs/pull/9796
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Stable-dep-of: d35ec48fa6 ("LoongArch: vDSO: Remove -nostdlib complier flag")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c653ffc283 ]
The only user of hid_snto32() is the Logitech HID++ driver, which always
calls hid_snto32() with valid size (constant, either 12 or 8) and
therefore can simply use sign_extend32().
Make the switch and remove hid_snto32(). Move snto32() and s32ton() to
avoid introducing forward declarations.
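For illustration, with a constant width the conversion is a one-liner
(hypothetical helper name, not part of the patch):

#include <linux/bitops.h>

/* hid_snto32(value, 12) sign-extended a 12-bit field; with a constant
 * width this is simply sign_extend32() with the sign bit index (11). */
static inline s32 hidpp_sign_extend_12(u32 value)
{
        return sign_extend32(value, 11);
}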
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://patch.msgid.link/20241003144656.3786064-2-dmitry.torokhov@gmail.com
[bentiss: fix checkpatch warning]
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Stable-dep-of: a6b87bfc2a ("HID: core: Harden s32ton() against conversion to 0 bits")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8073330629 ]
Add the missing microSD slot vqmmc-supply property; otherwise the kernel
might shut down the LDO5 regulator, which would power off the microSD
card slot, possibly while it is in use. Add the property to make sure
the kernel is aware of the LDO5 regulator which supplies the microSD
slot and keeps the LDO5 enabled.
Fixes: 562d222f23 ("arm64: dts: imx8mp: Add support for Data Modul i.MX8M Plus eDM SBC")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c53cf8ce3b ]
Add the missing microSD slot vqmmc-supply property; otherwise the kernel
might shut down the LDO5 regulator, which would power off the microSD
card slot, possibly while it is in use. Add the property to make sure
the kernel is aware of the LDO5 regulator which supplies the microSD
slot and keeps the LDO5 enabled.
Fixes: 8d6712695b ("arm64: dts: imx8mp: Add support for DH electronics i.MX8M Plus DHCOM and PDK2")
Signed-off-by: Marek Vasut <marek.vasut@mailbox.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5245dc5ff9 ]
Fix SD card removal caused by automatic LDO5 power off after boot:
LDO5: disabling
mmc1: card 59b4 removed
EXT4-fs (mmcblk1p2): shut down requested (2)
Aborting journal on device mmcblk1p2-8.
JBD2: I/O error when updating journal superblock for mmcblk1p2-8.
To prevent this, add a vqmmc regulator for USDHC, using a GPIO-controlled
regulator that is supplied by LDO5. Since this is implemented on the SoM
but used on baseboards with an SD-card interface, implement the
functionality in the SoM part and optionally enable it on baseboards if
needed.
Fixes: 418d1d840e ("arm64: dts: freescale: add initial device tree for TQMa8MPQL with i.MX8MP")
Signed-off-by: Markus Niebel <Markus.Niebel@ew.tq-group.com>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f0ba0e7172 ]
Don't call ZONE FINISH for conventional zones as this will result in I/O
errors. Instead, check if the zone that needs finishing is a conventional
zone and, if so, skip it.
Also factor out the actual handling of finishing a single zone into a
helper function, as do_zone_finish() is growing ever bigger and the
indentation levels are getting higher.
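The key check, sketched (helper name hypothetical; the real refactor also
handles the per-device replication details):

/* Conventional zones have no write pointer; issuing ZONE FINISH on them
 * fails with an I/O error, so they are simply skipped. */
static bool zone_needs_finish(struct btrfs_device *device, u64 physical)
{
        return btrfs_dev_is_sequential(device, physical);
}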
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 112ca94f6c ]
If preemption happens between protected_save_fpu_context() and
protected_save_lbt_context(), the FTOP context is lost, because FTOP is
saved by protected_save_lbt_context() while protected_save_fpu_context()
disables TM before that. So save LBT before FPU in setup_sigcontext()
to avoid this potential risk.
Signed-off-by: Hanlu Li <lihanlu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 986bf6ed44 ]
At inode_logged() we do a couple of lockless checks for ->logged_trans, and
these are generally safe except the second one in case we get a load or
store tearing due to a concurrent call updating ->logged_trans (either at
btrfs_log_inode() or later at inode_logged()).
In the first case it's safe to compare to the current transaction ID since
once ->logged_trans is set to the current transaction, we never set it to a
lower value.
In the second case, where we check if it's greater than zero, we are prone
to load/store tearing races, since a concurrent task updating it to the
current transaction ID may, with store tearing, update it with two 32 bit
writes or four 16 bit writes instead of a single 64 bit write. In that case
the reading side at inode_logged() could see a positive value that does not
match the current transaction and then return a false negative.
Fix this by doing the second check while holding the inode's spinlock, and add
some comments about it too. Also add the data_race() annotation to the
first check to avoid any reports from KCSAN (or similar tools) and comment
about it.
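A simplified sketch of the two checks as described (hypothetical helper
name; the real inode_logged() also handles the log tree lookup):

/* Returns 1 if the inode was logged in the current transaction, 0 if it
 * certainly was not, or -1 if the log tree must be searched. */
static int inode_logged_fast_checks(struct btrfs_inode *inode, u64 transid)
{
        bool logged_before;

        /* Safe lockless check: once ->logged_trans is set to the current
         * transaction it is never lowered. data_race() silences KCSAN. */
        if (data_race(inode->logged_trans) == transid)
                return 1;

        /* The "greater than zero" check is prone to load/store tearing,
         * so do it under the inode's spinlock, which serializes writers. */
        spin_lock(&inode->lock);
        logged_before = inode->logged_trans > 0;
        spin_unlock(&inode->lock);

        if (logged_before)
                return 0;

        return -1; /* unknown: caller must search the log tree */
}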
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59a0dd4ab9 ]
At inode_logged(), if we find that the inode was not logged before, we
update its ->last_dir_index_offset to (u64)-1 with the goal that the
next directory log operation will see the (u64)-1 and then figure out
it must check what was the index of the last logged dir index key and
update ->last_dir_index_offset to that key's offset (this is done in
update_last_dir_index_offset()).
This however opens a time window where a race can happen and lead to
directory logging skipping dir index keys that should be logged. The race
happens like this:
1) Task A calls inode_logged(), sees ->logged_trans as 0 and then finds
that the inode item was logged before, but it is delayed just before
setting the inode's ->last_dir_index_offset to (u64)-1...
2) Task B is at btrfs_log_inode(), which calls inode_logged() early, and
that call sets ->last_dir_index_offset to (u64)-1;
3) Task B then enters log_directory_changes() which calls
update_last_dir_index_offset(). There it sees ->last_dir_index_offset
is (u64)-1 and that the inode was logged before (ctx->logged_before is
true), and so it searches for the last logged dir index key in the log
tree and it finds that it has an offset (index) value of N, so it sets
->last_dir_index_offset to N, so that we can skip index keys that are
less than or equal to N (later at process_dir_items_leaf());
4) Task A now sets ->last_dir_index_offset to (u64)-1, undoing the update
that task B just did;
5) Task B will now skip every index key when it enters
process_dir_items_leaf(), since ->last_dir_index_offset is (u64)-1.
Fix this by making inode_logged() not touch ->last_dir_index_offset and
initializing it to 0 when an inode is loaded (at btrfs_alloc_inode()) and
then having update_last_dir_index_offset() treat a value of 0 as meaning
we must check the log tree and update with the index of the last logged
index key. This is fine since the minimum possible value for
->last_dir_index_offset is 1 (BTRFS_DIR_START_INDEX - 1 = 2 - 1 = 1).
This also simplifies the management of ->last_dir_index_offset and now
all accesses to it are done under the inode's log_mutex.
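A sketch of the resulting sentinel handling in update_last_dir_index_offset()
(simplified; the actual log tree search is omitted):

static int update_last_dir_index_offset(struct btrfs_inode *inode,
                                        struct btrfs_log_ctx *ctx)
{
        lockdep_assert_held(&inode->log_mutex);

        /* 0 is the "not determined yet" value set at btrfs_alloc_inode();
         * real offsets are always >= BTRFS_DIR_START_INDEX - 1 == 1. */
        if (inode->last_dir_index_offset != 0)
                return 0;

        if (!ctx->logged_before) {
                inode->last_dir_index_offset = BTRFS_DIR_START_INDEX - 1;
                return 0;
        }

        /* The inode was logged before: look up the last logged dir index
         * key in the log tree and record its offset (search omitted). */
        return 0;
}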
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ef07b74e1b ]
There's a race between checking if an inode was logged before and logging
an inode that can cause us to mark an inode as not logged just after it
was logged by a concurrent task:
1) We have inode X which was not logged before, neither in the current
transaction nor in a past transaction since the inode was loaded into
memory, so its ->logged_trans value is 0;
2) We are at transaction N;
3) Task A calls inode_logged() against inode X, sees that ->logged_trans
is 0 and there is a log tree and so it proceeds to search in the log
tree for an inode item for inode X. It doesn't find any, but before
it sets ->logged_trans to N - 1...
4) Task B calls btrfs_log_inode() against inode X, logs the inode and
sets ->logged_trans to N;
5) Task A now sets ->logged_trans to N - 1;
6) At this point anyone calling inode_logged() gets 0 (inode not logged)
since ->logged_trans is greater than 0 and less than N, but our inode
was really logged. As a consequence operations like rename, unlink and
link that happen afterwards in the current transaction end up not
updating the log when they should.
Fix this by ensuring inode_logged() only updates ->logged_trans, in case
the inode item is not found in the log tree, if after taking the inode's
lock (spinlock struct btrfs_inode::lock) the ->logged_trans value is still
zero, since the inode lock is what protects setting ->logged_trans at
btrfs_log_inode().
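The essence of the fix, as a small sketch (hypothetical helper name):

/* Record "logged in a past transaction" only if a concurrent
 * btrfs_log_inode() has not set ->logged_trans in the meantime. */
static void maybe_mark_not_logged(struct btrfs_inode *inode, u64 transid)
{
        spin_lock(&inode->lock);
        if (inode->logged_trans == 0)
                inode->logged_trans = transid - 1;
        spin_unlock(&inode->lock);
}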
Fixes: 0f8ce49821 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit abad3d0bad ]
Lonial reported that an out-of-bounds access in cgroup local storage
can be crafted via tail calls. Given two programs, each utilizing a
cgroup local storage with a different value size, and one program
doing a tail call into the other, the verifier will validate each of
the individual programs just fine. However, in the runtime context
the bpf_cg_run_ctx holds a bpf_prog_array_item which contains the
BPF program as well as any cgroup local storage flavor the program
uses. Helpers such as bpf_get_local_storage() pick this up from the
runtime context:
ctx = container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx);
storage = ctx->prog_item->cgroup_storage[stype];
if (stype == BPF_CGROUP_STORAGE_SHARED)
        ptr = &READ_ONCE(storage->buf)->data[0];
else
        ptr = this_cpu_ptr(storage->percpu_buf);
For the second program which was called from the originally attached
one, this means bpf_get_local_storage() will pick up the former
program's map, not its own. With mismatching sizes, this can result
in an unintended out-of-bounds access.
To fix this issue, we need to extend bpf_map_owner with an array of
storage_cookie[] to match on i) the exact maps from the original
program if the second program was using bpf_get_local_storage(), or
ii) allow the tail call combination if the second program was not
using any of the cgroup local storage maps.
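A sketch of the compatibility check this enables (field and helper names
are assumptions; the actual patch wires the check into the tail call map
owner comparison):

static bool cgroup_storage_compatible(const struct bpf_map_owner *owner,
                                      const struct bpf_prog *prog)
{
        enum bpf_cgroup_storage_type stype;

        for (stype = 0; stype < MAX_BPF_CGROUP_STORAGE_TYPE; stype++) {
                const struct bpf_map *map = prog->aux->cgroup_storage[stype];

                /* A program that does not use this storage flavor is always
                 * compatible; otherwise it must reference the exact same
                 * map as the program the owner info was recorded from. */
                if (map && owner->storage_cookie[stype] != map->cookie)
                        return false;
        }
        return true;
}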
Fixes: 7d9c342789 ("bpf: Make cgroup storages shared between programs on the same cgroup")
Reported-by: Lonial Con <kongln9170@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250730234733.530041-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fd1c98f0ef ]
Given this is only relevant for BPF tail call maps, it is adding up space
and penalizing other map types. We also need to extend this with further
objects to track / compare to. Therefore, let's move this out into a separate
structure and dynamically allocate it only for BPF tail call maps.
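Sketched out, the owner-tracking fields move into a structure that is
allocated on demand (field names follow the long-standing embedded state;
the exact layout and allocation flags are assumptions):

struct bpf_map_owner {
        enum bpf_prog_type type;
        bool jited;
        bool xdp_has_frags;
};

/* In struct bpf_map the embedded state becomes a pointer, allocated only
 * when the first program is added to a tail call (prog array) map. */
static inline struct bpf_map_owner *bpf_map_owner_alloc(struct bpf_map *map)
{
        return kzalloc(sizeof(*map->owner), GFP_ATOMIC);
}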
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250730234733.530041-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: abad3d0bad ("bpf: Fix oob access in cgroup local storage")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Input tensor injection is a debug feature that allows a user-controlled
input to be passed directly to IMX500's inference engine (bypassing the
in-built ISP).
Three new custom controls are added: ENABLE_INJECTION to enable injection
before streaming begins, INPUT_TENSOR_FD to provide appropriate input
tensors, and INJECTION_CMP_FRM to provide notification of DNN results in
the sensor output.
Signed-off-by: Richard Oliver <richard.oliver@raspberrypi.com>