linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-11 20:39:55 +00:00

Author	SHA1	Message	Date
Alex Deucher	5539bc82ce	drm/amdgpu: fix a memory leak in fence cleanup when unloading commit `7838fb5f11` upstream. Commit `b61badd20b` ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called after that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op as the sched entities we never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <lincao12@amd.com> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Fixes: `b61badd20b` ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a525fa37aa`) Cc: stable@vger.kernel.org [ Adapt to conditional check ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-19 16:35:51 +02:00
David Rosca	6dc4eddeb7	drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages commit `2b10cb58d7` upstream. There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `dc8f9f0f45`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-19 16:35:46 +02:00
David Rosca	c53a6447d1	drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time commit `3318f2d20c` upstream. There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8908fdce06`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-19 16:35:46 +02:00
David Rosca	8d7cc14712	drm/amdgpu: Add back JPEG to video caps for carrizo and newer [ Upstream commit `2036be3174` ] JPEG is not supported on Vega only. Fixes: `0a6e7b06bd` ("drm/amdgpu: Remove JPEG from vega and carrizo video caps") Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0f4dfe86fe`) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-19 16:35:43 +02:00
Colin Ian King	e9998d65bc	drm/amd/amdgpu: Fix missing error return on kzalloc failure [ Upstream commit `467e00b30d` ] Currently the kzalloc failure check just sets reports the failure and sets the variable ret to -ENOMEM, which is not checked later for this specific error. Fix this by just returning -ENOMEM rather than setting ret. Fixes: `4fb9307154` ("drm/amd/amdgpu: remove redundant host to psp cmd buf allocations") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1ee9d1a096`) Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-09 18:58:25 +02:00
Alex Deucher	2a7cf13dd6	Revert "drm/amdgpu: Avoid extra evict-restore process." This reverts commit `71598a5a77` which is commit `1f02f2044b` upstream. This commit introduced a regression, however the fix for the regression: `aa5fc4362f` ("drm/amdgpu: fix task hang from failed job submission during process kill") depends on things not yet present in 6.12.y and older kernels. Since this commit is more of an optimization, just revert it for 6.12.y and older stable kernels. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x - 6.12.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-09 18:58:23 +02:00
Alex Deucher	608a015c65	drm/amdgpu: drop hw access in non-DC audio fini commit `71403f58b4` upstream. We already disable the audio pins in hw_fini so there is no need to do it again in sw_fini. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4481 Cc: oushixiong <oushixiong1025@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5eeb16ca72`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-09 18:58:17 +02:00
Alex Deucher	07b367f7eb	Revert "drm/amdgpu: fix incorrect vm flags to map bo" commit `ac4ed2da4c` upstream. This reverts commit `b08425fa77`. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `be33e8a239`) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-09-04 15:31:53 +02:00
Alex Deucher	75a10c872c	drm/amdgpu: update mmhub 4.1.0 client id mappings commit `a0b34e4c86` upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Alex Deucher	c56ef0e7d2	drm/amdgpu: update mmhub 3.0.1 client id mappings commit `0bae62cc98` upstream. Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2a2681eda7`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Lijo Lazar	20e97e9a29	drm/amdgpu: Update external revid for GC v9.5.0 commit `05c8b69051` upstream. Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `21c6764ed4`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Nathan Chancellor	fc647f6c51	drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() commit `c90f2e1172` upstream. After a recent change in clang to expose uninitialized warnings from const variables and pointers [1], there is a warning in imu_v12_0_program_rlc_ram() because data is passed uninitialized to program_imu_rlc_ram(): drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized] 374 \| program_imu_rlc_ram(adev, data, (const u32)size); \| ^~~~ As this warning happens early in clang's frontend, it does not realize that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is never actually called, and even if it were, data would not be dereferenced because size is 0. Just initialize data to NULL to silence the warning, as the commit that added program_imu_rlc_ram() mentioned it would eventually be used over the old method, at which point data can be properly initialized and used. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: `56159fffaa` ("drm/amdgpu: use new method to program rlc ram") Link: `2464313eef` [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Gang Ba	71598a5a77	drm/amdgpu: Avoid extra evict-restore process. commit `1f02f2044b` upstream. If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Alex Deucher	9225818539	drm/amdgpu/discovery: fix fw based ip discovery commit `514678da56` upstream. We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: `80a0e82829` ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `62eedd150f`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:31:02 +02:00
Xaver Hugl	9ef515e171	amdgpu/amdgpu_discovery: increase timeout limit for IFWI init commit `928587381b` upstream. With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9ed3d7bdf2`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-28 16:30:59 +02:00
Jack Xiao	cd54dc1fd7	drm/amdgpu: fix incorrect vm flags to map bo [ Upstream commit `040bc6d0e0` ] It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: `7946340fa3` ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b08425fa77`) Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-20 18:30:50 +02:00
YiPeng Chai	0d086c85ec	drm/amdgpu: fix vram reservation issue [ Upstream commit `10ef476aad` ] The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: `c9cad937c0` ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d38eaf27de`) Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-20 18:30:50 +02:00
Alex Deucher	39dfbf77c6	drm/amdgpu/gfx10: fix kiq locking in KCQ reset [ Upstream commit `a4b2ba8f63` ] The ring test needs to be inside the lock. Fixes: `097af47d3c` ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:13:44 +02:00
Alex Deucher	6db9f958b4	drm/amdgpu/gfx9.4.3: fix kiq locking in KCQ reset [ Upstream commit `08f116c593` ] The ring test needs to be inside the lock. Fixes: `4c953e53cc` ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:13:44 +02:00
Alex Deucher	b9f5d112e5	drm/amdgpu/gfx9: fix kiq locking in KCQ reset [ Upstream commit `730ea5074d` ] The ring test needs to be inside the lock. Fixes: `fdbd69486b` ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:13:44 +02:00
Lijo Lazar	bcdd7499bd	drm/amdgpu: Remove nbiov7.9 replay count reporting [ Upstream commit `0f566f0e9c` ] Direct pcie replay count reporting is not available on nbio v7.9. Reporting is done through firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Fixes: `50709d18f4` ("drm/amdgpu: Add pci replay count to nbio v7.9") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-15 12:13:39 +02:00
Arunpravin Paneer Selvam	198604687f	drm/amdgpu: Reset the clear flag in buddy during resume commit `95a16160ca` upstream. - Added a handler in DRM buddy manager to reset the cleared flag for the blocks in the freelist. - This is necessary because, upon resuming, the VRAM becomes cluttered with BIOS data, yet the VRAM backend manager believes that everything has been cleared. v2: - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld) - Force merge the two dirty blocks.(Matthew Auld) - Add a new unit test case for this issue.(Matthew Auld) - Having this function being able to flip the state either way would be good. (Matthew Brost) v3(Matthew Auld): - Do merge step first to avoid the use of extra reset flag. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-08-01 09:48:42 +01:00
Lijo Lazar	62f2a58a4c	drm/amdgpu: Increase reset counter only on success commit `86790e300d` upstream. Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `25c314aa3e`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-24 08:56:23 +02:00
Eeli Haapalainen	228ad2ab5b	drm/amdgpu/gfx8: reset compute ring wptr on the GPU on resume commit `8326193401` upstream. Commit `42cdf6f687` ("drm/amdgpu/gfx8: always restore kcq MQDs") made the ring pointer always to be reset on resume from suspend. This caused compute rings to fail since the reset was done without also resetting it for the firmware. Reset wptr on the GPU to avoid a disconnect between the driver and firmware wptr. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3911 Fixes: `42cdf6f687` ("drm/amdgpu/gfx8: always restore kcq MQDs") Signed-off-by: Eeli Haapalainen <eeli.haapalainen@protonmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2becafc319`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-24 08:56:23 +02:00
Srinivasan Shanmugam	07ed75bfa7	drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV commit `dc0297f319` upstream. RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: `e864180ee4` ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [ Minor context change fixed. ] Signed-off-by: Wenshan Lan <jetlan9@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:01 +02:00
Flora Cui	04513cf158	drm/amdgpu/ip_discovery: add missing ip_discovery fw commit `2f6dd741cd` upstream. Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Gray <jsg@jsg.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:01 +02:00
Flora Cui	39d6a607d5	drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics commit `25f602fbbc` upstream. vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Gray <jsg@jsg.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:01 +02:00
Alex Deucher	7ccaa5fa5d	drm/amdgpu/mes: add missing locking in helper functions [ Upstream commit `40f970ba7a` ] We need to take the MES lock. Reviewed-by: Michael Chen <michael.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:05 +02:00
Frank Min	81ebb8d755	drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13 [ Upstream commit `854171405e` ] 1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fb5ec2174d`) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:04 +02:00
Sonny Jiang	b47a1f9323	drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause [ Upstream commit `46e15197b5` ] Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c29521b529`) Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:04 +02:00
Alex Deucher	5f2e040f19	drm/amdgpu: switch job hw_fence to amdgpu_fence commit `ebe4354270` upstream. Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: `3f4c175d62` ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bf1cd14f9e`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-06 11:01:47 +02:00
Jesse Zhang	9cfa2fea25	drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences commit `7f3b16f3f2` upstream. This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `375bf56465`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-06 11:01:47 +02:00
Frank Min	593517e556	drm/amdgpu: Add kicker device detection commit `0bbf5fd86c` upstream. 1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `09aa2b408f`) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-06 11:01:46 +02:00
John Olender	e2c3133ff4	drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vram commit `4d2f6b4e4c` upstream. The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-06 11:01:45 +02:00
Mario Limonciello	23116bf9a3	drm/amd: Adjust output for discovery error handling [ Upstream commit `73eab78721` ] commit `017fbb6690` ("drm/amdgpu/discovery: check ip_discovery fw file available") added support for reading an amdgpu IP discovery bin file for some specific products. If it's not found then it will fallback to hardcoded values. However if it's not found there is also a lot of noise about missing files and errors. Adjust the error handling to decrease most messages to DEBUG and to show users less about missing files. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4312 Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com> Fixes: `017fbb6690` ("drm/amdgpu/discovery: check ip_discovery fw file available") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250617183052.1692059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `49f1f9f6c3`) Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-06 11:01:41 +02:00
Alex Deucher	840fe792a1	drm/amdgpu/discovery: optionally use fw based ip discovery [ Upstream commit `80a0e82829` ] On chips without native IP discovery support, use the fw binary if available, otherwise we can continue without it. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: `73eab78721` ("drm/amd: Adjust output for discovery error handling") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-06 11:01:41 +02:00
Philip Yang	777580609d	drm/amdgpu: seq64 memory unmap uses uninterruptible lock [ Upstream commit `a359288ccb` ] To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-06 11:01:34 +02:00
David (Ming Qiang) Wu	cef081c823	drm/amdgpu: read back register after written for VCN v4.0.5 commit `ee7360fc27` upstream. On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `07c9db090b`) Cc: stable@vger.kernel.org (cherry picked from commit `ee7360fc27`) Hand modified for contextual changes where there is a for loop in 6.12 that was dropped later on. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:11:40 +01:00
Shiwu Zhang	f5e9d0d206	drm/amdgpu: enlarge the VBIOS binary size limit [ Upstream commit `667b96134c` ] Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:54 +02:00
Lijo Lazar	7ef18e2ffd	drm/amdgpu: Use active umc info from discovery [ Upstream commit `f7a594e405` ] There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances. v2: fix warning (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:54 +02:00
Jiang Liu	a25d045ebf	drm/amdgpu: reset psp->cmd to NULL after releasing the buffer [ Upstream commit `e92f3f94ca` ] Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:53 +02:00
Harish Kasiviswanathan	452807a863	drm/amdgpu: Set snoop bit for SDMA for MI series [ Upstream commit `3394b1f76d` ] SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:52 +02:00
Alex Deucher	365d302ac7	drm/amdgpu/mes11: fix set_hw_resources_1 calculation [ Upstream commit `1350dd3691` ] It's GPU page size not CPU page size. In most cases they are the same, but not always. This can lead to overallocation on systems with larger pages. Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:52 +02:00
Christian König	95b8f2b7d9	drm/amdgpu: remove all KFD fences from the BO on release [ Upstream commit `cb0de06d1b` ] Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:41 +02:00
Victor Lu	a23f391012	drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c [ Upstream commit `057fef20b8` ] SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:29 +02:00
Emily Deng	ecaa856227	drm/amdgpu: Fix missing drain retry fault the last entry [ Upstream commit `fe2fa3be3d` ] While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:26 +02:00
David Rosca	de3c09de74	drm/amdgpu: Update SRIOV video codec caps [ Upstream commit `19478f2011` ] There have been multiple fixes to the video caps that are missing for SRIOV. Update the SRIOV caps with correct values. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:19 +02:00
Alex Deucher	858425dc2d	drm/amdgpu/gfx11: don't read registers in mqd init [ Upstream commit `e27b36ea6b` ] Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:18 +02:00
Alex Deucher	73d437ae63	drm/amdgpu/gfx12: don't read registers in mqd init [ Upstream commit `fc3c139cf0` ] Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:18 +02:00
Alex Deucher	94206e0d72	drm/amdgpu: adjust drm_firmware_drivers_only() handling [ Upstream commit `e00e5c2238` ] Move to probe so we can check the PCI device type and only apply the drm_firmware_drivers_only() check for PCI DISPLAY classes. Also add a module parameter to override the nomodeset kernel parameter as a workaround for platforms that have this hardcoded on their kernel command lines. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:13 +02:00

1 2 3 4 5 ...

14925 Commits