Commit Graph

15974 Commits

Author SHA1 Message Date
Dave Airlie
8a07b2623e Merge tag 'drm-misc-next-2024-10-31' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.13:

All of the previous pull request, with MORE!

Core Changes:
- Update documentation for scheduler start/stop and job init.
- Add dedede and sm8350-hdk hardware to ci runs.

Driver Changes:
- Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d,
  panthor docs, and leadtek-ltk050h3146w.
- Crashdump support for qaic.
- Support DP compliance in zynqmp.
- Add Samsung S6E88A0-AMS427AP24 panel.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com
2024-11-01 13:46:03 +10:00
Dave Airlie
e7103f8785 Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.13-2024-10-25:

amdgpu:
- SDMA queue reset support
- SMU 13.0.6 updates
- Add debugfs interface to help limit jpeg queue scheduling for testing
- JPEG 4.0.3 updates
- Initial runtime repartitioning support
- GFX9 fixes
- Misc code cleanups
- Rework IP structures to better handle multiple instances of an IP
- DML updates
- DSC fixes
- HDR fixes
- Brightness control updates
- Runtime pm cleanup
- DMCUB fixes
- DCN 3.5 updates
- Struct drm_edid cleanup
- Fetch EDID from _DDC if available
- Ring noop optimizations
- MES logging fixes
- 3DLUT fixes
- DCN 4.x fixes
- SMU 13.x fixes
- Fixes for set_soft_freq_range()
- ACPI fixes
- SMU 14.x updates
- PSR-SU fixes
- fdinfo cleanup
- DCN documentation updates

amdkfd:
- Misc code cleanups
- Increase event FIFO size
- Copy wave state fixes for SDMA

radeon:
- Fix possible overflow in packet3 check
- Late init connector fix
- Always set GEM function pointer

Documentation:
- Update drm-memory documentation

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-29 18:25:24 +10:00
Yang Wang
7daa0f6b28 drm/amdgpu: optimize ACA log print
- skip to print CE ACA log.
- optimize ACA log print for MCA.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:41:26 -04:00
Le Ma
ea9d8863da drm/amdgpu: add generic func to check if ta fw is applicable
Separated xgmi ta is required for specific APU, and driver needs
parse the ta binary properly with aux xgmi ta packed.

v2: make the check function more generic (Lijo)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:41:13 -04:00
Prike Liang
d5e3d8a2a6 drm/amdgpu: clean up the suspend_complete
To check the status of S3 suspend completion,
use the PM core pm_suspend_global_flags bit(1)
to detect S3 abort events. Therefore, clean up
the AMDGPU driver's private flag suspend_complete.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:40:58 -04:00
Prike Liang
58a8c756fc drm/amdgpu: correct the S3 abort check condition
In the normal S3 entry, the TOS cycle counter is not
reset during BIOS execution the _S3 method, so it doesn't
determine whether the _S3 method is executed exactly.
Howerver, the PM core performs the S3 suspend will set the
PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend
successfully. Therefore, drivers can check the
pm_suspend_global_flags bit(1) to detect the S3 suspend
abort event.

Fixes: 6704dbf719 ("drm/amdgpu: update suspend status for aborting from deeper suspend")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:39:23 -04:00
Christian König
57e92d991e drm/amdgpu: drop volatile from ring buffer
Volatile only prevents the compiler from re-ordering reads and writes.
Since we always only modify the ring buffer from one CPU thread and have
an explicit barrier before signaling the HW this should have no effect at
all and just prevents compiler optimisations.

While at it drop the local variables as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28 16:32:03 -04:00
Dan Carpenter
dac64cb3e0 drm/amdgpu: Fix amdgpu_ip_block_hw_fini()
This NULL check is reversed so the function doesn't work.

Fixes: dad01f93f4 ("drm/amdgpu: validate hw_fini before function call")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/f4fc849e-4e76-4448-8657-caa4c69910b0@stanley.mountain
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:07:10 -04:00
Kent Russell
3c0be69bad amdgpu: Don't print L2 status if there's nothing to print
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zeroes.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:51 -04:00
Jonathan Kim
e46738a58f drm/amdkfd: sever xgmi io link if host driver has disable sharing
Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: James Yao <yiqing.yao@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:34 -04:00
Lang Yu
46186667f9 drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr
Free sg table when dma_map_sgtable() failed to avoid memory leak.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:06:24 -04:00
Lijo Lazar
d37bc6a4ed drm/amdgpu: Fix the logic for NPS request failure
On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
issue in error condition, should have been mutex_unlock.

Fixes: ee52489d12 ("drm/amdgpu: Place NPS mode request on unload")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:04:59 -04:00
YiPeng Chai
3d0ffc6418 drm/amdgpu: Reduce redundant gpu resets on nbio v7.4
On nbio v7.4, ras controller interrupt and athub
interrupt are generated after injecting UE to PCIE,
but gpu reset only needs to be triggered once.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-24 18:04:34 -04:00
Frank Min
108bc59fe8 drm/amdgpu: fix random data corruption for sdma 7
There is random data corruption caused by const fill, this is caused by
write compression mode not correctly configured.

So correct compression mode for const fill.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 75400f8d6e)
Cc: stable@vger.kernel.org # 6.11.x
2024-10-22 18:11:43 -04:00
Mario Limonciello
bf58f03931 drm/amd: Guard against bad data for ATIF ACPI method
If a BIOS provides bad data in response to an ATIF method call
this causes a NULL pointer dereference in the caller.

```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```

It has been encountered on at least one system, so guard for it.

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9b7c809b8)
Cc: stable@vger.kernel.org
2024-10-22 18:08:12 -04:00
Prike Liang
32e7ee293f drm/amdgpu: Dereference the ATCS ACPI buffer
Need to dereference the atcs acpi buffer after
the method is executed, otherwise it will result in
a memory leak.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Lijo Lazar
591aec150a drm/amdgpu: Save VCN shared memory with init reset
VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.

Make sure the flags are saved during reset on initialization since that
reset will affect frame buffer region. For clarity, separate it out to
another function.

Fixes: 1e4acf4d93 ("drm/amdgpu: Add reset on init handler for XGMI")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Hao Zhou <hao.zhou@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Sunil Khatri
971d8e1c3f drm/amdgpu: clean unused functions of uvd/vcn/vce
Some of the functions pointers of amdgpu_ip_funcs
are not used and are left commented out. Hence this
cleans those up which arent used.

Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:51:19 -04:00
Victor Lu
8b22f04833 drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih
Port this change to vega20_ih.c:
commit afbf7955ff ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts")

Original commit message:
"Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.

How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW."

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:40 -04:00
Sunil Khatri
0016e87054 drm/amdgpu: Clean the functions pointer set as NULL
We dont need to set the functions to NULL which arent
needed as global structure members are by default
set to zero or NULL for pointers.

Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
8231e3af96 drm/amdgpu: clean the dummy soft_reset functions
Remove the dummy soft_reset functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
f13c7da118 drm/amdgpu: clean the dummy wait_for_idle functions
Remove the dummy wait_for_idle functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
aa980de3b5 drm/amdgpu: clean the dummy suspend functions
Remove the dummy suspend functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
fbcd0ad5d1 drm/amdgpu: clean the dummy resume functions
Remove the dummy resume functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
780002b654 drm/amdgpu: validate wait_for_idle before function call
Before making a function call to wait_for_idle,
validate the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
502d76308d drm/amdgpu: validate resume before function call
Before making a function call to resume, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_resume where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
e095026f00 drm/amdgpu: validate suspend before function call
Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Sunil Khatri
dad01f93f4 drm/amdgpu: validate hw_fini before function call
Before making a function call to hw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Srinivasan Shanmugam
9343b904e7 drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2
This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_2_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

Also, this patch updates the `gfx_v9_0_sw_init` function to initialize
the cleaner shader if the MEC firmware version is 88 or higher. It sets
the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate
values and attempts to initialize the cleaner shader.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This change ensures that the GPU memory is properly cleared between
different processes, preventing data leakage and enhancing security. It
also aligns with the serialization mechanism between KGD and KFD,
ensuring that the GPU state is consistent across different workloads.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:39 -04:00
Frank Min
c379dcf797 drm/amdgpu: fix typo for sdma6 constant fill packet
Fix typo for sdma6 constant fill packet

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:38 -04:00
Frank Min
75400f8d6e drm/amdgpu: fix random data corruption for sdma 7
There is random data corruption caused by const fill, this is caused by
write compression mode not correctly configured.

So correct compression mode for const fill.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:38 -04:00
Sunil Khatri
5ebdb6fd60 drm/amdgpu: clean the dummy sw_fini functions
Remove the dummy sw_fini functions for all
ip blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Lijo Lazar
785504dd7f drm/amdgpu: Use SPX as default in partition config
In certain cases - ex: when a reset is required on initialization - XCP
manager won't have a valid partition mode. In such cases, use SPX as the
default selected mode for which partition configuration details are
populated.

Fixes: 4ae86dc878 ("drm/amdgpu: Add sysfs nodes to get xcp details")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Hao Zhou <hao.zhou@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
278b8fbf06 drm/amdgpu: validate sw_fini before function call
Before making a function call to sw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
7fd12379bd drm/amdgpu: clean the dummy sw_init functions
Remove the dummy sw_init functions for all
IP blocks.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Sunil Khatri
df6e463d8f drm/amdgpu: validate sw_init before function call
Before making a function call to sw_init, validate
the function pointer like we do in late_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:37 -04:00
Xiaogang Chen
10112bf828 drm/amdkfd: Not restore userptr buffer if kfd process has been removed
When kfd process has been terminated not restore userptr buffer after mmu
notifier invalidates a range.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:12 -04:00
Lijo Lazar
8e3a3e847e drm/amdgpu: Zero-initialize mqd backup memory
Zero-initialize mqd backup memory, otherwise the check for
'already-backed-up' could go wrong.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:50:12 -04:00
Alex Deucher
32f0028969 Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"
This reverts commit 7c1a2d8aba.

Extended validation has completed successfully, so enable
these features by default.

Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jonathan Kim <jonathan.kim@amd.com>
Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
2024-10-22 17:50:11 -04:00
Zhu Lingshan
9ee8ab245c drm/amdgpu: init saw registers for mmhub v1.0
This commits init registers in the Stand Along Walker
for mmhub v1.0, to support ISP use cases.

Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reported-and-tested-by: Du Bin <bin.du@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:49:38 -04:00
Alex Deucher
d2f57b6d89 drm/amdgpu/discovery: add ISP discovery entries for old APUs
Raven1/2 and Picasso have ISP 2.0.0, however their ISP blocks
are not in the IP discovery table yet.

This commit fixes this issue by adding new ISP entries for
Raven and Picasso in the IP discovery table.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:49:30 -04:00
Mario Limonciello
c9b7c809b8 drm/amd: Guard against bad data for ATIF ACPI method
If a BIOS provides bad data in response to an ATIF method call
this causes a NULL pointer dereference in the caller.

```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```

It has been encountered on at least one system, so guard for it.

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-22 17:22:44 -04:00
Thomas Zimmermann
1f828b4dd4 drm/client: Make client support optional
Only build client code if DRM_CLIENT has been selected. Automatially
do so if one of the default clients has been enabled. If client support
has been disabled, the helpers for client-related events are empty and
the regular client functions are not present.

Amdgpu has an internal DRM client, so it has to select DRM_CLIENT by
itself unconditionally.

v3:
- provide empty drm_client_debugfs_init() if DRM_CLIENT=n (kernel
  test robot)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-12-tzimmermann@suse.de
2024-10-18 09:23:03 +02:00
Thomas Zimmermann
4cf50bae05 drm/amdgpu: Suspend and resume internal clients with client helpers
Replace calls to drm_fb_helper_set_suspend_unlocked() with calls
to the client functions drm_client_dev_suspend() and
drm_client_dev_resume(). Any registered in-kernel client will now
receive suspend and resume events.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-9-tzimmermann@suse.de
2024-10-18 09:23:03 +02:00
Srinivasan Shanmugam
e7457532cb drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring
This patch addresses a double unlock issue in the amdgpu_mes_add_ring
function. The mutex was being unlocked twice under certain error
conditions, which could lead to undefined behavior.

The fix ensures that the mutex is unlocked only once before jumping to
the clean_up_memory label. The unlock operation is moved to just before
the goto statement within the conditional block that checks the return
value of amdgpu_ring_init. This prevents the second unlock attempt after
the clean_up_memory label, which is no longer necessary as the mutex is
already unlocked by this point in the code flow.

This change resolves the potential double unlock and maintains the
correct mutex handling throughout the function.

Fixes below:
Commit d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue
submission"), leads to the following Smatch static checker warning:

	drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring()
	warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213)

drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
    1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id,
    1144                         int queue_type, int idx,
    1145                         struct amdgpu_mes_ctx_data *ctx_data,
    1146                         struct amdgpu_ring **out)
    1147 {
    1148         struct amdgpu_ring *ring;
    1149         struct amdgpu_mes_gang *gang;
    1150         struct amdgpu_mes_queue_properties qprops = {0};
    1151         int r, queue_id, pasid;
    1152
    1153         /*
    1154          * Avoid taking any other locks under MES lock to avoid circular
    1155          * lock dependencies.
    1156          */
    1157         amdgpu_mes_lock(&adev->mes);
    1158         gang = idr_find(&adev->mes.gang_id_idr, gang_id);
    1159         if (!gang) {
    1160                 DRM_ERROR("gang id %d doesn't exist\n", gang_id);
    1161                 amdgpu_mes_unlock(&adev->mes);
    1162                 return -EINVAL;
    1163         }
    1164         pasid = gang->process->pasid;
    1165
    1166         ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL);
    1167         if (!ring) {
    1168                 amdgpu_mes_unlock(&adev->mes);
    1169                 return -ENOMEM;
    1170         }
    1171
    1172         ring->ring_obj = NULL;
    1173         ring->use_doorbell = true;
    1174         ring->is_mes_queue = true;
    1175         ring->mes_ctx = ctx_data;
    1176         ring->idx = idx;
    1177         ring->no_scheduler = true;
    1178
    1179         if (queue_type == AMDGPU_RING_TYPE_COMPUTE) {
    1180                 int offset = offsetof(struct amdgpu_mes_ctx_meta_data,
    1181                                       compute[ring->idx].mec_hpd);
    1182                 ring->eop_gpu_addr =
    1183                         amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset);
    1184         }
    1185
    1186         switch (queue_type) {
    1187         case AMDGPU_RING_TYPE_GFX:
    1188                 ring->funcs = adev->gfx.gfx_ring[0].funcs;
    1189                 ring->me = adev->gfx.gfx_ring[0].me;
    1190                 ring->pipe = adev->gfx.gfx_ring[0].pipe;
    1191                 break;
    1192         case AMDGPU_RING_TYPE_COMPUTE:
    1193                 ring->funcs = adev->gfx.compute_ring[0].funcs;
    1194                 ring->me = adev->gfx.compute_ring[0].me;
    1195                 ring->pipe = adev->gfx.compute_ring[0].pipe;
    1196                 break;
    1197         case AMDGPU_RING_TYPE_SDMA:
    1198                 ring->funcs = adev->sdma.instance[0].ring.funcs;
    1199                 break;
    1200         default:
    1201                 BUG();
    1202         }
    1203
    1204         r = amdgpu_ring_init(adev, ring, 1024, NULL, 0,
    1205                              AMDGPU_RING_PRIO_DEFAULT, NULL);
    1206         if (r)
    1207                 goto clean_up_memory;
    1208
    1209         amdgpu_mes_ring_to_queue_props(adev, ring, &qprops);
    1210
    1211         dma_fence_wait(gang->process->vm->last_update, false);
    1212         dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false);
    1213         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1214
    1215         r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id);
    1216         if (r)
    1217                 goto clean_up_ring;
                         ^^^^^^^^^^^^^^^^^^

    1218
    1219         ring->hw_queue_id = queue_id;
    1220         ring->doorbell_index = qprops.doorbell_off;
    1221
    1222         if (queue_type == AMDGPU_RING_TYPE_GFX)
    1223                 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id);
    1224         else if (queue_type == AMDGPU_RING_TYPE_COMPUTE)
    1225                 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id,
    1226                         queue_id);
    1227         else if (queue_type == AMDGPU_RING_TYPE_SDMA)
    1228                 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id,
    1229                         queue_id);
    1230         else
    1231                 BUG();
    1232
    1233         *out = ring;
    1234         return 0;
    1235
    1236 clean_up_ring:
    1237         amdgpu_ring_fini(ring);
    1238 clean_up_memory:
    1239         kfree(ring);
--> 1240         amdgpu_mes_unlock(&adev->mes);
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    1241         return r;
    1242 }

Fixes: d0c423b647 ("drm/amdgpu/mes: use ring for kernel queue submission")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bfaf188360)
2024-10-15 11:49:08 -04:00
Michael Chen
7760d7f93c drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes
With Unified MES enabled in gfx12, need separate event log buffer for the
2 MES pipes to avoid data overwrite.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 144df260f3)
Cc: stable@vger.kernel.org # 6.11.x
2024-10-15 11:48:36 -04:00
Mohammed Anees
c0ec082f10 drm/amdgpu: prevent BO_HANDLES error from being overwritten
Before this patch, if multiple BO_HANDLES chunks were submitted,
the error -EINVAL would be correctly set but could be overwritten
by the return value from amdgpu_cs_p1_bo_handles(). This patch
ensures that if there are multiple BO_HANDLES, we stop.

Fixes: fec5f8e8c6 ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit")
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40f2cd9882)
Cc: stable@vger.kernel.org
2024-10-15 11:48:05 -04:00
Alex Deucher
d2c72d96df drm/amdgpu: enable enforce_isolation sysfs node on VFs
It should be enabled on both bare metal and VFs.

Fixes: e189be9b2e ("drm/amdgpu: Add enforce_isolation sysfs attribute")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
(cherry picked from commit dc8847b054)
2024-10-15 11:47:41 -04:00
Dan Carpenter
9f7e94af35 drm/amdgpu: Fix off by one in current_memory_partition_show()
The >= ARRAY_SIZE() should be > ARRAY_SIZE() to prevent an out of
bounds read.

Fixes: 012be6f22c ("drm/amdgpu: Add sysfs interfaces for NPS mode")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:26:35 -04:00
Lijo Lazar
d25d26b8a8 drm/amdgpu: Wait for reset on init completion
When reset on initialization is requested, wait for the reset to finish.
In cases where module is loaded after boot, this makes sure all
initialization work is done after a successful return of modprobe.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:22:26 -04:00