This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:
- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
that requests are handled in a controlled manner and avoiding the
race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
bus for accessing specific registers, avoiding unnecessary operations
that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
`mes_set_hw_res_pkt` structure, allowing for proper synchronization of
the cleaner shader execution.
Cc: Christian König <christian.koenig@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Curret kfd does not allocate pasid values, instead uses pasid value for each
vm from graphic driver. So should not prevent graphic driver from releasing
pasid values since the values are allocated by graphic driver, not kfd driver
anymore. This patch does not stop graphic driver release pasid values.
Fixes: 8544374c0f ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:
- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
that requests are handled in a controlled manner and avoiding the
race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
bus for accessing specific registers, avoiding unnecessary operations
that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
`mes_set_hw_res_pkt` structure, allowing for proper synchronization of
the cleaner shader execution.
Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SVM migration unmap pages from GPU and then update mapping to GPU to
recover page fault. Currently unmap clears the PDE entry for range
length >= huge page and free PTB bo, update mapping to alloc new PT bo.
There is race bug that the freed entry bo maybe still on the pt_free
list, reused when updating mapping and then freed, leave invalid PDE
entry and cause GPU page fault.
By setting the update to clear only one PDE entry or clear PTB, to
avoid unmap to free PTE bo. This fixes the race bug and improve the
unmap and map to GPU performance. Update mapping to huge page will
still free the PTB bo.
With this change, the vm->pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the
PTB.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split the code on a per instance basis. This will allow
us to use the per instance functions in the future to
handle more things per instance.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for
this to happen.
v2: Missed a few mmhub_v9_4. Added now.
v3: Calculate hub offset once since it doesn't change inside the loop
Modified function names based on review comments.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add extra flag definition for ids_flag field to distinguish
between vf/pf/pt modes
v2: Updated kms driver minor version & removed pf check as default is 0
v3: Fix up version (Alex)
v4: rebase (Alex)
Proposed userspace:
e663bed7d6
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the user has configured a large carveout on a small APU,
only use GTT for VRAM allocations if GTT is larger than
VRAM.
v2: fix reversed check (Philip)
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On big and small APUs we send KFD VRAM allocations to GTT
since the carve out is either non-existent or relatively
small. However, if someone sets the carve out size to be
relatively large, we may end up using GTT rather than VRAM.
No change of logic with this patch, but it allows the
driver to determine which logic to use based on the
carve out size in the future.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain SOCs may not need much data from VBIOS. Some data like VBIOS
version used will be missed but it doesn't affect functionality. Add a
flag to make VBIOS image optional.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use bios_release wrapper to release memory allocated for vbios image and
reset the variables.
v2:
Use the same wrapper for clean up in sw_fini (Alex Deucher)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use amdgpu_gfx_off_ctrl_immediate() when powergating.
There's no need for the delay in gfx off allow. The
powergating is dynamically disabled/enabled as for
RV/PCO on compute queues and allowing gfx off again as
soon the job is submitted improves power savings.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Suggested-by: Błażej Szczygieł <mumei6102@gmail.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG5_0_0
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG2_5_0
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG2_0_0
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG3_0_0
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG4_0_5
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG4_0_0
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG5_0_1
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add register list and enable devcoredump for JPEG4_0_3
V2: (Lijo)
- remove version specific callbacks and use simplified helper functions
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add devcoredump helper functions that can be reused for all jpeg versions.
V2: (Lijo)
- add amdgpu_jpeg_reg_dump_init() and amdgpu_jpeg_reg_dump_fini()
- use reg_list and reg_count from init() to dump and print registers
- memory allocation and freeing is moved to the init() and fini()
V3: (Lijo)
- move amdgpu_jpeg_reg_dump_fini() to sw_fini()
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>