Commit Graph

11961 Commits

Author SHA1 Message Date
Evan Quan
7436538899 drm/amdgpu: avoid gfx register accessing during gfxoff
Make sure gfxoff is disabled before gfx register accessing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar
bb66ecbf12 drm/amdgpu: Use simplified API for p2p dist calc
Use the simpified API that calculates distance between two devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Lijo Lazar
d0fa84f174 drm/amdgpu: Disable verbose for p2p dist calc
Disable verbose while getting p2p distance. With verbose, it shows
warning if ACS redirect is set between the devices. Adds noise
to dmesg logs when a few GPU devices are on the same platform.

Example log:

amdgpu 0000:34:00.0: ACS redirect is set between the client and provider (0000:31:00.0)
amdgpu 0000:34:00.0: to disable ACS redirect for this path, add the kernel parameter:
	pci=disable_acs_redir=0000:30:00.0;0000:2e:00.0;0000:33:00.0;0000:2e:10.0

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
YiPeng Chai
642c040113 drm/amdgpu: Fixed ras warning when uninstalling amdgpu
For the asic using smu v13_0_2, there is the following
warning when uninstalling amdgpu:
  amdgpu: ras disable gfx failed poison:1 ret:-22.

[Why]:
  For the asic using smu v13_0_2, the psp .suspend and
  mode1reset is called before amdgpu_ras_pre_fini during
  amdgpu uninstall, it has disabled all ras features and
  reset the psp. Since the psp is reset, calling
  amdgpu_ras_disable_all_features in amdgpu_ras_pre_fini
  to disable ras features will fail.

[How]:
  If all ras features are disabled, amdgpu_ras_disable_all_features
  will not be called to disable all ras features again.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang
7c32d4e37f drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx11

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Hawking Zhang
39a35d52d4 drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx10

v2: squash in size validation fix (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29 09:41:42 -04:00
Dave Airlie
e8573000f4 Merge tag 'amd-drm-next-6.1-2022-09-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.1-2022-09-23:

amdgpu:
- SDMA fix
- Add new firmware types to debugfs/IOCTL version queries
- Misc spelling and grammar fixes
- Misc code cleanups
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- CS cleanup
- Gang submit support
- Clang fixes
- Non-DC audio fix
- GPUVM locking fixes
- Vega10 PWN fan speed fix

amdkgd:
- MQD manager cleanup
- Misc spelling and grammar fixes

UAPI:
- Add new firmware types to the FW version query IOCTL

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923215729.6061-1-alexander.deucher@amd.com
2022-09-28 14:56:09 +10:00
Dave Airlie
907cc346ff Merge tag 'drm-misc-next-2022-09-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 6.1:

UAPI Changes:

Cross-subsystem Changes:
  - dma-buf: Improve signaling when debugging

Core Changes:
  - Backlight handling improvements
  - format-helper: Add drm_fb_build_fourcc_list()
  - fourcc: Kunit tests improvements
  - modes: Add DRM_MODE_INIT() macro
  - plane: Remove drm_plane_init(), Allocate planes with drm_universal_plane_alloc()
  - plane-helper: Add drm_plane_helper_atomic_check()
  - probe-helper: Add drm_connector_helper_get_modes_fixed() and
    drm_crtc_helper_mode_valid_fixed()
  - tests: Conversion to parametrized tests, test name consistency

Driver Changes:
  - amdgpu: Fix for a VRAM eviction issue
  - ast: Resolution handling improvements
  - mediatek: small code improvements for DP
  - omap: Refcounting fix, small improvements
  - rockchip: RK3568 support, Gamma support for RK3399
  - sun4i: Build failure fix when !OF
  - udl: Multiple fixes here and there
  - vc4: HDMI hotplug handling improvements
  - vkms: Warning fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923073943.d43tne5hni3iknlv@houat
2022-09-28 13:50:46 +10:00
Bokun Zhang
3b7329cf5a drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.

Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-09-27 18:03:36 -04:00
Jiadong.Zhu
11e38360cc drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call
amdgpu_fence_process which must be used in irq handler.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 18:03:09 -04:00
Jiadong.Zhu
b3e45b18e5 drm/amdgpu: Correct the position in patch_cond_exec
The current position calulated in gfx_v9_0_ring_emit_patch_cond_exec
underflows when the wptr is divisible by ring->buf_mask + 1.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 18:02:58 -04:00
Graham Sider
91ef6cfd30 drm/amdgpu: pass queue size and is_aql_queue to MES
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:54:12 -04:00
Evan Quan
7516777434 drm/amdgpu: avoid gfx register accessing during gfxoff
Make sure gfxoff is disabled before gfx register accessing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:46:52 -04:00
Hawking Zhang
f6f8bb5989 drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode
switch to common helper to initialize rlc firmware
for gfx9

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:39 -04:00
Hawking Zhang
5b41521268 drm/amdgpu: add helper to init rlc firmware
To initialzie rlc firmware according to rlc
firmware header version

v2: squash in backwards compat fix

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-27 17:02:38 -04:00
Jim Cromie
f158936b60 drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
Use DECLARE_DYNDBG_CLASSMAP across DRM:

 - in .c files, since macro defines/initializes a record

 - in drivers, $mod_{drv,drm,param}.c
   ie where param setup is done, since a classmap is param related

 - in drm/drm_print.c
   since existing __drm_debug param is defined there,
   and we ifdef it, and provide an elaborated alternative.

 - in drm_*_helper modules:
   dp/drm_dp - 1st item in makefile target
   drivers/gpu/drm/drm_crtc_helper.c - random pick iirc.

Since these modules all use identical CLASSMAP declarations (ie: names
and .class_id's) they will all respond together to "class DRM_UT_*"
query-commands:

  :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control

NOTES:

This changes __drm_debug from int to ulong, so BIT() is usable on it.

DRM's enum drm_debug_category values need to sync with the index of
their respective class-names here.  Then .class_id == category, and
dyndbg's class FOO mechanisms will enable drm_dbg(DRM_UT_KMS, ...).

Though DRM needs consistent categories across all modules, thats not
generally needed; modules X and Y could define FOO differently (ie a
different NAME => class_id mapping), changes are made according to
each module's private class-map.

No callsites are actually selected by this patch, since none are
class'd yet.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220912052852.1123868-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24 15:02:01 +02:00
Arunpravin Paneer Selvam
39dd0cc2e5 drm/amdgpu: Fix VRAM eviction issue
A user reported that when he starts a game (MTGA) with wine,
he observed an error msg failed to pin framebuffer with error -12.
Found an issue with the VRAM mem type eviction decision condition
logic. This patch will fix the if condition code error.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220922151447.265696-1-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-09-22 19:53:06 +02:00
Alex Deucher
abbc7a3daf drm/amdgpu: don't register a dirty callback for non-atomic
Some asics still support non-atomic code paths.

Fixes: 66f99628eb ("drm/amdgpu: use dirty framebuffer helper")
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 17:36:43 -04:00
Mukul Joshi
37a0bad677 drm/amdgpu: Update PTE flags with TF enabled
This patch updates the PTE flags when translate further (TF) is
enabled:
- With translate_further enabled, invalid PTEs can be 0. Reading
  consecutive invalid PTEs as 0 is considered a fault. To prevent
  this, ensure invalid PTEs have at least 1 bit set.
- The current invalid PTE flags settings to translate a retry fault
  into a no-retry fault, doesn't work with TF enabled. As a result,
  update invalid PTE flags settings which works for both TF enabled
  and disabled case.

Fixes: 352e683b72 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 17:14:09 -04:00
Hawking Zhang
435d6e6f02 drm/amdgpu: add helper to init rlc fw in header v2_4
To initialize rlc firmware in header v2_4

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:27 -04:00
Hawking Zhang
a0d9084d7f drm/amdgpu: add helper to init rlc fw in header v2_3
To initialize rlc firmware in header v2_3

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:21 -04:00
Hawking Zhang
a97d0ec8bb drm/amdgpu: add helper to init rlc fw in header v2_2
To initialize rlc firmware in header v2_2

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:15 -04:00
Hawking Zhang
f3e6173b4b drm/amdgpu: add helper to init rlc fw in header v2_1
To initialize rlc firmware in header v2_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:07 -04:00
Hawking Zhang
0641dbefd4 drm/amdgpu: add helper to init rlc fw in header v2_0
To initialize rlc firmware in header v2_0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:26:01 -04:00
Philip Yang
3e43b760c9 drm/amdgpu: Fix amdgpu_vm_pt_free warning
Free page table BO from vm resv unlocked context generate below
warnings.

Add a pt_free_work in vm to free page table BO from vm->pt_freed list.
pass vm resv unlock status from page table update caller, and add vm_bo
entry to vm->pt_freed list and schedule the pt_free_work if calling with
vm resv unlocked.

WARNING: CPU: 12 PID: 3238 at
drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
 amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
 amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
 amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
 amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
 svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
 svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
 __mmu_notifier_invalidate_range_start+0x1cd/0x230
 unmap_vmas+0x9d/0x140
 unmap_region+0xa8/0x110

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:52 -04:00
Philip Yang
c2dbd69e7b drm/amdgpu: Use vm status_lock to protect pt free
Use vm_status_lock to protect all vm_status state transitions to allow
them to happen without a reservation lock in unlocked page table
updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:46 -04:00
Philip Yang
757eb2bedd drm/amdgpu: Use vm status_lock to protect vm evicted list
Use vm_status_lock to protect all vm_status state transitions to allow
them to happen without a reservation lock in unlocked page table
updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:40 -04:00
Philip Yang
998debbdc8 drm/amdgpu: Use vm status_lock to protect vm moved list
Use vm_status_lock to protect all vm_status state transitions to allow
them to happen without a reservation lock in unlocked page table
updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:35 -04:00
Philip Yang
c1806d78ec drm/amdgpu: Use vm status_lock to protect vm idle list
Use vm_status_lock to protect all vm_status state transitions to allow
them to happen without a reservation lock in unlocked page table
updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:29 -04:00
Philip Yang
b38e77cb7b drm/amdgpu: Use vm status_lock to protect relocated list
Use vm_status_lock to protect all vm_status state transitions to allow
them to happen without a reservation lock in unlocked page table
updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:25:24 -04:00
Philip Yang
0479956c94 drm/amdgpu: Rename vm invalidate lock to status_lock
The vm status_lock will be used to protect all vm status lists.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21 15:24:06 -04:00
hongao
4bb71fce58 drm/amdgpu: fix initial connector audio value
This got lost somewhere along the way, This fixes
audio not working until set_property was called.

Signed-off-by: hongao <hongao@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:42:35 -04:00
Alex Deucher
e9127f5e8f drm/amdgpu: don't register a dirty callback for non-atomic
Some asics still support non-atomic code paths.

Fixes: 66f99628eb ("drm/amdgpu: use dirty framebuffer helper")
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:42:20 -04:00
Christian König
6974340554 drm/amdgpu: bump minor for gang submit
Since that has now landed bump the minor to let userspace know about it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:42:14 -04:00
Christian König
b091fc6f8e drm/amdgpu: properly initialize return value during CS
The return value is no longer initialized before the loop because of
moving code around.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: c2b08e7a6d ("drm/amdgpu: move entity selection and job init earlier during CS")
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:41:08 -04:00
Christian König
4624459c84 drm/amdgpu: add gang submit frontend v6
Allows submitting jobs as gang which needs to run on multiple engines at the
same time.

All members of the gang get the same implicit, explicit and VM dependencies. So
no gang member will start running until everything else is ready.

The last job is considered the gang leader (usually a submission to the GFX
ring) and used for signaling output dependencies.

Each job is remembered individually as user of a buffer object, so there is no
joining of work at the end.

v2: rebase and fix review comments from Andrey and Yogesh
v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang
    leader only when necessary
v4: fix order of pushing jobs and adding fences found by Trigger.
v5: fix job index calculation and adding IBs to jobs
v6: fix typo found by Alex

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:40:46 -04:00
Christian König
68ce8b2422 drm/amdgpu: add gang submit backend v2
Allows submitting jobs as gang which needs to run on multiple
engines at the same time.

Basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as gang are never re-submitted in case of a GPU reset since this
won't work and will just deadlock the hardware immediately again.

v2: fix logic inversion, improve documentation, fix rcu

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:40:32 -04:00
Christian König
c05d789fed drm/amdgpu: cleanup instance limit on VCN4 v4
Similar to what we did for VCN3 use the job instead of the parser
entity. Cleanup the coding style quite a bit as well.

v2: merge improved application check into this patch
v3: finally fix the check
v4: limit to the correct engine

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20 12:40:05 -04:00
Christian König
d4423feeb7 drm/amdgpu: revert "fix limiting AV1 to the first instance on VCN3" v3
This reverts commit 250195ff74.

The job should now be initialized when we reach the parser functions.

v2: merge improved application check into this patch
v3: back to the original test, but use the right ring

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:18:28 -04:00
Christian König
c2b08e7a6d drm/amdgpu: move entity selection and job init earlier during CS
Initialize the entity for the CS and scheduler job much earlier.

v2: fix job initialisation order and use correct scheduler instance

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:18:23 -04:00
Christian König
4953b6b22a drm/amdgpu: cleanup error handling in amdgpu_cs_parser_bos
Return early on success and so remove all those "if (r)" in the error
path.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:18:16 -04:00
Christian König
f4b92fcd74 drm/amdgpu: cleanup CS pass2 v6
Cleanup the coding style and function names to represent the data
they process for pass2 as well.

Go over the chunks only twice now instead of multiple times.

v2: fix job initialisation order and use correct scheduler instance
v3: try to move all functional changes into a separate patch.
v4: separate reordering, pass1 and pass2 change
v5: fix va_start calculation
v6: fix user fence check

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:18:09 -04:00
YiPeng Chai
83d29a5f8a drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu device
V3:
Fixed psp fence and memory issues for the asic
using smu v13_0_2 when removing amdgpu device.

[Why]:
1. psp_suspend->psp_free_shared_bufs->
       psp_ta_free_shared_buf->
           amdgpu_bo_free_kernel->
             ...->amdgpu_bo_release_notify->
                    amdgpu_fill_buffer
   psp will free vram memory used by psp when psp_suspend
   is called. But for the asic using smu v13_0_2, because
   psp_suspend is called before adev->shutdown is set to
   true when removing the first hive device, amdgpu fill_buffer
   will be called, which will cause fence issues when evicting
   all vram resources in amdgpu vram mgr_fini.
2. Since psp_hw_fini is not called after calling psp_suspend
   and psp_suspend only calls psp_ring_stop, the psp ring memory
   will not be released when amdgpu device is removed.

[How]:
1. Set shutdown to true before calling amdgpu_device_gpu_recover,
   then amdgpu_fill_buffer will not be called when psp_suspend is
   called.
2. Free psp ring memory in psp_sw_fini.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:17:47 -04:00
YiPeng Chai
f5c7e77970 drm/amdgpu: Adjust removal control flow for smu v13_0_2
Adjust removal control flow for smu v13_0_2:
   During amdgpu uninstallation, when removing the first
device, the kernel needs to first send a mode1reset message
to all gpu devices. Otherwise, smu initialization will fail
the next time amdgpu is installed.

V2:
1. Update commit comments.
2. Remove the global variable amdgpu_device_remove_cnt
   and add a variable to the structure amdgpu_hive_info.
3. Use hive to detect the first removed device instead of
   a global variable.

V3:
 1. Update commit comments.
 2. Split a patch into multiple patches.
 3. The current patch does:
    a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into
       the existing gpu recover path, which make all devices
       in hive list only have HW reset but no resume (except
       the base IP).
    b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and
       AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover
       in amdgpu_pci_remove when removing the first device in
       hive list.
    c. When removing the first device, the IP blocks keyword
       function call sequence is as follows:
.suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini.
   ^                           |
   |-<----------<---------<----|
	The first three sequences are because of a call to
        amdgpu_device_gpu_recover. The three sequences will be
        executed in a loop until all devices in the hive list
        are iterated.
        The sequences starting from .hw_fini only apply to the
        first device. Since .suspend has been called before,
        except the resumed phase1 basic ip blocks, all other ip
        blocks .hw_fini of current device will do nothing.
     d. When removing other devices, the calling sequences is the
        same as legacy:
	   .hw_fini -> .early_fini -> .sw_fini.
	Since .suspend has been called when removing the first device,
        except the resumed phase1 basic ip blocks, all of other ip
        blocks .hw_fini of current device will do nothing.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:17:20 -04:00
Yifan Zhang
10faf07871 drm/amdgpu: add MES and MES-KIQ version in debugfs
This patch addes MES and MES-KIQ version in debugfs.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:10:04 -04:00
Yang Li
7f89f9973c drm/amd/display: clean up some inconsistent indentings
clean up some inconsistent indentings

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2178
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:07:54 -04:00
Hawking Zhang
670c6edfbb drm/amdgpu: add rlcv/rlcp version info to debugfs
amdgpu_firmware_info debugfs will show rlcv/rlcp
ucode version info

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:07:40 -04:00
Hawking Zhang
d5c6ad7296 drm/amdgpu: support print rlc v2_x ucode hdr
add rlc v2_x support to print_rlc_hdr helper

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:07:30 -04:00
Hawking Zhang
ed2eee42d3 drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx
cache rlcv/rlcvp ucode version info in amdgpu_gfx
structure

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:07:24 -04:00
Mukul Joshi
876552e5d5 drm/amdgpu: Update PTE flags with TF enabled
This patch updates the PTE flags when translate further (TF) is
enabled:
- With translate_further enabled, invalid PTEs can be 0. Reading
  consecutive invalid PTEs as 0 is considered a fault. To prevent
  this, ensure invalid PTEs have at least 1 bit set.
- The current invalid PTE flags settings to translate a retry fault
  into a no-retry fault, doesn't work with TF enabled. As a result,
  update invalid PTE flags settings which works for both TF enabled
  and disabled case.

Fixes: 352e683b72 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19 15:06:13 -04:00