Commit Graph

13998 Commits

Author SHA1 Message Date
Guilherme G. Piccoli
ee30b8001c drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well
(Bas: speculative change to mirror gfx10/gfx9)

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:38:15 -04:00
Bas Nieuwenhuizen
a39b52c838 drm/amdgpu/gfx10: Disable gfxoff before disabling powergating.
Otherwise we get a full system lock (looks like a FW mess).

Copied the order from the GFX9 powergating code.

Fixes: 366468ff6c ("drm/amdgpu: Allow GfxOff on Vangogh as default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2545
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:37:58 -04:00
Dan Carpenter
3fb9dd5fef drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()
This function was releasing the incorrect lock on the error path.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1156e1a60f ("drm/amdgpu: add [en/dis]able_kgq() functions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:36:53 -04:00
Saleemkhan Jamadar
c488a9370d drm/amdgpu/jpeg: Remove harvest checking for JPEG3
Register CC_UVD_HARVESTING is obsolete for JPEG 3.1.2

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:36:45 -04:00
Guchun Chen
d97b02bb9c drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert.
So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq
should be executed under such condition, otherwise, an amdgpu_irq_put
calltrace will occur.

[ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246
[ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000
[ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000
[ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006
[ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050
[ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105
[ 7283.170978] FS:  0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000
[ 7283.170981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0
[ 7283.170986] Call Trace:
[ 7283.170988]  <TASK>
[ 7283.170989]  gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu]
[ 7283.171655]  amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
[ 7283.172245]  amdgpu_device_suspend+0x103/0x180 [amdgpu]
[ 7283.172823]  amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
[ 7283.173412]  pci_pm_freeze+0x54/0xc0
[ 7283.173419]  ? __pfx_pci_pm_freeze+0x10/0x10
[ 7283.173425]  dpm_run_callback+0x98/0x200
[ 7283.173430]  __device_suspend+0x164/0x5f0

v2: drop gfx11 as it's fixed in a different solution by retiring cp_ecc_irq funcs(Hawking)

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:36:38 -04:00
Guchun Chen
eb4f01784e drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend
sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini,
driver unconditionally disables ecc_irq which is only enabled on
those asics enabling sdma ecc. This will introduce a warning in
suspend cycle on those chips with sdma ip v4.0, while without
sdma ecc. So this patch correct this.

[ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
[ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246
[ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000
[ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000
[ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006
[ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390
[ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105
[ 7283.167032] FS:  0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000
[ 7283.167036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0
[ 7283.167041] Call Trace:
[ 7283.167046]  <TASK>
[ 7283.167048]  sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu]
[ 7283.167704]  amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
[ 7283.168296]  amdgpu_device_suspend+0x103/0x180 [amdgpu]
[ 7283.168875]  amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
[ 7283.169464]  pci_pm_freeze+0x54/0xc0

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:36:30 -04:00
Lin.Cao
4994d1f0a7 drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we
should set shadow as vmbo->shadow to recover vmbo->bo
v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow)
continue;

Fixes: e18aaea733 ("drm/amdgpu: move shadow_list to amdgpu_bo_vm")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lin.Cao <lincao12@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:35:30 -04:00
Horatio Zhang
2f48965bdc drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still
use amdgpu_irq_put to disable this interrupt, which caused the call trace
in this function.

[  102.873958] Call Trace:
[  102.873959]  <TASK>
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
[  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
[  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
[  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
[  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
[  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
[  102.874375]  process_one_work+0x21f/0x3f0
[  102.874377]  worker_thread+0x200/0x3e0
[  102.874378]  ? process_one_work+0x3f0/0x3f0
[  102.874379]  kthread+0xfd/0x130
[  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

v2:
- Handle umc and gfx ras cases in separated patch
- Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11

v3:
- Improve the subject and code comments
- Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init

v4:
- Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and
SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state
- Check cp_ecc_error_irq.funcs rather than ip version for a more
sustainable life

v5:
- Simplify judgment conditions

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:35:20 -04:00
YuBiao Wang
db5dcd476e drm/amdgpu: set default num_kcq to 2 under sriov
The number of kernel queues has impact on the latency under sriov
usecase. So to reduce the latency we set the default num_kcq = 2 under
sriov if not set manually.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:34:35 -04:00
Lang Yu
187916e6ed drm/amdgpu: install stub fence into potential unused fence pointers
When using cpu to update page tables, vm update fences are unused.
Install stub fence into these fence pointers instead of NULL
to avoid NULL dereference when calling dma_fence_wait() on them.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:34:15 -04:00
Jiapeng Chong
9e72813f69 drm/amdgpu: Remove the unused variable golden_settings_gc_9_4_3
Variable golden_settings_gc_9_4_3 is not effectively used, so delete it.

drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:48:38: warning: ‘golden_settings_gc_9_4_3’ defined but not used.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4877
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:34:03 -04:00
Felix Kuehling
b9274387bc drm/amdkfd: Don't trigger evictions unmapping dmabuf attachments
Don't move DMABuf attachments for PCIe P2P mappings to the SYSTEM domain
when unmapping. This avoids triggering eviction fences unnecessarily.
Instead do the move to SYSTEM and back to GTT when mapping these
attachments to ensure the SG table gets updated after evictions.

This may still trigger unnecessary evictions if user mode unmaps and
remaps the same BO. However, this is unlikely in real applications.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:33:59 -04:00
Jiapeng Chong
1ffbc89c30 drm/amdgpu: remove unneeded semicolon
No functional modification involved.

./drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c:146:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4871
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:33:50 -04:00
Le Ma
4667fbe2f7 drm/amdgpu: do gfxhub init for all XCDs
Each XCD needs to do gfxhub init

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:33:47 -04:00
Hamza Mahfooz
c0c2742890 drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()
As made mention of in commit c56edea58c31 ("drm/amdgpu: fix
amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit aa6ac247ed7d
("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It
is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it
from gmc_v9_0_hw_fini().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
Fixes: c8b5a95b57 ("drm/amdgpu: Fix desktop freezed after gpu-reset")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:33:03 -04:00
Dan Carpenter
f4409a2361 drm/amdgpu: unlock on error in gfx_v9_4_3_kiq_resume()
Smatch complains that we need to drop this lock before returning.

    drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1838 gfx_v9_4_3_kiq_resume()
    warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'.

Fixes: 8630112969 ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:32:27 -04:00
Dan Carpenter
7a6a2e59aa drm/amdgpu: unlock the correct lock in amdgpu_gfx_enable_kcq()
We changed which lock we are supposed to take but this error path
was accidentally over looked so it still drops the old lock.

Fixes: def799c659 ("drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:31:46 -04:00
Alex Deucher
0e5f625157 drm/amdgpu: drop unused function
amdgpu_discovery_get_ip_version() has not been used since
commit c40bdfb2ff ("drm/amdgpu: fix incorrect VCN revision in SRIOV")
so drop it.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:31:31 -04:00
Alex Deucher
29551fd90e drm/amdgpu: drop invalid IP revision
This was already fixed and dropped in:
commit baf3f8f374 ("drm/amdgpu: handle SRIOV VCN revision parsing")
commit c40bdfb2ff ("drm/amdgpu: fix incorrect VCN revision in SRIOV")
But seems to have been accidently been left around in a merge.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:31:04 -04:00
Alex Deucher
1cfb4d6121 drm/amdgpu: put MQDs in VRAM
Reduces preemption latency.
Only enable this for gfx10 and 11 for now
to avoid changing behavior on gfx 8 and 9.

v2: move MES MQDs into VRAM as well (YuBiao)
v3: enable on gfx10, 11 only (Alex)
v4: minor style changes, document why gfx10/11 only (Alex)

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:30:58 -04:00
Guchun Chen
967a66396e drm/amdgpu: drop redundant sched job cleanup when cs is aborted
Once command submission failed due to userptr invalidation in
amdgpu_cs_submit, legacy code will perform cleanup of scheduler
job. However, it's not needed at all, as former commit has integrated
job cleanup stuff into amdgpu_job_free. Otherwise, because of double
free, a NULL pointer dereference will occur in such scenario.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457
Fixes: f7d66fb2ea ("drm/amdgpu: cleanup scheduler job initialization v2")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:58 -04:00
Srinivasan Shanmugam
9e69018458 drm/amd/amdgpu: Fix errors & warnings in amdgpu _bios, _cs, _dma_buf, _fence.c
The following checkpatch errors & warning is removed.

ERROR: else should follow close brace '}'
ERROR: trailing statements should be on next line
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Possible repeated word: 'Fences'
WARNING: Missing a blank line after declarations
WARNING: braces {} are not necessary for single statement blocks
WARNING: Comparisons should place the constant on the right side of the test
WARNING: printk() should include KERN_<LEVEL> facility level

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:51 -04:00
Yifan Zhang
0e768043bf drm/amdgpu: set gfx9 onwards APU atomics support to be true
APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather
it is internal path w/ native atomic support. Set have_atomics_support
to true.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:48 -04:00
Alex Deucher
09d8a67912 drm/amdgpu/gfx11: always restore kcq/kgq MQDs
Always restore the MQD not just when we do a reset.
This allows us to move the MQD to VRAM if we want.

v2: always reset ring pointer as well (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:44 -04:00
Thong Thai
c6fa6fe9eb drm/amdgpu/nv: update VCN 3 max HEVC encoding resolution
Update the maximum resolution reported for HEVC encoding on VCN 3
devices to reflect its 8K encoding capability.

v2: Also update the max height for H.264 encoding to match spec.
(Ruijing)

Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:39 -04:00
Alex Deucher
2dbaf83998 drm/amdgpu/gfx10: always restore kcq/kgq MQDs
Always restore the MQD not just when we do a reset.
This allows us to move the MQD to VRAM if we want.

v2: always reset ring pointer as well (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:07 -04:00
Alex Deucher
45b54a7dd3 drm/amdgpu/gfx9: always restore kcq MQDs
Always restore the MQD not just when we do a reset.
This allows us to move the MQD to VRAM if we want.

v2: always reset ring pointer as well (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:29:04 -04:00
Alex Deucher
42cdf6f687 drm/amdgpu/gfx8: always restore kcq MQDs
Always restore the MQD not just when we do a reset.
This allows us to move the MQD to VRAM if we want.

v2: always reset ring pointer as well (Christian)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:58 -04:00
Alex Deucher
b848fe65f8 drm/amdgpu/gfx11: drop unused variable
Just check the return value directly.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:55 -04:00
Alex Deucher
d78e816a3d drm/amdgpu/gfx10: drop unused variable
Just check the return value directly.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:50 -04:00
Alex Deucher
83033f72a4 drm/amdgpu/gfx11: use generic [en/dis]able_kgq() helpers
And remove the duplicate local variants.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:39 -04:00
Alex Deucher
f39c25357f drm/amdgpu/gfx10: use generic [en/dis]able_kgq() helpers
And remove the duplicate local variants.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:35 -04:00
Alex Deucher
1156e1a60f drm/amdgpu: add [en/dis]able_kgq() functions
To replace the IP specific variants which are largely
duplicate.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:28:28 -04:00
Srinivasan Shanmugam
f14c8c3e1f drm/amd/amdgpu: Fix style problems in amdgpu_psp.c
Fix the following checkpatch warnings & error in amdgpu_psp.c

WARNING: Comparisons should place the constant on the right side of the test
WARNING: braces {} are not necessary for single statement blocks
WARNING: please, no space before tabs
WARNING: braces {} are not necessary for single statement blocks
ERROR: that open brace { should be on the previous line

Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:27:08 -04:00
Alex Deucher
edacf33357 drm/amdgpu/gfx10: drop old bring up code
No longer used.  Remove it.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:27:06 -04:00
Alex Deucher
6ae869b9b6 drm/amdgpu/gfx11: drop old bring up code
No longer used.  Remove it.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:26:50 -04:00
Srinivasan Shanmugam
8fa7635058 drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.c
Fix the following issues reported by checkpatch:

WARNING: please, no space before tabs
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: sizeof *rd should be sizeof(*rd)
WARNING: Missing a blank line after declarations
WARNING: sizeof rd->id should be sizeof(rd->id)
WARNING: static const char * array should probably be static const char * const
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.
WARNING: Prefer seq_puts to seq_printf
ERROR: space prohibited after that open parenthesis '('

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:54 -04:00
YuBiao Wang
d446127107 drm/amdgpu: Enable mcbp under sriov by default
Enable mcbp under sriov by default. Asics with soc21 supports mcbp now
so we should set it enabled.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:50 -04:00
Xiaomeng Hou
04b3c34f5c drm/amdgpu: remove pasid_src field from IV entry
PASID_SRC is not actually present in the Interrupt Packet, the field is
taken as reserved bits now. So remove it from IV entry to avoid misuse.

Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:47 -04:00
Srinivasan Shanmugam
a6f7baa387 drm/amd/amdgpu: Simplify switch case statements in amdgpu_connectors.c
Fix the following checkpatch errors:

ERROR: trailing statements should be on next line
ERROR: space required after that ',' (ctx:VxV)
ERROR: code indent should use tabs where possible

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:43 -04:00
Horace Chen
9f58341d63 drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV
[Why]
This WPTR_POLL_ENABLE is a hardware contigious polling which will cause
FCLK and UCLK to keep on a high level.
Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by
firmware.
So to save power, SR-IOV also needs to disable this bit

Signed-off-by: Horace Chen <horace.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:40 -04:00
Stanley.Yang
64e2e71737 drm/amdgpu: Add SDMA_UTCL1_WR_FIFO_SED field for sdma_v4_4_ras_field
Query sdma_utcl1_wr_fifo_sed fiel to detect UTCL1_WR_FIFO SED error counts

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:37 -04:00
Chia-I Wu
514987a5bc drm/amdgpu: add a missing lock for AMDGPU_SCHED
mgr->ctx_handles should be protected by mgr->lock.

v2: improve commit message
v3: add a Fixes tag

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes: 52c6a62c64 ("drm/amdgpu: add interface for editing a foreign process's priority v3")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:25:02 -04:00
Mukul Joshi
27fb73a0e3 drm/amdkfd: Update KFD TTM mem limit
Use the helper function in TTM to get TTM memory
limit and set KFD's internal mem limit. This ensures
that KFD's TTM mem limit and actual TTM mem limit are
exactly same.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:52 -04:00
Mukul Joshi
f1f6f48a33 drm/amdgpu: Set GTT size equal to TTM mem limit
Use the helper function in TTM to get TTM mem limit and
set GTT size to be equal to TTL mem limit.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:46 -04:00
Guchun Chen
3c4f6507ab drm/amdgpu: mark gfx_v9_4_3_disable_gpa_mode() static
This was left global by accident, the corresponding functions for other hardware types are already static:

drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:1072:6: error: no previous prototype for function 'gfx_v9_4_3_disable_gpa_mode' [-Werror,-Wmissing-prototypes]

Fixes: 8630112969 ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:11 -04:00
Guchun Chen
34305ac364 drm/amdgpu: check correct allocated mqd_backup object after alloc
Instead of the default one, check the right mqd_backup object.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Cc: Le Ma <le.ma@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:07 -04:00
Guchun Chen
7a4685cdfb drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.c
This should be a typo when intruducing multi-xx support.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Cc: Le Ma <le.ma@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:24:02 -04:00
Horatio Zhang
c953cf0406 drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v10_0_hw_fini, which also leads to the call
trace.

[   82.340264] Call Trace:
[   82.340265]  <TASK>
[   82.340269]  gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu]
[   82.340447]  gmc_v10_0_suspend+0xe/0x20 [amdgpu]
[   82.340623]  amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu]
[   82.340789]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
[   82.340955]  amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu]
[   82.341122]  amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu]
[   82.341359]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
[   82.341529]  process_one_work+0x21d/0x3f0
[   82.341535]  worker_thread+0x1fa/0x3c0
[   82.341538]  ? process_one_work+0x3f0/0x3f0
[   82.341540]  kthread+0xff/0x130
[   82.341544]  ? kthread_complete_and_exit+0x20/0x20
[   82.341547]  ret_from_fork+0x22/0x30

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:23:13 -04:00
Horatio Zhang
7e5b601008 drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v11_0_hw_fini, which also leads to the call
trace.

[  102.980303] Call Trace:
[  102.980303]  <TASK>
[  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu]
[  102.980357]  gmc_v11_0_suspend+0xe/0x20 [amdgpu]
[  102.980409]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
[  102.980459]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
[  102.980520]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
[  102.980573]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
[  102.980687]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
[  102.980740]  process_one_work+0x21f/0x3f0
[  102.980741]  worker_thread+0x200/0x3e0
[  102.980742]  ? process_one_work+0x3f0/0x3f0
[  102.980743]  kthread+0xfd/0x130
[  102.980743]  ? kthread_complete_and_exit+0x20/0x20
[  102.980744]  ret_from_fork+0x22/0x30

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:23:00 -04:00