Commit Graph

15900 Commits

Author SHA1 Message Date
Sonny Jiang
346492f30c drm/amdgpu: Add VCN_5_0_1 support
Add vcn support for VCN_5_0_1

v2: rebase, squash in fixes (Alex)

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:46 -05:00
Sathishkumar S
c406fca4b5 drm/amdgpu: enable JPEG5_0_1 ip block
enable JPEG5_0_1 ip block

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:42 -05:00
Sathishkumar S
b8f57b6994 drm/amdgpu: Add JPEG5_0_1 support
add support for JPEG5_0_1

v2: squash in updates, rebase on IP instance changes

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:24 -05:00
Sonny Jiang
4e4b1a1b80 drm/amdgpu: Add VCN_5_0_1 codec query
Support VCN_5_0_1 codec query

v2: squash in updates

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:21 -05:00
Sonny Jiang
fdce10ff8f drm/amdgpu: Add VCN_5_0_1 firmware
Add vcn_5_0_1 firmware support

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:18 -05:00
Sathishkumar S
20a3029227 drm/amdgpu: update macro for maximum jpeg rings
Update the macro to accomdate more rings.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:13 -05:00
Alex Deucher
b1d0286c81 drm/amdgpu: update irq sec header for vcn 5.0.0
No functional change.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:29:01 -05:00
Alex Deucher
26893116c3 drm/amdgpu: update irq sec header for jpeg 5.0.0
No functional change.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:28:55 -05:00
Candice Li
33f1aa210a drm/amdgpu: Add umc v8_14 ras functions
Add umc v8_14 ras functions.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:28:39 -05:00
Candice Li
2c2b84f193 drm/amdgpu: Add psp v14_0_3 ras support
Add psp v14_0_3 ras support.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:28:21 -05:00
Srinivasan Shanmugam
55f4139b65 drm/amd/amdgpu: Add Annotations to Process Isolation functions
This update adds explanations to key functions that manage how the
Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the
GPU.

amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period
for KFD to ensure it takes turns with KGD in using the GPU. It uses a
mutex to safely manage shared data, like timing and state, and tracks
when KFD starts and stops waiting.

amdgpu_gfx_enforce_isolation_ring_begin_use: Ensures KFD has enough time
to run before new tasks are submitted to the GPU ring. It uses a mutex
to synchronize access and may adjust the KFD scheduler.

amdgpu_gfx_enforce_isolation_ring_end_use: Handles cleanup and state
updates when finishing the use of a GPU ring. It may also adjust the KFD
scheduler, using a mutex to manage shared data access.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:28:13 -05:00
Hawking Zhang
57bcfa89fe drm/amdgpu: Init mmhub v1_8_1 ras func
reuse mmhub v1_8 ras functuion

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:28:09 -05:00
Shiwu Zhang
bd18b11f2d drm/amdgpu: Enable xgmi for gfx v9_5_0
Enable xgmi for gfx v9_5_0

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:58 -05:00
Asad Kamal
f79cfbac5c drm/amdgpu: Fetch refclock for SMU v13.0.12
Add support to fetch refclock value for SMU v13.0.12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:54 -05:00
Asad Kamal
100350c373 drm/amd/pm: Add mode2 support for SMU v13.0.12
Add mode2 reset support for smu version 13.0.12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:50 -05:00
Srinivasan Shanmugam
a69f4cc278 drm/amd/amdgpu: Add Descriptions to Process Isolation and Cleaner Shader Sysfs Functions
This update adds explanations to key functions related to process
isolation and cleaner shader execution sysfs interfaces.

- `amdgpu_gfx_set_run_cleaner_shader`: Describes how to manually run a
  cleaner shader, which clears the Local Data Store (LDS) and General
  Purpose Registers (GPRs) to ensure data isolation between GPU workloads.

- `amdgpu_gfx_get_enforce_isolation`: Describes how to query the current
  settings of the 'enforce_isolation' feature for each GPU partition.

- `amdgpu_gfx_set_enforce_isolation`: Describes how to enable or disable
  process isolation for GPU partitions through the sysfs interface.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:31 -05:00
Hawking Zhang
9a826c4af8 drm/amdgpu: Enable RAS for psp v13_0_12
Enable RAS Cap check and initialize RAS funcs
for psp v13_0_12

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:28 -05:00
Hawking Zhang
98230feb55 drm/amdgpu: Load spdm_drv for psp v13_0_12
spdm_drv is a firmware that needs to be loaded
in driver initialization phase.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:23 -05:00
Hawking Zhang
3516d35f81 drm/amdgpu: Add psp v13_0_12 firmware specifiers
Add psp v13_0_12 firmware specifiers for sos and ta

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:19 -05:00
Le Ma
2d2f1622c8 drm/amdgpu: add psp 13_0_12 version support
Add support for new psp 13_0_12 version

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:27:08 -05:00
Mario Limonciello
b6e6871a56 drm/amd: Show an info message about optional firmware missing
With the warning from the core about missing firmware gone,
users still may be notified of missing optional firmware by
a more friendly message to clarify it's optional.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:52 -05:00
Yang Wang
2a50d94b11 drm/amdgpu: add ACA support for jpeg v4.0.3
Add ACA support for jpeg v4.0.3.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:52 -05:00
Yang Wang
3748c439bb drm/amdgpu: add ACA support for vcn v4.0.3
v1:
Add ACA support for vcn v4.0.3.

v2:
- split VCN ACA(v1) to 2 parts: vcn and jpeg.
- move mmSMNAID_AID0_MCA_SMU to amdgpu_aca.h file.

v3:
- split JPEG ACA to another patch.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:52 -05:00
Yang Wang
abfcf95607 drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h
move common ACA ipid defines into amdgpu_aca.h file.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:52 -05:00
Alex Sierra
1a3d4abd54 drm/amdgpu: add ih cam support for IH 4.4.4
Same as IH 4.4.2.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Le Ma
968e3811c3 drm/amdgpu: add initial support for sdma444
add sdma444 basic support

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Lijo Lazar
fd0c6bd82d drm/amdgpu: Increase FRU File Id buffer size
Some boards use longer File Ids.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Tao Zhou
ae756cd853 drm/amdgpu: correct the calculation of RAS bad page
After the introduction of NPS RAS, one bad page record on eeprom may be
related to 1 or 16 bad pages, so the bad page record and bad page are
two different concepts, define a new variable to store bad page number.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Tao Zhou
1f06e7f344 drm/amdgpu: split ras_eeprom_init into init and check functions
Init function is for ras table header read and check function is
responsible for the validation of the header. Call them in different
stages.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Mario Limonciello
ea5d493498 drm/amd: Add the capability to mark certain firmware as "required"
Some of the firmware that is loaded by amdgpu is not actually required.
For example the ISP firmware on some SoCs is optional, and if it's not
present the ISP IP block just won't be initialized.

The firmware loader core however will show a warning when this happens
like this:
```
Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2
```

To avoid confusion for non-required firmware, adjust the amd-ucode helper
to take an extra argument indicating if the firmware is required or
optional.

On optional firmware use firmware_request_nowarn() instead of
request_firmware() to avoid the warnings.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Hawking Zhang
0ca6d97596 drm/amdgpu: Apply gc v9_5_0 golden settings
Apply gc v9_5_0 golden settings.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:51 -05:00
Alex Sierra
dad0c70507 drm/amd: update mtype flags for gfx 9.5.0
Update mtype flags to meet gfx 9.5.0 requirements for remote GPU
memory and system memory.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Le Ma
0b58a55af5 drm/amdgpu: add initial support for gfx950
add gfx950 basic support

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Le Ma
ebc7d1acf3 drm/amdgpu/gfx: add gfx950 microcode
Add firmware declarations.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Alex Sierra
9bfe4caa4e drm/amd: define gc ip version local variable
For better readability. Also leftover orphaned code.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Lijo Lazar
3f1e050c99 drm/amdgpu: Remove gfxoff usage
GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in
GFX domain.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Prike Liang
d2382f29ce drm/amdgpu: Avoid to release the FW twice in the validated error
There will to release the FW twice when the FW validated error.
Even if the release_firmware() will further validate the FW whether
is empty, but that will be redundant and inefficient.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Randy Dunlap
a567db808e drm/amdgpu: device: fix spellos and punctuation
Make spelling and punctuation changes to ease reading of the comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Sathishkumar S
de258d06fd drm/amdgpu: Add amdgpu_vcn_sched_mask debugfs
Add debugfs entry to enable or disable job submission to
specific vcn instances. The entry is created only when
there is more than an instance and is unified queue type.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Jinzhou Su
9db3aed8ea drm/amdgpu: return error when eeprom checksum failed
Return eeprom table checksum error result, otherwise
it might be overwritten by next call.

V2: replace DRM_ERROR with dev_err

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Mario Limonciello
2965e6355d drm/amd: Add Suspend/Hibernate notification callback support
As part of the suspend sequence VRAM needs to be evicted on dGPUs.
In order to make suspend/resume more reliable we moved this into
the pmops prepare() callback so that the suspend sequence would fail
but the system could remain operational under high memory usage suspend.

Another class of issues exist though where due to memory fragementation
there isn't a large enough contiguous space and swap isn't accessible.

Add support for a suspend/hibernate notification callback that could
evict VRAM before tasks are frozen. This should allow paging out to swap
if necessary.

Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:50 -05:00
Lijo Lazar
edd628ad17 drm/amdgpu: Simplify cleanup check for FRU sysfs
FRU info is expected to be non-NULL if FRU sys files are created.
Simplify the check.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:49 -05:00
Jinzhou Su
e1a34ed917 drm/amdgpu: Add secure display v2 command
Add secure display v2 command to support multiple ROI instances per
display.

v2: fix typo and coding style issue

Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:49 -05:00
Mario Limonciello
c2ee5c2f0e drm/amd: Invert APU check for amdgpu_device_evict_resources()
Resource eviction isn't needed for s3 or s2idle on APUs, but should
be run for S4. As amdgpu_device_evict_resources() will be called
by prepare notifier adjust logic so that APUs only cover S4.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Link: https://lore.kernel.org/r/20241128032656.2090059-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:49 -05:00
Sunil Khatri
86fa54f349 drm/amdgpu: add "restore" missing variable comment
add "restore" missing variable in the fucntions
sdma_v4_4_2_page_resume and sdma_v4_4_2_inst_start.

This fixes the warning:
warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_page_resume'
warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_inst_start'

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:49 -05:00
Sunil Khatri
093bbeb994 drm/amdgpu: Update the variable name to dma_buf
Instead of fixing the warning for missing variable
its better to update the variable name to match
with the style followed in the code.

This will fix the below mentioned warning:
warning: Function parameter or struct member 'dbuf' not described in 'amdgpu_bo_create_isp_user'
warning: Excess function parameter 'dma_buf' description in 'amdgpu_bo_create_isp_user'

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:48 -05:00
Tao Zhou
ea8094abfb drm/amdgpu: set UMC PA per NPS mode when PA is 0
The shift bit of PA varys according to NPS mode due to
different address format.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:48 -05:00
Tao Zhou
d08fb66370 drm/amdgpu: remove is_mca_add for ras_add_bad_pages
Remove unnecessary variable and simplify the logic.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:48 -05:00
Tao Zhou
a8d133e625 drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes
All legacy RAS bad pages are generated in NPS1 mode, but new bad page
can be generated in any NPS mode, so we can't use retired_page stored
on eeprom directly in non-nps1 mode even for legacy data. We need to
take different actions for different data, new data can be identified
from old data by UMC_CHANNEL_IDX_V2 flag.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:48 -05:00
Shikang Fan
0859eb540f drm/amdgpu: Check fence emitted count to identify bad jobs
In SRIOV, when host driver performs MODE 1 reset and notifies FLR to
guest driver, there is a small chance that there is no job running on hw
but the driver has not updated the pending list yet, causing the driver
not respond the FLR request. Modify the has_job_running function to
make sure if there is still running job.

v2: Use amdgpu_fence_count_emitted to determine job running status.
v3: Remove the timeout wait in has_job_running

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:48 -05:00