Hawking Zhang
a6bcffa596
drm/amdgpu: Add smu v13_0_14 ip block
Add smu v13_0_14 ip block support
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:49:11 -04:00
Yang Wang
f23558627f
drm/amdgpu: add new aca smu callback func parse_error_code()
add new aca smu callback parse_error_code() to avoid an asic-specific check
in the amdgpu_aca.c file
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:15 -04:00
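For the parse_error_code() commit above, a minimal sketch of the callback pattern it describes: the common code dispatches through a per-SMU function pointer instead of checking the ASIC itself. Every name here (`aca_smu_funcs_sketch`, `parse_error_code_a`/`_b`, `aca_get_error_code_sketch`) and both bit layouts are hypothetical; this does not reproduce the real amdgpu structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-SMU callback table; the real amdgpu struct differs. */
struct aca_smu_funcs_sketch {
	/* Extract an error code from a raw bank status value. */
	int (*parse_error_code)(uint64_t status);
};

/* Hypothetical layout A: error code in bits [21:16]. */
static int parse_error_code_a(uint64_t status)
{
	return (int)((status >> 16) & 0x3f);
}

/* Hypothetical layout B: error code in bits [7:0]. */
static int parse_error_code_b(uint64_t status)
{
	return (int)(status & 0xff);
}

static const struct aca_smu_funcs_sketch smu_a = { .parse_error_code = parse_error_code_a };
static const struct aca_smu_funcs_sketch smu_b = { .parse_error_code = parse_error_code_b };

/* Common code: no ASIC check, just dispatch through the callback. */
static int aca_get_error_code_sketch(const struct aca_smu_funcs_sketch *funcs,
				     uint64_t status)
{
	if (!funcs || !funcs->parse_error_code)
		return -1; /* callback not provided by this SMU */
	return funcs->parse_error_code(status);
}
```

Adding support for a new SMU then means filling in its callback rather than adding another branch to the shared file.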
Yang Wang
81d96e8b5a
drm/amdgpu: refine function signature of amdgpu_aca_get_error_data()
refine function signature of amdgpu_aca_get_error_data().
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:50:09 -04:00
Yang Wang
31fd330b97
drm/amdgpu: add ras event id support for ACA
add ras event id support for ACA.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:48:18 -04:00
Yang Wang
bd15bf742f
drm/amdgpu: avoid update aca bank multi times during ras isr
Because the UE Valid MCA count is only cleared after reset,
the aca bank is updated only once during the ras isr to avoid
counting the same errors repeatedly.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:48:11 -04:00
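The once-per-ISR rule in the commit above can be sketched as a guard flag that is rearmed only when the hardware state is cleared. This is a hypothetical illustration: `aca_dev_sketch`, `aca_update_banks_once` and `aca_rearm_sketch` are not driver functions, and the real driver tracks this state differently.

```c
#include <stdbool.h>

/* Hypothetical device state; the real driver tracks this differently. */
struct aca_dev_sketch {
	bool banks_updated;     /* set after the first bank update in this cycle */
	unsigned int ue_count;  /* accumulated uncorrectable error count */
};

/* The hardware keeps reporting the same UE banks until reset, so only the
 * first call per cycle adds them to the count. Returns true if counted. */
static bool aca_update_banks_once(struct aca_dev_sketch *dev,
				  unsigned int valid_ue_banks)
{
	if (dev->banks_updated)
		return false;
	dev->ue_count += valid_ue_banks;
	dev->banks_updated = true;
	return true;
}

/* A GPU reset clears the hardware UE state, so the guard can be rearmed. */
static void aca_rearm_sketch(struct aca_dev_sketch *dev)
{
	dev->banks_updated = false;
}
```

Without the guard, a second pass over the same still-valid banks inside one ISR would double the reported count.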
Yang Wang
865d339763
drm/amdgpu: add aca deferred error type support
add aca deferred error type support
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang
e3d4de8d8b
drm/amdgpu: retire unused aca_bank_report data structure
retire unused aca_bank_report data structure.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang
949899cbac
drm/amdgpu: add new api to save error count into aca cache
add new api to save error count into aca cache.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Yang Wang
abc3b5d21d
drm/amdgpu: add new aca_smu_type support
Add new types to distinguish between the ACA error type and the smu mca type.
e.g.:
ACA_ERROR_TYPE_DEFERRED does not match any valid smu mca bank
channel, so add a new type 'aca_smu_type' to distinguish the aca error
type from the smu mca type.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
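A simplified sketch of why two enums help, following the aca_smu_type commit above: the driver-facing error type has a DEFERRED member with no SMU bank channel of its own, so it is mapped onto an SMU type before querying. The enum and function names are hypothetical stand-ins, and routing DEFERRED through the CE channel is an assumption made for illustration.

```c
/* Hypothetical, simplified driver-facing error types. */
enum aca_error_type_sketch {
	ACA_ERR_UE_SKETCH,
	ACA_ERR_CE_SKETCH,
	ACA_ERR_DEFERRED_SKETCH, /* no SMU MCA bank channel of its own */
};

/* Hypothetical SMU bank channel types. */
enum aca_smu_type_sketch {
	ACA_SMU_UE_SKETCH,
	ACA_SMU_CE_SKETCH,
};

/* Map an error type to the SMU bank type to query.
 * Returns 0 on success, -1 if no mapping exists. */
static int aca_error_to_smu_type(enum aca_error_type_sketch type,
				 enum aca_smu_type_sketch *smu_type)
{
	switch (type) {
	case ACA_ERR_UE_SKETCH:
		*smu_type = ACA_SMU_UE_SKETCH;
		return 0;
	case ACA_ERR_CE_SKETCH:
	case ACA_ERR_DEFERRED_SKETCH: /* assumption: shares the CE channel */
		*smu_type = ACA_SMU_CE_SKETCH;
		return 0;
	default:
		return -1;
	}
}
```

Keeping the two enums separate lets the error-type list grow without pretending every member has a matching SMU channel.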
Yang Wang
788686e2d9
drm/amdgpu: use helper macro HW_ERR instead of Hardware error string
use helper macro HW_ERR instead of the Hardware error string.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29 15:47:02 -05:00
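The HW_ERR commit above relies on a string-literal prefix macro. The sketch below uses a stand-in `HW_ERR_SKETCH` whose value is assumed, and `snprintf` in place of the kernel's logging calls, to show how adjacent literals are concatenated at compile time. The message text and `format_aca_msg` are hypothetical.

```c
#include <stdio.h>

/* Stand-in for the kernel's HW_ERR prefix; the value is an assumption here. */
#define HW_ERR_SKETCH "[Hardware Error]: "

/* Adjacent string literals are concatenated at compile time, so every
 * message gets the same prefix without a hand-written copy of the string. */
static int format_aca_msg(char *buf, size_t len, int bank)
{
	return snprintf(buf, len, HW_ERR_SKETCH "ACA bank %d", bank);
}
```

Using the shared macro keeps the prefix identical to what other kernel hardware-error reports print, which helps log scrapers that match on it.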
Yang Wang
c0c48f0d61
drm/amdgpu: adjust aca init/fini sequence to match gpu reset
- move the aca init/fini functions into ras init/fini to match the gpu reset
sequence.
- add new function amdgpu_aca_reset()
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:02 -05:00
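A minimal sketch of a reset helper in the spirit of amdgpu_aca_reset() from the commit above, assuming it simply tears down and rebuilds ACA state in the same fini/init order a GPU reset drives through RAS. The struct and functions are hypothetical stand-ins, not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical ACA state owned by the RAS layer. */
struct aca_state_sketch {
	bool initialized;
	unsigned int init_count; /* how many times init has run */
};

static int aca_init_sketch(struct aca_state_sketch *aca)
{
	if (aca->initialized)
		return -1; /* refuse double init */
	aca->initialized = true;
	aca->init_count++;
	return 0;
}

static void aca_fini_sketch(struct aca_state_sketch *aca)
{
	aca->initialized = false;
}

/* Reset helper: fini then init, so state is rebuilt cleanly after reset. */
static int aca_reset_sketch(struct aca_state_sketch *aca)
{
	aca_fini_sketch(aca);
	return aca_init_sketch(aca);
}
```

Tying init/fini to the RAS lifecycle means a reset cannot leave ACA initialized twice or not at all.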
Yang Wang
6eb726a082
drm/amdgpu: add aca sysfs remove support
add aca sysfs remove support.
Fixes: 37973b69ea ("drm/amdgpu: add aca sysfs support")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:02 -05:00
Srinivasan Shanmugam
8d1717fb64
drm/amdgpu: Fix return type in 'aca_bank_hwip_is_matched()'
Return a bool instead of an int (-EINVAL) when
"if (!bank || type == ACA_HWIP_TYPE_UNKNOW)" is true.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c:185 aca_bank_hwip_is_matched() warn: signedness bug returning '(-22)'
Fixes: f5e4cc8461 ("drm/amdgpu: implement RAS ACA driver framework")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:47:03 -05:00
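Why the signedness warning in the commit above matters: in C, any non-zero value converted to bool becomes true, so a bool function that returns -EINVAL on its error path reports a match. The functions and enum below are hypothetical simplifications of aca_bank_hwip_is_matched(), not its real body.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum hwip_sketch { HWIP_UNKNOWN_SKETCH, HWIP_SMU_SKETCH };

/* Buggy: -EINVAL converts to true, so the error path reports "matched". */
static bool hwip_is_matched_buggy(const void *bank, enum hwip_sketch type)
{
	if (!bank || type == HWIP_UNKNOWN_SKETCH)
		return -EINVAL;
	return type == HWIP_SMU_SKETCH;
}

/* Fixed: the error path returns false, a real bool. */
static bool hwip_is_matched_fixed(const void *bank, enum hwip_sketch type)
{
	if (!bank || type == HWIP_UNKNOWN_SKETCH)
		return false;
	return type == HWIP_SMU_SKETCH;
}
```

With the buggy version, a NULL bank would be treated as matching and processed further; the fixed version filters it out.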
Yang Wang
37973b69ea
drm/amdgpu: add aca sysfs support
add aca sysfs node support
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang
04c4fcd263
drm/amdgpu: add amdgpu ras aca query interface
v1:
add ACA error query interface
v2:
Add a new helper function to determine whether to use ACA or MCA.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
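The v2 note in the query-interface commit above adds a helper that decides between ACA and MCA. A hypothetical sketch of that shape: one predicate owns the decision and callers only ask it. `dev_sketch`, `aca_is_enabled_sketch` and `ras_pick_backend` are illustrative names; real code would derive the flag from the device's IP versions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device descriptor; real code derives this from IP versions. */
struct dev_sketch {
	bool aca_enabled;
};

enum ras_backend_sketch { RAS_BACKEND_MCA, RAS_BACKEND_ACA };

/* Single helper so callers never test ASIC versions themselves. */
static bool aca_is_enabled_sketch(const struct dev_sketch *dev)
{
	return dev && dev->aca_enabled;
}

/* Query path: use ACA when available, otherwise fall back to MCA. */
static enum ras_backend_sketch ras_pick_backend(const struct dev_sketch *dev)
{
	return aca_is_enabled_sketch(dev) ? RAS_BACKEND_ACA : RAS_BACKEND_MCA;
}
```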
Yang Wang
33dcda51e9
drm/amdgpu: add ACA bank dump debugfs support
add ACA bank dump debugfs support
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang
0599849c32
drm/amdgpu: add ACA kernel hardware error log support
add ACA kernel hardware error log support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang
f5e4cc8461
drm/amdgpu: implement RAS ACA driver framework
v1:
implement new RAS ACA driver code framework.
v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.
v3:
Optimize some function implementation details (suggested by Hawking).
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00