Tao Zhou
b7674ae75b
drm/amdgpu: get RAS retire flip bits for new type of HBM
...
Get RAS retire flip bits for HBM with different types in various NPS modes.
Also set flip row bit and MCA R13 bit in PA in different NPS modes.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:32:08 -04:00
Tao Zhou
9b5b71895b
drm/amdgpu: implement get_retire_flip_bits for UMC v12
...
The RAS bad page retire flip bits can be set per vram type,
vram vendor and NPS mode.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13 09:32:05 -04:00
Tao Zhou
b695dd3bb8
drm/amdgpu: add loop bits for NPS2 page retirement
...
Support NPS2 RAS.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:14 -04:00
Tao Zhou
19d4b27aed
drm/amdgpu: retire RAS bad pages in different NPS modes
...
The format of the memory normalized address changes per NPS mode, so
the bit mapping needs to be adjusted according to the NPS mode.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:46 -05:00
Tao Zhou
3d60a30c85
drm/amdgpu: store PA with column bits cleared for RAS bad page
...
So the code can be simplified, and no need to expose the detail of PA
format outside address conversion.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:45 -05:00
Tao Zhou
150f6c9030
drm/amdgpu: simplify RAS page retirement in one memory row
...
Take R13 and column bits as a whole for UMC v12.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:45 -05:00
YiPeng Chai
56631dee29
drm/amdgpu: optimize logging deferred error info
...
1. Use pa_pfn as the radix-tree key index to log
deferred error info.
2. Use local array to store a row of bad pages.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:14 -04:00
YiPeng Chai
b2aa6b108d
drm/amdgpu: umc v12_0 converts error address
...
Umc v12_0 converts error address.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai
95b4063de4
drm/amdgpu: add interface to update umc v12_0 ecc status
...
Add interface to update umc v12_0 ecc status.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
Tao Zhou
4b0cb230bd
drm/amdgpu: retire UMC v12 mca_addr_to_pa
...
RAS TA will handle it, the function is useless.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:09:15 -04:00
Candice Li
46e2231ce0
drm/amdgpu: Log deferred error separately
...
Separate deferred error from UE and CE and log it
individually.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
YiPeng Chai
99cab331a4
drm/amdgpu: Add umc page retirement for umc v12_0
...
Add umc page retirement for umc v12_0.
V2:
1. Changed umc page retirement check condition
to call umc_v12_0_is_uncorrectable_error.
2. Use memset to clear the contents of the umc
error address structure.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
YiPeng Chai
a8c77a121c
drm/amdgpu: Add poison mode check error condition for umc v12_0
...
Add poison mode check error condition for umc v12_0.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
Yang Wang
bf13da6ae1
drm/amdgpu: correct smu v13.0.6 umc ras error check
...
correct smu v13.0.6 umc ras error check
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:20 -05:00
Tao Zhou
6205b558e1
drm/amdgpu: fix value of some UMC parameters for UMC v12
...
Prepare for bad page retirement.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:52 -04:00
Tao Zhou
3cb9ebc9d6
drm/amdgpu: add channel index table for UMC v12
...
Get UMC physical channel index according to node id, umc instance and
channel instance.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:58 -04:00
Tao Zhou
40a08fe890
drm/amdgpu: add address conversion for UMC v12
...
Convert MCA error address to physical address and find out all pages in
one physical row.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:35 -04:00
Candice Li
7e6ec09974
drm/amdgpu: Add umc v12_0 ras functions
...
Add umc v12_0 ras error querying.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:38:00 -04:00