linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-20 16:52:28 +00:00

Author	SHA1	Message	Date
Lijo Lazar	fd0c6bd82d	drm/amdgpu: Increase FRU File Id buffer size Some boards use longer File Ids. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	ae756cd853	drm/amdgpu: correct the calculation of RAS bad page After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so the bad page record and bad page are two different concepts, define a new variable to store bad page number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	1f06e7f344	drm/amdgpu: split ras_eeprom_init into init and check functions Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Mario Limonciello	ea5d493498	drm/amd: Add the capability to mark certain firmware as "required" Some of the firmware that is loaded by amdgpu is not actually required. For example the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core however will show a warning when this happens like this: ``` Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2 ``` To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware use firmware_request_nowarn() instead of request_firmware() to avoid the warnings. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Hawking Zhang	0ca6d97596	drm/amdgpu: Apply gc v9_5_0 golden settings Apply gc v9_5_0 golden settings. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Alex Sierra	dad0c70507	drm/amd: update mtype flags for gfx 9.5.0 Update mtype flags to meet gfx 9.5.0 requirements for remote GPU memory and system memory. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	0b58a55af5	drm/amdgpu: add initial support for gfx950 add gfx950 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	ebc7d1acf3	drm/amdgpu/gfx: add gfx950 microcode Add firmware declarations. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Alex Sierra	9bfe4caa4e	drm/amd: define gc ip version local variable For better readability. Also leftover orphaned code. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	3f1e050c99	drm/amdgpu: Remove gfxoff usage GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in GFX domain. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Prike Liang	d2382f29ce	drm/amdgpu: Avoid to release the FW twice in the validated error There will to release the FW twice when the FW validated error. Even if the release_firmware() will further validate the FW whether is empty, but that will be redundant and inefficient. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Randy Dunlap	a567db808e	drm/amdgpu: device: fix spellos and punctuation Make spelling and punctuation changes to ease reading of the comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Sathishkumar S	de258d06fd	drm/amdgpu: Add amdgpu_vcn_sched_mask debugfs Add debugfs entry to enable or disable job submission to specific vcn instances. The entry is created only when there is more than an instance and is unified queue type. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Jinzhou Su	9db3aed8ea	drm/amdgpu: return error when eeprom checksum failed Return eeprom table checksum error result, otherwise it might be overwritten by next call. V2: replace DRM_ERROR with dev_err Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Mario Limonciello	2965e6355d	drm/amd: Add Suspend/Hibernate notification callback support As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	edd628ad17	drm/amdgpu: Simplify cleanup check for FRU sysfs FRU info is expected to be non-NULL if FRU sys files are created. Simplify the check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Jinzhou Su	e1a34ed917	drm/amdgpu: Add secure display v2 command Add secure display v2 command to support multiple ROI instances per display. v2: fix typo and coding style issue Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Mario Limonciello	c2ee5c2f0e	drm/amd: Invert APU check for amdgpu_device_evict_resources() Resource eviction isn't needed for s3 or s2idle on APUs, but should be run for S4. As amdgpu_device_evict_resources() will be called by prepare notifier adjust logic so that APUs only cover S4. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	86fa54f349	drm/amdgpu: add "restore" missing variable comment add "restore" missing variable in the fucntions sdma_v4_4_2_page_resume and sdma_v4_4_2_inst_start. This fixes the warning: warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_page_resume' warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_inst_start' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	093bbeb994	drm/amdgpu: Update the variable name to dma_buf Instead of fixing the warning for missing variable its better to update the variable name to match with the style followed in the code. This will fix the below mentioned warning: warning: Function parameter or struct member 'dbuf' not described in 'amdgpu_bo_create_isp_user' warning: Excess function parameter 'dma_buf' description in 'amdgpu_bo_create_isp_user' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	ea8094abfb	drm/amdgpu: set UMC PA per NPS mode when PA is 0 The shift bit of PA varys according to NPS mode due to different address format. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	d08fb66370	drm/amdgpu: remove is_mca_add for ras_add_bad_pages Remove unnecessary variable and simplify the logic. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	a8d133e625	drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes All legacy RAS bad pages are generated in NPS1 mode, but new bad page can be generated in any NPS mode, so we can't use retired_page stored on eeprom directly in non-nps1 mode even for legacy data. We need to take different actions for different data, new data can be identified from old data by UMC_CHANNEL_IDX_V2 flag. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Shikang Fan	0859eb540f	drm/amdgpu: Check fence emitted count to identify bad jobs In SRIOV, when host driver performs MODE 1 reset and notifies FLR to guest driver, there is a small chance that there is no job running on hw but the driver has not updated the pending list yet, causing the driver not respond the FLR request. Modify the has_job_running function to make sure if there is still running job. v2: Use amdgpu_fence_count_emitted to determine job running status. v3: Remove the timeout wait in has_job_running Signed-off-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Srinivasan Shanmugam	85b495bbbe	drm/amd/amdgpu/vcn: Fix kdoc entries for VCN clock/power gating functions This commit corrects the descriptors for the vcn_v4_0/v4_0_3/v4_0_5/v5_0_0 _set_clockgating_state and vcn_v4_0/v4_0_3/v4_0_5/v5_0_0 _set_powergating_state functions in the amdgpu driver. The parameter descriptors in the comments were mismatched with the actual function parameters. The non-existent 'handle' parameter has been replaced with the correct 'ip_block' parameter in the comments to accurately reflect the function signatures and to resolving the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1232: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1232: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1263: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1263: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2012: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2012: warning: Excess function parameter 'handle' description in 'vcn_v4_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2043: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2043: warning: Excess function parameter 'handle' description in 'vcn_v4_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1505: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1505: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1536: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1536: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1629: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1629: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_set_powergating_state' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Boyuan Zhang	cf1aa9ffd4	drm/amdgpu: move per inst variables to amdgpu_vcn_inst Move all per instance variables from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.fw[i] from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.vcn_config[i] from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.vcn_codec_disable_mask[i] from amdgpu_vcn to amdgpu_vcn_inst. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	f2ba8c3d51	drm/amdgpu: pass ip_block in set_clockgating_state Pass ip_block instead of adev in set_clockgating_state() callback functions. Modify set_clockgating_state()for all correspoding ip blocks. v2: remove all changes for is_idle(), remove type casting Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	80d8051124	drm/amdgpu: pass ip_block in set_powergating_state Pass ip_block instead of adev in set_powergating_state callback function. Modify set_powergating_state ip functions for all correspoding ip blocks. v2: fix a ip block index error. v3: remove type casting Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	393f026b16	drm/amdgpu: add inst to amdgpu_dpm_enable_vcn Add an instance parameter to amdgpu_dpm_enable_vcn() function, and change all calls from vcn ip functions to add instance argument. vcn generations with only one instance (v1.0, v2.0) always use 0 as instance number. vcn generations with multiple instances (v2.5, v3.0, v4.0, v4.0.3, v4.0.5, v5.0.0) use the actual instance number. v2: remove for-loop in amdgpu_dpm_enable_vcn(), and temporarily move it to vcn ip with multiple instances, in order to keep the exact same logic as before, until further separation in next patch. v3: fix missing prefix Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	ff69bba05f	drm/amd/pm: add inst to dpm_set_powergating_by_smu Add an instance parameter to amdgpu_dpm_set_powergating_by_smu() function, and use the instance to call set_powergating_by_smu(). v2: remove duplicated functions. remove for-loop in amdgpu_dpm_set_powergating_by_smu(), and temporarily move it to amdgpu_dpm_enable_vcn(), in order to keep the exact same logic as before, until further separation in next patch. v3: drop SI logic in amdgpu_dpm_enable_vcn(). Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Tao Zhou	fcb600b078	drm/amdgpu: add interface to get die id from memory address And implement it for UMC v12_0. The die id is calculated from IPID register in bad page retirement flow, but we don't store it on eeprom and it can be also gotten from physical address. v2: get PA_C4 and PA_R13 from MCA address since they may be cleared in retired page. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Tao Zhou	2206daa1f9	drm/amdgpu: add a flag to indicate UMC channel index version v1 (legacy way): store channel index within a UMC instance in eeprom v2: store global channel index in eeprom V2: only save the flag on eeprom, clear it after saving. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	71a0e96300	drm/amdgpu: save UMC global channel index to eeprom Save the global channel index returned by RAS TA to eeprom. We can get memory physical address by MCA address and channel index. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	07dd49e1fc	drm/amdgpu: support to find RAS bad pages via old TA Old version of RAS TA doesn't support to convert MCA address stored on eeprom to physical address (PA), support to find all bad pages in one memory row by PA with old RAS TA. This approach is only suitable for nps1 mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	b02ef40772	drm/amdgpu: add function to find all memory pages in one physical row And the function can be reused across amdgpu driver. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	19d4b27aed	drm/amdgpu: retire RAS bad pages in different NPS modes There are some changes in format of memory normalized address per NPS mode, need to adjust bit mapping according to NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	c3d4acf0c3	drm/amdgpu: store only one RAS bad page record for all pages in one row So eeprom space can be saved, compatible with legacy way. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Lijo Lazar	e1ee2111ca	drm/amdgpu: Prefer RAS recovery for scheduler hang Before scheduling a recovery due to scheduler/job hang, check if a RAS error is detected. If so, choose RAS recovery to handle the situation. A scheduler/job hang could be the side effect of a RAS error. In such cases, it is required to go through the RAS error recovery process. A RAS error recovery process in certains cases also could avoid a full device device reset. An error state is maintained in RAS context to detect the block affected. Fatal Error state uses unused block id. Set the block id when error is detected. If the interrupt handler detected a poison error, it's not required to look for a fatal error. Skip fatal error checking in such cases. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	0eecff79e4	drm/amdgpu: do RAS MCA2PA conversion in device init phase NPS mode is introduced, the value of memory physical address (PA) related to a MCA address varies per nps mode. We need to rely on MCA address and convert it into PA accroding to nps mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	772df3df80	drm/amdgpu: add flag to indicate the type of RAS eeprom record One UMC MCA address could map to multiply physical address (PA): AMDGPU_RAS_EEPROM_REC_PA: one record store one PA AMDGPU_RAS_EEPROM_REC_MCA: one record store one MCA address, PA is not cared about Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	95024c714b	drm/amdgpu: add TA_RAS_INV_NODE value We can set UMC node instance to invalid state if we use global channel index, and RAS TA can choose UMC address conversion approach by checking node_inst value. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	f44a30583b	drm/amdgpu: add return value for convert_ras_err_addr So upper layer can return failure directly if address conversion fails. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	76723fbc5f	drm/amdgpu: reduce memory usage for umc_lookup_bad_pages_in_a_row The function handles one page in one time, allocating umc.retire_unit bad page records is enough. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	4e7812e237	drm/amdgpu: make convert_ras_err_addr visible outside UMC block And change some UMC v12 specific functions to generic version, so the code can be shared. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	3d60a30c85	drm/amdgpu: store PA with column bits cleared for RAS bad page So the code can be simplified, and no need to expose the detail of PA format outside address conversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Prike Liang	66f4f7d5aa	drm/amdgpu: reduce the mmio writes in kiq setting There's no need to perform the two MMIO writes in the KIQ Setting registers programmed period, and reducing the MMIO writes will save the driver loading time. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Jiadong Zhu	52b10d55c1	drm/amdgpu/sdma4.4.2: implement ring reset callback for sdma4.4.2 Implement sdma queue reset callback via SMU interface. v2: Leverage inst_stop/start functions in reset sequence. Use GET_INST for physical SDMA instance. Disable apu for sdma reset. v3: Rephrase error prints. v4: Remove redundant prints. Remove setting PREEMPT registers as soft reset handles it. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	5c8baccc1e	drm/amdgpu: remove redundant RAS error address coversion code Only one interface is responsible for the conversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Pratap Nirujogi	ebbe34edc0	drm/amd/amdgpu: Add support for isp buffers Add support to create user BOs with MC address for isp using the dma-buf handle exported for the buffers allocated from system memory in isp driver. Export amdgpu_bo_create_kernel() and amdgpu_bo_free_kernel() as well for isp to allocate GTT internal buffers required for fw to run. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	150f6c9030	drm/amdgpu: simplify RAS page retirement in one memory row Take R13 and column bits as a whole for UMC v12. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00

... 17 18 19 20 21 ...

15974 Commits