amdgpu_device_mode1_reset will return gpu mode1_reset
succeed (ret = 0) as long as wait_for_bootloader call
succeed, regardless of the status reported by smu or
psp firmware. This results to driver continue executing
recovery even smu or psp fail to perform mode1 reset.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[What]
Current SRIOV still using adev->clock.default_XX which gets from
atomfirmware. But these fields are abandoned in atomfirmware long ago.
Which may cause function to return a 0 value.
[How]
We don't need to check whether SR-IOV. For SR-IOV one-vf-mode,
pm is enabled and VF is able to read dpm clock
from pmfw, so we can use dpm clock interface directly. For
multi-VF mode, VF pm is disabled, so driver can just react as pm
disabled. One-vf-mode is introduced from GFX9 so it shall not have
any backward compatibility issue.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Add locking to read/modify/write PCIe Capability Register accessors
for Link Control and Root Control
- Use pci_dev_id() when possible instead of manually composing ID
from dev->bus->number and dev->devfn
Resource management:
- Move prototypes for __weak sysfs resource files to linux/pci.h to
fix 'no previous prototype' warnings
- Make more I/O port accesses depend on HAS_IOPORT
- Use devm_platform_get_and_ioremap_resource() instead of open-coding
platform_get_resource() followed by devm_ioremap_resource()
Power management:
- Ensure devices are powered up while accessing VPD
- If device is powered-up, keep it that way while polling for PME
- Only read PCI_PM_CTRL register when available, to avoid reading the
wrong register and corrupting dev->current_state
Virtualization:
- Avoid Secondary Bus Reset on NVIDIA T4 GPUs
Error handling:
- Remove unused pci_disable_pcie_error_reporting()
- Unexport pci_enable_pcie_error_reporting(), used only by aer.c
- Unexport pcie_port_bus_type, used only by PCI core
VGA:
- Simplify and clean up typos in VGA arbiter
Apple PCIe controller driver:
- Initialize pcie->nvecs (number of available MSIs) before use
Broadcom iProc PCIe controller driver:
- Use of_property_read_bool() instead of low-level accessors for
boolean properties
Broadcom STB PCIe controller driver:
- Assert PERST# when probing BCM2711 because some bootloaders don't
do it
Freescale i.MX6 PCIe controller driver:
- Add .host_deinit() callback so we can clean up things like
regulators on probe failure or driver unload
Freescale Layerscape PCIe controller driver:
- Add support for link-down notification so the endpoint driver can
process LINK_DOWN events
- Add suspend/resume support, including manual
PME_Turn_off/PME_TO_Ack handshake
- Save Link Capabilities during probe so they can be restored when
handling a link-up event, since the controller loses the Link Width
and Link Speed values during reset
Intel VMD host bridge driver:
- Fix disable of bridge windows during domain reset; previously we
cleared the base/limit registers, which actually left the windows
enabled
Marvell MVEBU PCIe controller driver:
- Remove unused busn member
Microchip PolarFlare PCIe controller driver:
- Fix interrupt bit definitions so the SEC and DED interrupt handlers
work correctly
- Make driver buildable as a module
- Read FPGA MSI configuration parameters from hardware instead of
hard-coding them
Microsoft Hyper-V host bridge driver:
- To avoid a NULL pointer dereference, skip MSI restore after
hibernate if MSI/MSI-X hasn't been enabled
NVIDIA Tegra194 PCIe controller driver:
- Revert 'PCI: tegra194: Enable support for 256 Byte payload' because
Linux doesn't know how to reduce MPS from to 256 to 128 bytes for
endpoints below a switch (because other devices below the switch
might already be operating), which leads to 'Malformed TLP' errors
Qualcomm PCIe controller driver:
- Add DT and driver support for interconnect bandwidth voting for
'pcie-mem' and 'cpu-pcie' interconnects
- Fix broken SDX65 'compatible' DT property
- Configure controller so MHI bus master clock will be switched off
while in ASPM L1.x states
- Use alignment restriction from EPF core in EPF MHI driver
- Add Endpoint eDMA support
- Add MHI eDMA support
- Add Snapdragon SM8450 support to the EPF MHI driversupport
- Add MHI eDMA support
- Add Snapdragon SM8450 support to the EPF MHI driversupport
- Add MHI eDMA support
- Add Snapdragon SM8450 support to the EPF MHI driversupport
- Add MHI eDMA support
- Add Snapdragon SM8450 support to the EPF MHI driver
- Use iATU for EPF MHI transfers smaller than 4K to avoid eDMA setup
latency
- Add sa8775p DT binding and driver support
Rockchip PCIe controller driver:
- Use 64-bit mask on MSI 64-bit PCI address to avoid zeroing out the
upper 32 bits
SiFive FU740 PCIe controller driver:
- Set the supported number of MSI vectors so we can use all available
MSI interrupts
Synopsys DesignWare PCIe controller driver:
- Add generic dwc suspend/resume APIs (dw_pcie_suspend_noirq() and
dw_pcie_resume_noirq()) to be called by controller driver
suspend/resume ops, and a controller callback to send PME_Turn_Off
MicroSemi Switchtec management driver:
- Add support for PCIe Gen5 devices
Miscellaneous:
- Reorder and compress to reduce size of struct pci_dev
- Fix race in DOE destroy_work_on_stack()
- Add stubs to avoid casts between incompatible function types
- Explicitly include correct DT includes to untangle headers"
* tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (96 commits)
PCI: qcom-ep: Add ICC bandwidth voting support
dt-bindings: PCI: qcom: ep: Add interconnects path
PCI: qcom-ep: Treat unknown IRQ events as an error
dt-bindings: PCI: qcom: Fix SDX65 compatible
PCI: endpoint: Add kernel-doc for pci_epc_mem_init() API
PCI: epf-mhi: Use iATU for small transfers
PCI: epf-mhi: Add support for SM8450
PCI: epf-mhi: Add eDMA support
PCI: qcom-ep: Add eDMA support
PCI: epf-mhi: Make use of the alignment restriction from EPF core
PCI/PM: Only read PCI_PM_CTRL register when available
PCI: qcom: Add support for sa8775p SoC
dt-bindings: PCI: qcom: Add sa8775p compatible
PCI: qcom-ep: Pass alignment restriction to the EPF core
PCI: Simplify pcie_capability_clear_and_set_word() control flow
PCI: Tidy config space save/restore messages
PCI: Fix code formatting inconsistencies
PCI: Fix typos in docs and comments
PCI: Fix pci_bus_resetable(), pci_slot_resetable() name typos
PCI: Simplify pci_dev_driver()
...
Don't assume that only the driver would be accessing LNKCTL. ASPM policy
changes can trigger write to LNKCTL outside of driver's control. And in
the case of upstream bridge, the driver does not even own the device it's
changing the registers for.
Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: a2e73f56fa ("drm/amdgpu: Add support for CIK parts")
Fixes: 62a3755341 ("drm/amdgpu: add si implementation v10")
Link: https://lore.kernel.org/r/20230717120503.15276-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry. This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.
To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.
commit 4b31b92b14 ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.
This is dead code due to commit 10cb67eb8a ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code. Remove that dead code.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0
[ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove duplicated includes in amdgpu_amdkfd_gpuvm.c and amdgpu_ttm.c.
Resolves checkincludes message.
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow the user to specify -2 as auto enabled with displays.
By default we don't enter runtime suspend when there are
displays attached because it does not work well in some
desktop environments due to the driver sending hotplug
events on resume in case any new displays were attached
while the GPU was powered down. Some users still want
this functionality though, so this lets you enable it.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2428
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For sriov, doorbell index for vcn0 for AID needs to be on
32 byte boundary so we need to move the vcn end doorbell
Signed-off-by: Samir Dhume <samir.dhume@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFD currently relies on MEC FW to clear tcp watch control
register on UNMAP_PROCESS, but FW doesn't work on it,
which is a bug. So the solution is to clear the register
as gfx v9 in KFD.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it mannually. Use pci_dev_id() to
simplify the code a little bit.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes the following:
WARNING: function definition argument 'struct card_info *' should also have an identifier name
WARNING: function definition argument 'uint32_t' should also have an identifier name
WARNING: function definition argument 'void *' should also have an identifier name
WARNING: function definition argument 'struct atom_context *' should also have an identifier name
WARNING: function definition argument 'int' should also have an identifier name
WARNING: function definition argument 'uint32_t *' should also have an identifier name
WARNING: Unnecessary space before function pointer name
ERROR: space prohibited after that '*' (ctx:BxW)
CHECK: Prefer kernel type 'u32' over 'uint32_t'
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mode1 reset needs to recover mp1 in fatal error case
for mp0 v13_0_10.
v2:
Define a macro to wrap psp function calls.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RAS global isr will only be invoked by hardware
interrupt. Don't need to query ras capability in isr
In addition, amdgpu_ras_interrupt_fatal_error_handler
ensures the isr won't be called from guest linux
side by accident. The RAS cap check in isr that
introduced to fix sriov crash is not needed any more
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The parameter amdgpu_mcbp shall have priority against the default value
calculated from the chip version.
User could disable mcbp by setting the parameter mcbp as zero.
v2: do not trigger preemption in sw ring muxer when mcbp is disabled.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need the domains in amdgpu_drm.h for the kernel driver to manage
the pool, but we don't want userspace using it until the code
is ready. So reject for now.
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use local64_try_cmpxchg instead of local64_cmpxchg (*ptr, old, new) == old
in amdgpu_perf_read. x86 CMPXCHG instruction returns success in ZF flag,
so this change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg).
Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.
No functional change intended.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to move irq resume to the beginning of reset sriov, or if
one interrupt occurs before irq resume, then the irq won't work anymore.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Register RAS fatal error interrupt and add handler.
v2: only register NBIO RAS for dGPU platform.
change nbio_v7_9_set_ras_controller_irq_state and nbio_v7_9_set_ras_err_event_athub_irq_state
to dummy functions.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is only required for SR-IOV world switches, but it
adds additional latency leading to reduced performance in
some benchmarks. Disable for now on bare metal.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the following errors reported by checkpatch:
ERROR: space required before the open brace '{'
ERROR: "foo * bar" should be "foo *bar"
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>