Instead of waiting until the interrupt reaches GuC, we can grab a
forcewake while triggering the H2G interrupt. GEN11_GUC_HOST_INTERRUPT
is inside sgunit and is not affected by forcewakes. However, there
could be some delays when platform is entering/exiting some higher
level platform sleep states and a H2G is triggered. A forcewake
ensures those sleep states have been fully exited and further
processing occurs as expected. The hysteresis timers for C6 and
higher sleep states will ensure there is no unwanted race between the
wake and processing of the interrupts by GuC.
This will have an official WA soon so adding a FIXME in the comments.
v2: Make the new ranges watertight to address BAT failures and update
commit message (Matt R).
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240119193513.221730-1-vinay.belgaumkar@intel.com
UAPI Changes:
- (Build-time only, should not have any impact)
drm/i915/uapi: Replace fake flex-array with flexible-array member
"Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead."
This is on core kernel request moving towards GCC 13.
Driver Changes:
- Fix context runtime accounting on sysfs fdinfo for heavy workloads (Tvrtko)
- Add support for OA media units on MTL (Umesh)
- Add new workarounds for Meteorlake (Daniele, Radhakrishna, Haridhar)
- Fix sysfs to read actual frequency for MTL and Gen6 and earlier
(Ashutosh)
- Synchronize i915/BIOS on C6 enabling on MTL (Vinay)
- Fix DMAR error noise due to GPU error capture (Andrej)
- Fix forcewake during BAR resize on discrete (Andrzej)
- Flush lmem contents after construction on discrete (Chris)
- Fix GuC loading timeout on systems where IFWI programs low boot
frequency (John)
- Fix race condition UAF in i915_perf_add_config_ioctl (Min)
- Sanitycheck MMIO access early in driver load and during forcewake
(Matt)
- Wakeref fixes for GuC RC error scenario and active VM tracking (Chris)
- Cancel HuC delayed load timer on reset (Daniele)
- Limit double GT reset to pre-MTL (Daniele)
- Use i915 instead of dev_priv insied the file_priv structure (Andi)
- Improve GuC load error reporting (John)
- Simplify VCS/BSD engine selection logic (Tvrtko)
- Perform uc late init after probe error injection (Andrzej)
- Fix format for perf_limit_reasons in debugfs (Vinay)
- Create per-gt debugfs files (Andi)
- Documentation and kerneldoc fixes (Nirmoy, Lee)
- Selftest improvements (Fei, Jonathan)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZC6APj/feB+jBf2d@jlahtine-mobl.ger.corp.intel.com
Driver Changes:
- Fix issue #6333: "list_add corruption" and full system lockup from
performance monitoring (Janusz)
- Give the punit time to settle before fatally failing (Aravind, Chris)
- Don't use stolen memory or BAR for ring buffers on LLC platforms (John)
- Add missing ecodes and correct timeline seqno on GuC error captures (John)
- Make sure DSM size has correct 1MiB granularity on Gen12+ (Nirmoy,
Lucas)
- Fix potential SSEU max_subslices array-index-out-of-bounds access on Gen11 (Andrea)
- Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+ (Matt R.)
- Apply Wa_1408615072/Wa_1407596294 correctly on Gen11 (Matt R)
- Apply LNCF/LBCF workarounds correctly on XeHP SDV/PVC/DG2 (Matt R)
- Implement Wa_1606376872 for Xe_LP (Gustavo)
- Consider GSI offset when doing MCR lookups on Meteorlake+ (Matt R.)
- Add engine TLB invalidation for Meteorlake (Matt R.)
- Fix GSC Driver-FLR completion on Meteorlake (Alan)
- Fix GSC races on driver load/unload on Meteorlake+ (Daniele)
- Disable MC6 for MTL A step (Badal)
- Consolidate TLB invalidation flow (Tvrtko)
- Improve debug GuC/HuC debug messages (Michal Wa., John)
- Move fd_install after last use of fence (Rob)
- Initialize the obj flags for shmem objects (Aravind)
- Fix missing debug object activation (Nirmoy)
- Probe lmem before the stolen portion (Matt A)
- Improve clean up of GuC busyness stats worker (John)
- Fix missing return code checks in GuC submission init (John)
- Annotate two more workaround/tuning registers as MCR on PVC (Matt R)
- Fix GEN8_MISCCPCTL definition and remove unused INF_UNIT_LEVEL_CLKGATE (Lucas)
- Use sysfs_emit() and sysfs_emit_at() (Nirmoy)
- Make kobj_type structures constant (Thomas W.)
- make kobj attributes const on gt/ (Jani)
- Remove the unused virtualized start hack on buddy allocator (Matt A)
- Remove redundant check for DG1 (Lucas)
- Move DG2 tuning to the right function (Lucas)
- Rename dev_priv to i915 for private data naming consistency in gt/ (Andi)
- Remove unnecessary whitelisting of CS_CTX_TIMESTAMP on Xe_HP platforms (Matt R.)
-
- Escape wildcard in method names in kerneldoc (Bagas)
- Selftest improvements (Chris, Jonathan, Tvrtko, Anshuman, Tejas)
- Fix sparse warnings (Jani)
[airlied: fix unused variable in intel_workarounds]
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZBMSb42yjjzczRhj@jlahtine-mobl.ger.corp.intel.com
The Driver-FLR flow may inadvertently exit early before the full
completion of the re-init of the internal HW state if we only poll
GU_DEBUG Bit31 (polling for it to toggle from 0 -> 1). Instead
we need a two-step completion wait-for-completion flow that also
involves GU_CNTL. See the patch and new code comments for detail.
This is new direction from HW architecture folks.
v2: - Add error message for the teardown timeout (Anshuman)
- Don't duplicate code in comments (Jani)
v3: - Add get/put runtime-pm for this function. Though
not functionally required during unload, its so the uncore
doesn't complain.
v4: - Remove the get/put runtime-pm - that was for a prior
version of this patch (not needed for drm-managed callback).
- Remove the fixes tag since this is only for MTL and MTL
still needs force probe (Daniele).
- Bit 31 of GU_CNTL should be DRIVERFLR instead of
DRIVERFLR_STATUS (Daniele).
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230224001758.544817-1-alan.previn.teres.alexis@intel.com
drmm_add_action_or_reset() is unaware of __iomem and the pointer needs
to be a plain void *. Cast __iomem away and back while the pointer goes
through drmm.
drivers/gpu/drm/i915/intel_uncore.c:2463:17: warning: incorrect type in argument 1 (different address spaces)
drivers/gpu/drm/i915/intel_uncore.c:2463:17: expected void volatile [noderef] __iomem *addr
drivers/gpu/drm/i915/intel_uncore.c:2463:17: got void *regs
drivers/gpu/drm/i915/intel_uncore.c:2494:16: warning: incorrect type in argument 3 (different address spaces)
drivers/gpu/drm/i915/intel_uncore.c:2494:16: expected void *data
drivers/gpu/drm/i915/intel_uncore.c:2494:16: got void [noderef] __iomem *regs
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207124026.2105442-2-jani.nikula@intel.com
If the GSC was loaded, the only way to stop it during the driver unload
flow is to do a driver-FLR.
The driver-initiated FLR is not the same as PCI config space FLR in
that it doesn't reset the SGUnit and doesn't modify the PCI config
space. Thus, it doesn't require a re-enumeration of the PCI BARs.
However, the driver-FLR does cause a memory wipe of graphics memory
on all discrete GPU platforms or a wipe limited to stolen memory
on the integrated GPU platforms.
We perform the FLR as the last action before releasing the MMIO bar, so
that we don't have to care about the consequences of the reset on the
unload flow.
v2: rename FLR function, add comment to explain FLR impact (Rodrigo),
better explain why GSC needs FLR (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221208200521.2928378-5-daniele.ceraolospurio@intel.com
GVT Changes:
- gvt-next stuff mostly with refactor for the new MDEV interface.
i915 Changes:
- PSR fixes and improvements (Jouni)
- DP DSC fixes (Vinod, Jouni)
- More general display cleanups (Jani)
- More display collor management cleanup targetting degamma (Ville)
- remove circ_buf.h includes (Jiri)
- wait power off delay at driver remove to optimize probe (Jani)
- More audio cleanup targeting the ELD precompute readout (Ville)
- Enable DC power states on all eDP ports (Imre)
- RPL-P stepping info (Matt Atwood)
- MTL enabling patches (RK)
- Removal of DG2 force_probe (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y3f71obyEkImXoUF@intel.com
Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.
Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.
Also some places actually needed i915_irq.h too.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
MTL has separate forcewake tables for the primary/render GT and the
media GT; each GT's intel_uncore will use a separate forcewake table and
should only initialize the domains that are relevant to that GT. The GT
ack register also moves to a new location of (GSI base + 0xDFC) on this
platform.
Note that although our uncore handlers take care of transparently
redirecting all register accesses in the media GT's GSI range to their
new offset at 0x380000, the forcewake ranges listed in the table should
use the final, post-translation offsets.
NOTE: There are two ranges in the media IP that have multicast
registers where the two register instances reside in different power
wells (either VD0 or VD2). We don't have an easy way to deal with this
today (and in fact we don't even access these register ranges in the
driver today), so for now we just mark those ranges as FORCEWAKE_ALL
which will cause all of the media power wells to be grabbed, ensuring
proper operation. If we start reading/writing in those ranges in the
future, we can re-visit whether it's worth adding extra steering
complexity into our forcewake support.
Bspec: 67788, 67789, 52077
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220910001631.1986601-1-matthew.d.roper@intel.com
GT non-engine registers (referred to as "GSI" registers by the spec)
have the same relative offsets on standalone media as they do on the
primary GT, just with an additional "GSI offset" added to their MMIO
address. If we store this GSI offset in the standalone media's
intel_uncore structure, it can be automatically applied to all GSI reg
reads/writes that happen on that GT, allowing us to re-use our existing
GT code with minimal changes.
Forcewake and shadowed register tables for the media GT (which will be
added in a future patch) are listed as final addresses that already
include the GSI offset, so we also need to add the GSI offset before
doing lookups of registers in one of those tables.
v2:
- Add comment on raw_reg_*() macros explaining why we don't bother with
GSI offsets in them. (Daniele)
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220908224550.821257-1-matthew.d.roper@intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Unmapping of the MMIO range can be done as a DRM-managed action, which
will take care of the unmapping on device teardown and error paths.
This will also ensure proper ordering with respect to other DRM-managed
actions that we'll be using to clean up non-primary GTs in upcoming
patches.
We have not yet enabled any non-root GTs in the driver yet, so the
kfree() of the GT structure is effectively dead code. When we do start
enabling non-root GTs in upcoming patches, those are going to be using
DRM-managed allocations tied to the device lifetime, so we don't need to
explicitly free them (and kfree would be incorrect anyway).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220906234934.3655440-5-matthew.d.roper@intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The original intent of intel_uncore_mmio_debug as described in commit
0a9b26306d ("drm/i915: split out uncore_mmio_debug") was to be a
singleton structure that could be shared between multiple GTs' uncore
objects in a multi-tile system. Somehow we went off track and
started allocating separate instances of this structure for each GT,
which defeats that original goal.
But in reality, there isn't even a need to share the mmio_debug between
multiple GTs; on all modern platforms (i.e., everything after gen7)
unclaimed register accesses are something that can only be detected for
display registers. There's no point in grabbing the debug spinlock and
checking for unclaimed accesses on an uncore used by an xehpsdv or pvc
remote tile GT, or the uncore used by a mtl standalone media GT since
all of the display accesses go through the primary intel_uncore.
The simplest solution is to simply leave uncore->debug NULL on all
intel_uncore instances except for the primary one. This will allow us
to avoid the pointless debug spinlock acquisition we've been doing on
MMIO accesses coming in through these intel_uncores.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220906234934.3655440-3-matthew.d.roper@intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Handling of multicast/replicated registers is spread across intel_gt.c
and intel_uncore.c today. As multicast handling and the related
steering logic gets more complicated with the addition of new platforms
and new rules it makes sense to centralize it all in one place.
For now the existing functions have been moved to the new .c/.h as-is.
Function renames and updates to operate in a more consistent manner will
be done in subsequent patches.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220615001019.1821989-2-matthew.d.roper@intel.com
In order to get the GSC Support merged on drm-intel-gt-next
in a clean fashion we needed this ATS-M patch to avoid
conflict in i915_pci.c:
commit 412c942bdf ("drm/i915/ats-m: add ATS-M platform info")
--
Fixing a silent conflict on drivers/gpu/drm/i915/gt/intel_gt_gmch.c:
- if (!intel_vtd_active(i915))
+ if (!i915_vtd_active(i915))
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since gen6 we use FPGA_DBG register to detect unclaimed MMIO registers.
This register is in the display engine IP and can only ever detect
unclaimed accesses to registers in this area. However sometimes there
are reports of this triggering for registers in other areas, which
should not be possible.
Right now we always warn after the read/write of registers going through
unclaimed_reg_debug(). However places using __raw_uncore_* may be
triggering the unclaimed access and those being later accounted to a
different register. Let's warn both before and after the read/write
with a slightly different message, so it's clear if the register
reported in the warning is actually the culprit.
Commit dda960335e ("drm/i915: Just clear the mmiodebug before a
register access") attempted to solve the same issue by removing the
warning when if FPGA_DBG flags before the mmio read/write. However, it
doesn't solve it completely as FPGA_DBG may remain set when reading
registers outside display. So in the end the check after the mmio
read/write triggers the warning pointing to the wrong register.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220405001149.2675226-1-lucas.demarchi@intel.com
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next
for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We need to flush TLBs before releasing backing store otherwise userspace
is able to encounter stale entries if a) it is not declaring access to
certain buffers and b) it races with the backing store release from a
such undeclared execution already executing on the GPU in parallel.
The approach taken is to mark any buffer objects which were ever bound
to the GPU and to trigger a serialized TLB flush when their backing
store is released.
Alternatively the flushing could be done on VMA unbind, at which point
we would be able to ascertain whether there is potential a parallel GPU
execution (which could race), but essentially it boils down to paying
the cost of TLB flushes potentially needlessly at VMA unbind time (when
the backing store is not known to be going away so not needed for
safety), versus potentially needlessly at backing store relase time
(since we at that point cannot tell whether there is anything executing
on the GPU which uses that object).
Thereforce simplicity of implementation has been chosen for now with
scope to benchmark and refine later as required.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Driver Changes:
- Added bits of DG2 support around page table handling (Stuart Summers, Matthew Auld)
- Fixed wakeref leak in PMU busyness during reset in GuC mode (Umesh Nerlige Ramappa)
- Fixed debugfs access crash if GuC failed to load (John Harrison)
- Bring back GuC error log to error capture, undoing accidental earlier breakage (Thomas Hellström)
- Fixed memory leak in error capture caused by earlier refactoring (Thomas Hellström)
- Exclude reserved stolen from driver use (Chris Wilson)
- Add memory region sanity checking and optional full test (Chris Wilson)
- Fixed buffer size truncation in TTM shmemfs backend (Robert Beckett)
- Use correct lock and don't overwrite internal data structures when stealing GuC context ids (Matthew Brost)
- Don't hog IRQs when destroying GuC contexts (John Harrison)
- Make GuC to Host communication more robust (Matthew Brost)
- Continuation of locking refactoring around VMA and backing store handling (Maarten Lankhorst)
- Improve performance of reading GuC log from debugfs (John Harrison)
- Log when GuC fails to reset an engine (John Harrison)
- Speed up GuC/HuC firmware loading by requesting RP0 (Vinay Belgaumkar)
- Further work on asynchronous VMA unbinding (Thomas Hellström, Christian König)
- Refactor GuC/HuC firmware handling to prepare for future platforms (John Harrison)
- Prepare for future different GuC/HuC firmware signing key sizes (Daniele Ceraolo Spurio, Michal Wajdeczko)
- Add noreclaim annotations (Matthew Auld)
- Remove racey GEM_BUG_ON between GPU reset and GuC communication handling (Matthew Brost)
- Refactor i915->gt with to_gt(i915) to prepare for future platforms (Michał Winiarski, Andi Shyti)
- Increase GuC log size for CONFIG_DEBUG_GEM (John Harrison)
- Fixed engine busyness in selftests when in GuC mode (Umesh Nerlige Ramappa)
- Make engine parking work with PREEMPT_RT (Sebastian Andrzej Siewior)
- Replace X86_FEATURE_PAT with pat_enabled() (Lucas De Marchi)
- Selftest for stealing of guc ids (Matthew Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YcRvKO5cyPvIxVCi@tursulin-mobl2
We now support a per-gt uncore, yet we're not able to infer which GT
we're operating upon. Let's store a backpointer for now.
At this point the early initialization of the gt needs to be
broken in two parts where the first is needed to assign to the gt
the i915 private data pointer and the uncore. A temporary
function has been made and the two parts are
__intel_gt_init_early() and intel_gt_init_early(). This split
will be fixed in the future with the multitile patch.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-2-andi.shyti@linux.intel.com