UAPI Changes:
- drm/i915/guc: Use context hints for GT frequency
Allow user to provide a low latency context hint. When set, KMD
sends a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to contexts that set this bit during context
creation.
Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
enabled platforms as they use SLPC for frequency management.
The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint
- drm/i915/gt: Enable only one CCS for compute workload
Enable only one CCS engine by default with all the compute sices
allocated to it.
While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance
***
NOTE: This W/A will make all DG2 SKUs appear like single CCS SKUs by
default to mitigate a hardware bug. All the EUs will still remain
usable, and all the userspace drivers have been confirmed to be able
to dynamically detect the change in number of CCS engines and adjust.
For the smaller percent of applications that get perf benefit from
letting the userspace driver dispatch across all 4 CCS engines we will
be introducing a sysfs control as a later patch to choose 4 CCS each
with 25% EUs (or 50% if 2 CCS).
NOTE: A regression has been reported at
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895
However Andi has been triaging the issue and we're closing in a fix
to the gap in the W/A implementation:
https://lists.freedesktop.org/archives/intel-gfx/2024-April/348747.html
Driver Changes:
- Add new and fix to existing workarounds: Wa_14018575942 (MTL),
Wa_16019325821 (Gen12.70), Wa_14019159160 (MTL), Wa_16015675438,
Wa_14020495402 (Gen12.70) (Tejas, John, Lucas)
- Fix UAF on destroy against retire race and remove two earlier
partial fixes (Janusz)
- Limit the reserved VM space to only the platforms that need it (Andi)
- Reset queue_priority_hint on parking for execlist platforms (Chris)
- Fix gt reset with GuC submission is disabled (Nirmoy)
- Correct capture of EIR register on hang (John)
- Remove usage of the deprecated ida_simple_xx() API
- Refactor confusing __intel_gt_reset() (Nirmoy)
- Fix the fix for GuC reset lock confusion (John)
- Simplify/extend platform check for Wa_14018913170 (John)
- Replace dev_priv with i915 (Andi)
- Add and use gt_to_guc() wrapper (Andi)
- Remove bogus null check (Rodrigo, Dan)
. Selftest improvements (Janusz, Nirmoy, Daniele)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZitVBTvZmityDi7D@jlahtine-mobl.ger.corp.intel.com
Allow user to provide a low latency context hint. When set, KMD
sends a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to contexts that set this bit during context
creation.
Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
enabled platforms as they use SLPC for frequency management.
The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint
v2: Rename flags as per review suggestions (Rodrigo, Tvrtko).
Also, use flag bits in intel_context as it allows finer control for
toggling per engine if needed (Tvrtko).
v3: Minor review comments (Tvrtko)
v4: Update comment (Sushma)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306012759.204938-1-vinay.belgaumkar@intel.com
Many of the IS_METEORLAKE conditions throughout the driver are supposed
to be checks for Xe_LPG and/or Xe_LPM+ IP, not for the MTL platform
specifically. Update those checks to ensure that the code will still
operate properly if/when these IP versions show up on future platforms.
v2:
- Update two more conditions (one for pg_enable, one for MTL HuC
compatibility).
v3:
- Don't change GuC/HuC compatibility check, which sounds like it truly
is specific to the MTL platform. (Gustavo)
- Drop a non-lineage workaround number for the OA timestamp frequency
workaround. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-20-matthew.d.roper@intel.com
Driver Changes:
- Avoid infinite GPU waits by avoidin premature release of request's
reusable memory (Chris, Janusz)
- Expose RPS thresholds in sysfs (Tvrtko)
- Apply GuC SLPC min frequency softlimit correctly (Vinay)
- Restore SLPC efficient freq earlier (Vinay)
- Consider OA buffer boundary when zeroing out reports (Umesh)
- Extend Wa_14015795083 to TGL, RKL, DG1 and ADL (Matt R)
- Fix context workarounds with non-masked regs on MTL/DG2 (Lucas)
- Enable the CCS_FLUSH bit in the pipe control and in the CS for MTL+ (Andi)
- Update MTL workarounds 14018778641, 22016122933 (Tejas, Zhanjun)
- Ensure memory quiesced before AUX CCS invalidation (Jonathan)
- Add a gsc_info debugfs (Daniele)
- Invalidate the TLBs on each GT on multi-GT device (Chris)
- Fix a VMA UAF for multi-gt platform (Nirmoy)
- Do not use stolen on MTL due to HW bug (Nirmoy)
- Check HuC and GuC version compatibility on MTL (Daniele)
- Dump perf_limit_reasons for slow GuC init debug (Vinay)
- Replace kmap() with kmap_local_page() (Sumitra, Ira)
- Add sentinel to xehp_oa_b_counters for KASAN (Andrzej)
- Add the gen12_needs_ccs_aux_inv helper (Andi)
- Fixes and updates for GSC memory allocation (Daniele)
- Fix one wrong caching mode enum usage (Tvrtko)
- Fixes for GSC wakeref (Alan)
- Static checker fixes (Harshit, Arnd, Dan, Cristophe, David, Andi)
- Rename flags with bit_group_X according to the datasheet (Andi)
- Use direct alias for i915 in requests (Andrzej)
- Replace i915->gt0 with to_gt(i915) (Andi)
- Use the i915_vma_flush_writes helper (Tvrtko)
- Selftest improvements (Alan)
- Remove dead code (Tvrtko)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMy6kDd9npweR4uy@jlahtine-mobl.ger.corp.intel.com
UAPI Changes:
- (Build-time only, should not have any impact)
drm/i915/uapi: Replace fake flex-array with flexible-array member
"Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead."
This is on core kernel request moving towards GCC 13.
Driver Changes:
- Fix context runtime accounting on sysfs fdinfo for heavy workloads (Tvrtko)
- Add support for OA media units on MTL (Umesh)
- Add new workarounds for Meteorlake (Daniele, Radhakrishna, Haridhar)
- Fix sysfs to read actual frequency for MTL and Gen6 and earlier
(Ashutosh)
- Synchronize i915/BIOS on C6 enabling on MTL (Vinay)
- Fix DMAR error noise due to GPU error capture (Andrej)
- Fix forcewake during BAR resize on discrete (Andrzej)
- Flush lmem contents after construction on discrete (Chris)
- Fix GuC loading timeout on systems where IFWI programs low boot
frequency (John)
- Fix race condition UAF in i915_perf_add_config_ioctl (Min)
- Sanitycheck MMIO access early in driver load and during forcewake
(Matt)
- Wakeref fixes for GuC RC error scenario and active VM tracking (Chris)
- Cancel HuC delayed load timer on reset (Daniele)
- Limit double GT reset to pre-MTL (Daniele)
- Use i915 instead of dev_priv insied the file_priv structure (Andi)
- Improve GuC load error reporting (John)
- Simplify VCS/BSD engine selection logic (Tvrtko)
- Perform uc late init after probe error injection (Andrzej)
- Fix format for perf_limit_reasons in debugfs (Vinay)
- Create per-gt debugfs files (Andi)
- Documentation and kerneldoc fixes (Nirmoy, Lee)
- Selftest improvements (Fei, Jonathan)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZC6APj/feB+jBf2d@jlahtine-mobl.ger.corp.intel.com
Expose intel_rps_read_actual_frequency_fw to read the actual freq without
taking forcewake for use by PMU. The code is refactored to use a common set
of functions across sysfs and PMU. Using common functions with sysfs in PMU
solves the issues of missing support for MTL and missing support for older
generations (prior to Gen6). It also future proofs the PMU where sometimes
code has been updated for sysfs and PMU has been missed.
v2: Remove runtime_pm_if_in_use from read_actual_frequency_fw (Tvrtko)
v3: (Tvrtko)
- Remove goto in __read_cagf
- Unexport intel_rps_get_cagf and intel_rps_read_punit_req
Fixes: 22009b6dad ("drm/i915/mtl: Modify CAGF functions for MTL")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8280
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316004800.2539753-1-ashutosh.dixit@intel.com
Move a handful of key enums to a new file intel_display_limits.h. These
are the enum types, and the MAX/NUM enumerations within them, that are
used in other headers. Otherwise, there's no common theme between them.
Replace intel_display.h include with intel_display_limit.h where
relevant, and add the intel_display.h include directly in the .c files
where needed.
Since intel_display.h is used almost everywhere in display/, include it
from intel_display_types.h to avoid massive changes across the
board. There are very few files that would need intel_display_types.h
but not intel_display.h so this is neglible, and further cleanup between
these headers can be left for the future.
Overall this change drops the direct and indirect dependencies on
intel_display.h from about 300 to about 100 compilation units, because
we can drop the include from i915_drv.h.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230116164644.1752009-1-jani.nikula@intel.com
Waitboost (when SLPC is enabled) results in a H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request for boost if min softlimit is equal or
greater than it.
v2: Add the tracing back, and check requested freq
in the worker thread (Tvrtko)
v3: Check requested freq in dec_waiters as well
v4: Only check min_softlimit against boost_freq. Limit this
optimization for server parts for now.
v5: min_softlimit can be greater than boost (Ashutosh)
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221024171108.14373-1-vinay.belgaumkar@intel.com
We need to inform PCODE of a desired ring frequencies so PCODE update
the memory frequencies to us. rps->min_freq and rps->max_freq are the
frequencies used in that request. However they were unset when SLPC was
enabled and PCODE never updated the memory freq.
v2 (as Suggested by Ashutosh): if SLPC is in use, let's pick the right
frequencies from the get_ia_constants instead of the fake init of
rps' min and max.
v3: don't forget the max <= min return
v4: Move all the freq conversion to intel_rps.c. And the max <= min
check to where it belongs.
v5: (Ashutosh) Fix old comment s/50 HZ/50 MHz and add a doc explaining
the "raw format"
Fixes: 7ba79a6715 ("drm/i915/guc/slpc: Gate Host RPS when SLPC is enabled")
Cc: <stable@vger.kernel.org> # v5.15+
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220831214538.143950-1-rodrigo.vivi@intel.com
Host Turbo operates at efficient frequency when GT is not idle unless
the user or workload has forced it to a higher level. Replicate the same
behavior in SLPC by allowing the algorithm to use efficient frequency.
We had disabled it during boot due to concerns that it might break
kernel ABI for min frequency. However, this is not the case since
SLPC will still abide by the (min,max) range limits.
With this change, min freq will be at efficient frequency level at init
instead of fused min (RPn). If user chooses to reduce min freq below the
efficient freq, we will turn off usage of efficient frequency and honor
the user request. When a higher value is written, it will get toggled
back again.
The patch also corrects the register which needs to be read for obtaining
the correct efficient frequency for Gen9+.
We see much better perf numbers with benchmarks like glmark2 with
efficient frequency usage enabled as expected.
v2: Address review comments (Rodrigo)
v3: with efficient frequency being dynamic, it is possible that the req
frequency may go beyond max freq. This will cause SLPC selftests to fail.
Add a FIXME there to start the test with [RPn, RP0] instead and restore
it afterwards.
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/5468
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220820010832.15350-1-vinay.belgaumkar@intel.com
Each gt contains an independent instance of pcode. Extend pcode functions
to interface with pcode on different gt's. To avoid creating dependency of
display functionality on intel_gt, pcode function interfaces are exposed in
terms of uncore rather than intel_gt. Callers have been converted to pass
in the appropritate (i915 or intel_gt) uncore to the pcode functions.
v2: Expose pcode functions in terms of uncore rather than gt (Jani/Rodrigo)
v3: Retain previous function names to eliminate needless #defines (Rodrigo)
v4: Move out i915_pcode_init() to a separate patch (Tvrtko)
Remove duplicated drm_err/drm_dbg from intel_pcode_init() (Tvrtko)
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220519085732.1276255-2-tvrtko.ursulin@linux.intel.com
[tursulin: fixup merge conflict]
In order to get the GSC Support merged on drm-intel-gt-next
in a clean fashion we needed this ATS-M patch to avoid
conflict in i915_pci.c:
commit 412c942bdf ("drm/i915/ats-m: add ATS-M platform info")
--
Fixing a silent conflict on drivers/gpu/drm/i915/gt/intel_gt_gmch.c:
- if (!intel_vtd_active(i915))
+ if (!i915_vtd_active(i915))
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sync up with v5.18-rc1, in particular to get 5e3094cfd9
("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Freq caps (i.e. RP0, RP1 and RPn frequencies) are read from HW. However the
formats (bit positions, widths, registers and units) of these vary for
different generations with even more variations arriving in the future. In
order not to have to do identical computation for these caps in multiple
places, here we centralize the computation of these caps. This makes the
code cleaner and also more extensible for the future.
v2: Clarify that caps are in "hw units" in comments (Lucas De Marchi)
v3: Minor checkpatch fix
v4: s/intel_rps_get_freq_caps/gen6_rps_get_freq_caps/ (Badal Nilawar)
v5: Changes comments to kernel doc (Anshuman Gupta)
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220406191848.20895-1-ashutosh.dixit@intel.com
Throttling here refers to the GT frequency being clipped. Each of
the throttle reason attributes will have a 0 or 1 value depending
upon whether there is throttling and also the specific reason for
it.
The following is a brief description of the sysfs throttle
frequency attributes added:
- throttle_reason_status: when set indicates that there is GT
frequency clipping.
- throttle_reason_pl1: when set indicates that PBM PL1 (platform
or package PL1) has caused GT frequency clipping.
- throttle_reason_pl2: when set indicates that PBM PL2 or PL3
(platform or package PL2 or PL3) has caused GT frequency
clipping.
- throttle_reason_pl4: when set indicates that PL4 or IccMax has
caused GT frequency clipping.
- throttle_reason_thermal: when set indicates that Thermal event
has caused GT frequency clipping.
- throttle_reason_prochot: when set indicates that PROCHOT# has
caused GT frequency clipping.
- throttle_reason_ratl: when set indicates that Running Average
Thermal Limit has caused GT frequency clipping.
- throttle_reason_vr_thermalert: when set indicates that Hot VR
(any processor VR) has caused GT frequency clipping.
- throttle_reason_vr_tdc: when set indicates that VR TDC
(Thermal Design Current) has caused GT frequency clipping.
Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220318233938.149744-8-andi.shyti@linux.intel.com
Registers that exist within the MCH BAR and are mirrored into the GPU's
MMIO space are a good candidate to separate out into their own header.
For reference, the mirror of the MCH BAR starts at the following
locations in the graphics MMIO space (the end of the MCHBAR range
differs slightly on each platform):
* Pre-gen6: 0x10000
* Gen6-Gen11 + RKL: 0x140000
v2:
- Create separate patch to swtich a few register definitions to be
relative to the MCHBAR mirror base.
- Drop upper bound of MCHBAR mirror from commit message; there are too
many different combinations between various platforms to list out,
and the documentation is spotty for the older pre-gen6 platforms
anyway.
Bspec: 134, 51771
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220215061342.2055952-2-matthew.d.roper@intel.com
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next
for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
By default, GT (and GuC) run at RPn. Requesting for RP0
before firmware load can speed up DMA and HuC auth as well.
In addition to writing to 0xA008, we also need to enable
swreq in 0xA024 so that Punit will pay heed to our request.
SLPC will restore the frequency back to RPn after initialization,
but we need to manually do that for the non-SLPC path.
We don't need a manual override in the SLPC disabled case, just
use the intel_rps_set function to ensure consistent RPS state.
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216233022.21351-1-vinay.belgaumkar@intel.com