A failure to load the HuC is occasionally observed where the cause is
believed to be a low GT frequency leading to very long load times.
So a) increase the timeout so that the user still gets a working
system even in the case of slow load. And b) report the frequency
during the load to see if that is the cause of the slow down.
Also update the similar code on the GuC load to not use uncore->gt
when there is a local gt available. The two should match, but no need
for unnecessary de-referencing.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240102222202.310495-1-John.C.Harrison@Intel.com
Driver Changes:
- Avoid infinite GPU waits by avoidin premature release of request's
reusable memory (Chris, Janusz)
- Expose RPS thresholds in sysfs (Tvrtko)
- Apply GuC SLPC min frequency softlimit correctly (Vinay)
- Restore SLPC efficient freq earlier (Vinay)
- Consider OA buffer boundary when zeroing out reports (Umesh)
- Extend Wa_14015795083 to TGL, RKL, DG1 and ADL (Matt R)
- Fix context workarounds with non-masked regs on MTL/DG2 (Lucas)
- Enable the CCS_FLUSH bit in the pipe control and in the CS for MTL+ (Andi)
- Update MTL workarounds 14018778641, 22016122933 (Tejas, Zhanjun)
- Ensure memory quiesced before AUX CCS invalidation (Jonathan)
- Add a gsc_info debugfs (Daniele)
- Invalidate the TLBs on each GT on multi-GT device (Chris)
- Fix a VMA UAF for multi-gt platform (Nirmoy)
- Do not use stolen on MTL due to HW bug (Nirmoy)
- Check HuC and GuC version compatibility on MTL (Daniele)
- Dump perf_limit_reasons for slow GuC init debug (Vinay)
- Replace kmap() with kmap_local_page() (Sumitra, Ira)
- Add sentinel to xehp_oa_b_counters for KASAN (Andrzej)
- Add the gen12_needs_ccs_aux_inv helper (Andi)
- Fixes and updates for GSC memory allocation (Daniele)
- Fix one wrong caching mode enum usage (Tvrtko)
- Fixes for GSC wakeref (Alan)
- Static checker fixes (Harshit, Arnd, Dan, Cristophe, David, Andi)
- Rename flags with bit_group_X according to the datasheet (Andi)
- Use direct alias for i915 in requests (Andrzej)
- Replace i915->gt0 with to_gt(i915) (Andi)
- Use the i915_vma_flush_writes helper (Tvrtko)
- Selftest improvements (Alan)
- Remove dead code (Tvrtko)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMy6kDd9npweR4uy@jlahtine-mobl.ger.corp.intel.com
- Removing unused declarations (Arnd, Gustavo)
- ICL+ DSI modeset sequence fixes (Ville)
- Improvements on HDCP (Suraj)
- Fixes and clean up on MTL Display (Mika Kahola, Lee, RK, Nirmoy, Chaitanya)
- Restore HSW/BDW PSR1 (Ville)
- Other PSR Fixes (Jouni)
- Fixes around DC states and other Display Power (Imre)
- Init DDI ports in VBT order (Ville)
- General documentation fixes (Jani)
- General refactor for better organization (Jani)
- Bigjoiner fix (Stanislav)
- VDSC Fixes and improvements (Stanialav, Suraj)
- Hotplug fixes and improvements (Simon, Suraj)
- Start using plane scale factor for relative data rate (Stanislav)
- Use shmem for dpt objects (RK)
- Simplify expression &to_i915(dev)->drm (Uwe)
- Do not access i915_gem_object members from frontbuffer tracking (Jouni)
- Fix uncore race around i915->params.mmio_debug (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMv4RCzGyCmG/BDe@intel.com
Before we add the second step of the MTL HuC auth (via GSC), we need to
have the ability to differentiate between them. To do so, the huc
authentication check is duplicated for GuC and GSC auth, with
GSC-enabled binaries being considered fully authenticated only after
the GSC auth step.
To report the difference between the 2 auth steps, a new case is added
to the HuC getparam. This way, the clear media driver can start
submitting before full auth, as partial auth is enough for those
workloads.
v2: fix authentication status check for DG2
v3: add a better comment at the top of the HuC file to explain the
different approaches to load and auth (John)
v4: update call to intel_huc_is_authenticated in the pxp code to check
for GSC authentication
v5: drop references to meu and esclamation mark in huc_auth print (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> #v2
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-5-daniele.ceraolospurio@intel.com
In the previous patch we extracted the offset of the legacy-style HuC
binary located within the GSC-enabled blob, so now we can use that to
load the HuC via DMA if the fuse is set that way.
Note that we now need to differentiate between "GSC-enabled binary" and
"loaded by GSC", so the former case has been renamed to "has GSC headers"
for clarity, while the latter is now based on the fuse instead of the
binary format. This way, all the legacy load paths are automatically
taken (including the auth by GuC) without having to implement further
code changes.
v2: s/is_meu_binary/has_gsc_headers/, clearer logs (John)
v3: split check for GSC access, better comments (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-4-daniele.ceraolospurio@intel.com
The new binaries that support the 2-step authentication contain the
legacy-style binary, which we can use for loading the HuC via DMA. To
find out where this is located in the image, we need to parse the
manifest of the GSC-enabled HuC binary. The manifest consist of a
partition header followed by entries, one of which contains the offset
we're looking for.
Note that the DG2 GSC binary contains entries with the same names, but
it doesn't contain a full legacy binary, so we need to skip assigning
the dma offset in that case (which we can do by checking the ccs).
Also, since we're now parsing the entries, we can extract the HuC
version that way instead of using hardcoded offsets.
Note that the GSC binary uses the same structures in its binary header,
so they've been added in their own header file.
v2: fix structure names to match meu defines (s/CPT/CPD/), update commit
message, check ccs validity, drop old version location defines.
v3: drop references to the MEU tool to reduce confusion, fix log (John)
v4: fix log for real (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> #v2
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-3-daniele.ceraolospurio@intel.com
Now that each FW has its own reserved area, we can keep them always
pinned and skip the pin/unpin dance on reset. This will make things
easier for the 2-step HuC authentication, which requires the FW to be
pinned in GGTT after the xfer is completed.
Since the vma is now valid for a long time and not just for the quick
pin-load-unpin dance, the name "dummy" is no longer appropriare and has
been replaced with vma_res. All the functions have also been updated to
operate on vma_res for consistency.
Given that we pin the vma behind the allocator's back (which is ok
because we do the pinning in an area that was previously reserved for
thus purpose), we do need to explicitly re-pin on resume because the
automated helper won't cover us.
v2: better comments and commit message, s/dummy/vma_res/
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-2-daniele.ceraolospurio@intel.com
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
We're observing sporadic HuC delayed load timeouts in CI, due to mei_pxp
binding completing later than we expected. HuC is still loaded when the
bind occurs, but in the meantime i915 has started allowing submission to
the VCS engines even if HuC is not there.
In most of the cases I've observed, the timeout was due to the
init/resume of another driver between i915 and mei hitting errors and
thus adding an extra delay, but HuC was still loaded before userspace
could submit, because the whole resume process time was increased by the
delays.
Given that there is no upper bound to the delay that can be introduced
by other drivers, I've reached the following compromise with the media
team:
1) i915 is going to bump the timeout to 5s, to reduce the probability
of reaching it. We still expect HuC to be loaded before userspace
starts submitting, so increasing the timeout should have no impact on
normal operations, but in case something weird happens we don't want to
stall video submissions for too long.
2) The media driver will cope with the failing submissions that manage
to go through between i915 init/resume complete and HuC loading, if any
ever happen. This could cause a small corruption of video playback
immediately after a resume (we should be safe on boot because the media
driver polls the HUC_STATUS ioctl before starting submissions).
Since we're accepting the timeout as a valid outcome, I'm also reducing
the print verbosity from error to notice.
v2: use separate prints for MEI GSC and MEI PXP init timeouts (John)
v3: add MISSING_CASE to the if-else chain (John)
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7033
Fixes: 27536e0327 ("drm/i915/huc: track delayed HuC load with a fence")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013203245.1801788-1-daniele.ceraolospurio@intel.com
The current HuC status getparam return values are a bit confusing in
regards to what happens in some scenarios. In particular, most of the
error cases cause the ioctl to return an error, but a couple of them,
INIT_FAIL and LOAD_FAIL, are not explicitly handled and neither is
their expected return value documented; these 2 error cases therefore
end up into the catch-all umbrella of the "HuC not loaded" case, with
this case therefore including both some error scenarios and the load
in progress one.
The updates included in this patch change the handling so that all
error cases behave the same way, i.e. return an errno code, and so
that the HuC load in progress case is unambiguous.
The patch also includes a small change to the FW init path to make sure
we always transition to an error state if something goes wrong.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Tony Ye <tony.ye@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220928004145.745803-14-daniele.ceraolospurio@intel.com
Given that HuC load is delayed on DG2, this patch adds support for a fence
that can be used to wait for load completion. No waiters are added in this
patch (they're coming up in the next one), to keep the focus of the
patch on the tracking logic.
The full HuC loading flow on boot DG2 is as follows:
1) i915 exports the GSC as an aux device;
2) the mei-gsc driver is loaded on the aux device;
3) the mei-pxp component is loaded;
4) mei-pxp calls back into i915 and we load the HuC.
Between steps 1 and 2 there can be several seconds of gap, mainly due to
the kernel doing other work during the boot.
The resume flow is slightly different, because we don't need to
re-expose or re-probe the aux device, so we go directly to step 3 once
i915 and mei-gsc have completed their resume flow.
Here's an example of the boot timing, captured with some logs added to
i915:
[ 17.908307] [drm] adding GSC device
[ 17.915717] [drm] i915 probe done
[ 22.282917] [drm] mei-gsc bound
[ 22.938153] [drm] HuC authenticated
Also to note is that if something goes wrong during GSC HW init the
mei-gsc driver will still bind, but steps 3 and 4 will not happen.
The status tracking is done by registering a bus_notifier to receive a
callback when the mei-gsc driver binds, with a large enough timeout to
account for delays. Once mei-gsc is bound, we switch to a smaller
timeout to wait for the mei-pxp component to load.
The fence is signalled on HuC load complete or if anything goes wrong in
any of the tracking steps. Timeout are enforced via hrtimer callbacks.
v2: fix includes (Jani)
v5: gsc_notifier() remove unneeded ()
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220928004145.745803-12-daniele.ceraolospurio@intel.com
The GSC will perform both the load and the authentication, so we just
need to check the auth bit after the GSC has replied.
Since we require the PXP module to load the HuC, the earliest we can
trigger the load is during the pxp_bind operation.
Note that GSC-loaded HuC survives GT reset, so we need to just mark it
as ready when we re-init the GT HW.
V2: move setting of HuC fw error state to the failure path of the HuC
auth function, so it covers both the legacy and new auth flows
V4:
1. Fix typo in the commit message
2. style fix in intel_huc_wait_for_auth_complete()
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Vitaly Lubart <vitaly.lubart@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220928004145.745803-11-daniele.ceraolospurio@intel.com
The previous patch introduced new failure cases in the HuC init flow
that can be hit by simply changing the config, so we want to avoid
failing the probe in those scenarios. HuC load failure is already
considered a non-fatal error and we have a way to report to userspace
if the HuC is not available via a dedicated getparam, so no changes
in expectation there.
The error message in the HuC init code has also been lowered to info to
avoid throwing error message for an expected behavior.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220504204816.2082588-5-daniele.ceraolospurio@intel.com
If the GuC fails to load, it is useful to know what firmware file /
version was attempted. So move the version info report to before the
load attempt rather than only after a successful load.
If the GuC does fail to load, then make the error messages visible
rather than being 'debug' prints that do not appears in dmesg output
by default.
When waiting for the GuC to load, it used to be necessary to check for
two different states - READY and (LAPIC_DONE | MIA_CORE). Apparently
the second signified init complete on RC6 exit. However, in more
recent GuC versions the RC6 exit sequence now finishes with status
READY as well. So the test can be simplified.
Also, add an enum giving all the current status codes that GuC loading
can report as a reference without having to pull and search through
the GuC source files.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220107000622.292081-4-John.C.Harrison@Intel.com
This was done by the following semantic patch:
@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)
@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E
@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E
@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E
@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)
@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)
@@
identifier def.id;
@@
- id
+ ver
It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com
We currently initialize HuC support based on GuC being enabled in
modparam; this means that huc_is_supported() can return false on HW that
does have a HuC when enable_guc=0. The rationale for this behavior is
that HuC requires GuC for authentication and therefore is not supported
by itself. However, we do not allow defining HuC fw wthout GuC fw and
selecting HuC in modparam implicitly selects GuC as well, so we can't
actually hit a scenario where HuC is selected alone. Therefore, we can
flip the support check to reflect the HW capabilities and fw
availability, which is more intuitive and will make it cleaner to log
HuC the difference between not supported in HW and not selected.
Removing the difference between GuC and HuC also allows us to simplify
the init_early, since we don't need to differentiate the support based
on the type of uC.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200326181121.16869-4-daniele.ceraolospurio@intel.com
We are quite trigger happy in cleaning up the firmware blobs, as we do
so from several error/fini paths in GuC/HuC/uC code. We do have the
__uc_cleanup_firmwares cleanup function, which unwinds
__uc_fetch_firmwares and is already called both from the error path of
gem_init and from gem_driver_release, so let's stop cleaning up from
all the other paths.
The fact that we're not cleaning the firmware immediately means that
we can't consider firmware availability as an indication of
initialization success. A "LOADABLE" status has been added to
indicate that the initialization was successful, to be used to
selectively load HuC only if HuC init has completed (HuC init failure
is not considered a fatal error).
v2: s/ready_to_load/loadable (Michal), only run guc/huc_fini if the
fw is in loadable state
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> #v1
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-9-daniele.ceraolospurio@intel.com
Now that we can differentiate wants vs uses GuC/HuC, intel_uc_init is
restricted to running only if we have successfully fetched the required
blob(s) and are committed to using the microcontroller(s).
The only remaining thing that can go wrong in uc_init is the allocation
of GuC/HuC related objects; if we get such a failure better to bail out
immediately instead of wedging later, like we do for e.g.
intel_engines_init, since without objects we can't use the HW, including
not being able to attempt the firmware load.
While at it, remove the unneeded fw_cleanup call (this is handled
outside of gt_init) and add a probe failure injection point for testing.
Also, update the logs for <g/h>uc_init failures to probe_failure() since
they will cause the driver load to fail.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-8-daniele.ceraolospurio@intel.com