linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-22 17:52:09 +00:00

Author	SHA1	Message	Date
Dave Airlie	3d335a523b	Merge tag 'drm-intel-next-2022-11-18' of git://anongit.freedesktop.org/drm/drm-intel into drm-next GVT Changes: - gvt-next stuff mostly with refactor for the new MDEV interface. i915 Changes: - PSR fixes and improvements (Jouni) - DP DSC fixes (Vinod, Jouni) - More general display cleanups (Jani) - More display collor management cleanup targetting degamma (Ville) - remove circ_buf.h includes (Jiri) - wait power off delay at driver remove to optimize probe (Jani) - More audio cleanup targeting the ELD precompute readout (Ville) - Enable DC power states on all eDP ports (Imre) - RPL-P stepping info (Matt Atwood) - MTL enabling patches (RK) - Removal of DG2 force_probe (Matt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Y3f71obyEkImXoUF@intel.com	2022-11-23 09:15:44 +10:00
Rodrigo Vivi	002c6ca752	Merge drm/drm-next into drm-intel-next Catch up on 6.1-rc cycle in order to solve the intel_backlight conflict on linux-next. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2022-11-14 14:32:34 -05:00
Jani Nikula	801543b259	drm/i915: stop including i915_irq.h from i915_trace.h Turns out many of the files that need i915_reg.h get it implicitly via {display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h -> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h, makes sense to drop it, but that requires adding quite a few new includes all over the place. Prefer including i915_reg.h where needed instead of adding another implicit include, because eventually we'll want to split up i915_reg.h and only include the specific registers at each place. Also some places actually needed i915_irq.h too. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com	2022-11-11 13:05:19 +02:00
John Harrison	178b8a3668	drm/i915/guc: Don't deadlock busyness stats vs reset The engine busyness stats has a worker function to do things like 64bit extend the 32bit hardware counters. The GuC's reset prepare function flushes out this worker function to ensure no corruption happens during the reset. Unforunately, the worker function has an infinite wait for active resets to finish before doing its work. Thus a deadlock would occur if the worker function had actually started just as the reset starts. The function being used to lock the reset-in-progress mutex is called intel_gt_reset_trylock(). However, as noted it does not follow standard 'trylock' conventions and exit if already locked. So rename the current _trylock function to intel_gt_reset_lock_interruptible(), which is the behaviour it actually provides. In addition, add a new implementation of _trylock and call that from the busyness stats worker instead. v2: Rename existing trylock to interruptible rather than trying to preserve the existing (confusing) naming scheme (review comments from Tvrtko). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com	2022-11-04 13:45:45 -07:00
John Harrison	de51de9672	drm/i915/guc: Properly initialise kernel contexts If a context has already been registered prior to first submission then context init code was not being called. The noticeable effect of that was the scheduling priority was left at zero (meaning super high priority) instead of being set to normal. This would occur with kernel contexts at start of day as they are manually pinned up front rather than on first submission. So add a call to initialise those when they are pinned. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-2-John.C.Harrison@Intel.com	2022-11-04 13:45:44 -07:00
John Harrison	cc2e0cf0ad	drm/i915/guc: Remove excessive line feeds in state dumps Some of the GuC state dump messages were adding extra line feeds. When printing via a DRM printer to dmesg, for example, that messes up the log formatting as it loses any prefixing from the printer. Given that the extra line feeds are just in the middle of random bits of GuC state, there isn't any real need for them. So just remove them completely. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221031220007.4176835-1-John.C.Harrison@Intel.com	2022-11-04 12:08:19 -07:00
Alan Previn	a7ac9d84b8	drm/i915/guc: Remove intel_context:number_committed_requests counter With the introduction of the delayed disable-sched behavior, we use the GuC's xarray of valid guc-id's as a way to identify if new requests had been added to a context when the said context is being checked for closure. Additionally that prior change also closes the race for when a new incoming request fails to cancel the pending delayed disable-sched worker. With these two complementary checks, we see no more use for intel_context:guc_state:number_committed_requests. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-3-alan.previn.teres.alexis@intel.com	2022-10-26 17:29:47 -07:00
Matthew Brost	8332109430	drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis Add a delay, configurable via debugfs (default 34ms), to disable scheduling of a context after the pin count goes to zero. Disable scheduling is a costly operation as it requires synchronizing with the GuC. So the idea is that a delay allows the user to resubmit something before doing this operation. This delay is only done if the context isn't closed and less than a given threshold (default is 3/4) of the guc_ids are in use. Alan Previn: Matt Brost first introduced this patch back in Oct 2021. However no real world workload with measured performance impact was available to prove the intended results. Today, this series is being republished in response to a real world workload that benefited greatly from it along with measured performance improvement. Workload description: 36 containers were created on a DG2 device where each container was performing a combination of 720p 3d game rendering and 30fps video encoding. The workload density was configured in a way that guaranteed each container to ALWAYS be able to render and encode no less than 30fps with a predefined maximum render + encode latency time. That means the totality of all 36 containers and their workloads were not saturating the engines to their max (in order to maintain just enough headrooom to meet the min fps and max latencies of incoming container submissions). Problem statement: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles was determined to be processing a specific G2H IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in response to i915 KMD sending H2G requests: INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent whenever a context goes idle so that we can unpin the context from GuC. The high CPU utilization % symptom was limiting density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing a thrashing of unpinning contexts from GuC at one moment, followed quickly by repinning them due to incoming workload the very next moment. These event-pairs were being triggered across multiple contexts per container, across all containers at the rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. Design awareness: Selftest impact. As temporary WA disable this feature for the selftests. Selftests are very timing sensitive and any change in timing can cause failure. A follow up patch will fixup the selftests to understand this delay. Design awareness: Race between guc_request_alloc and guc_context_close. If a context close is issued while there is a request submission in flight and a delayed schedule disable is pending, guc_context_close and guc_request_alloc will race to cancel the delayed disable. To close the race, make sure that guc_request_alloc waits for guc_context_close to finish running before checking any state. Design awareness: GT Reset event. If a gt reset is triggered, as preparation steps, add an additional step to ensure all contexts that have a pending delay-disable-schedule task be flushed of it. Move them directly into the closed state after cancelling the worker. This is okay because the existing flow flushes all yet-to-arrive G2H's dropping them anyway. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com	2022-10-26 17:29:43 -07:00
John Harrison	568944af44	drm/i915/guc: Limit scheduling properties to avoid overflow GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. v2: Add helper functions for clamping (review feedback from Tvrtko). v3: Add a bunch of BUG_ON range checks in addition to the checks already in the clamping functions (Tvrtko) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-2-John.C.Harrison@Intel.com	2022-10-24 12:11:59 -07:00
Tvrtko Ursulin	0add082ceb	drm/i915/guc: Fix revocation of non-persistent contexts Patch which added graceful exit for non-persistent contexts missed the fact it is not enough to set the exiting flag on a context and let the backend handle it from there. GuC backend cannot handle it because it runs independently in the firmware and driver might not see the requests ever again. Patch also missed the fact some usages of intel_context_is_banned in the GuC backend needed replacing with newly introduced intel_context_is_schedulable. Fix the first issue by calling into backend revoke when we know this is the last chance to do it. Fix the second issue by replacing intel_context_is_banned with intel_context_is_schedulable, which should always be safe since latter is a superset of the former. v2: * Just call ce->ops->revoke unconditionally. (Andrzej) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `45c64ecf97` ("drm/i915: Improve user experience and driver robustness under SIGINT or similar") Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> # v6.0+ Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221003121630.694249-1-tvrtko.ursulin@linux.intel.com	2022-10-05 08:52:41 +01:00
Lucas De Marchi	d263545ef0	drm/i915/gt: Fix platform prefix Different handling for XeHP and later platforms should be using the xehp prefix, not gen125. Rename them. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220930050903.3479619-4-lucas.demarchi@intel.com	2022-10-03 07:15:11 -07:00
John Harrison	1e88da4f6d	drm/i915/guc: Enable compute scheduling on DG2 DG2 has issues. To work around one of these the GuC must schedule apps in an exclusive manner across both RCS and CCS. That is, if a context from app X is running on RCS then all CCS engines must sit idle even if there are contexts from apps Y, Z, ... waiting to run. A certain OS favours RCS to the total starvation of CCS. Linux does not. Hence the GuC now has a scheduling policy setting to control this abitration. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220922201209.1446343-2-John.C.Harrison@Intel.com	2022-09-27 17:12:43 -07:00
Jani Nikula	d09aa85258	drm/i915: move i915_coherent_map_type() to i915_gem_pages.c and un-inline The inline function has no place in i915_drv.h. Move it away, un-inline, and untangle some header dependencies while at it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914163514.1837467-1-jani.nikula@intel.com	2022-09-26 12:21:08 +03:00
Matt Roper	03d2c54d30	drm/i915/mtl: Use primary GT's irq lock for media GT When we hook up interrupts (in the next patch), interrupts for the media GT are still processed as part of the primary GT's interrupt flow. As such, we should share the same IRQ lock with the primary GT. Let's convert gt->irq_lock into a pointer and just point the media GT's instance at the same lock the primary GT is using. v2: - Point media's gt->irq_lock at the primary GT lock properly. (Daniele) - Fix jump target for intel_root_gt_init_early errors. (Daniele) Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906234934.3655440-14-matthew.d.roper@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2022-09-12 15:23:12 +03:00
Umesh Nerlige Ramappa	31335aa8e0	drm/i915/guc: Cancel GuC engine busyness worker synchronously The worker is canceled in gt_park path, but earlier it was assumed that gt_park path cannot sleep and the cancel is asynchronous. This caused a race with suspend flow where the worker runs after suspend and causes an unclaimed register access warning. Cancel the worker synchronously since the gt_park is indeed allowed to sleep. v2: Fix author name and sign-off mismatch Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4419 Fixes: `77cdd054dd` ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220827002135.139349-1-umesh.nerlige.ramappa@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2022-09-12 15:23:11 +03:00
John Harrison	65332a5b9f	drm/i915/uc: Add patch level version number support With the move to un-versioned filenames, it becomes more difficult to know exactly what version of a given firmware is being used. So add the patch level version number to the debugfs output. Also, support matching by patch level when selecting code paths for firmware compatibility. While a patch level change cannot be backwards breaking, it is potentially possible that a new feature only works from a given patch level onwards (even though it was theoretically added in an earlier version that bumped the major or minor version). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906230147.479945-2-daniele.ceraolospurio@intel.com	2022-09-07 09:54:29 -07:00
John Harrison	665ae9c9ca	drm/i915/uc: Support for version reduced and multiple firmware files There was a misunderstanding in how firmware file compatibility should be managed within i915. This has been clarified as: i915 must support all existing firmware releases forever new minor firmware releases should replace prior versions only backwards compatibility breaking releases should be a new file This patch cleans up the single fallback file support that was added as a quick fix emergency effort. That is now removed in preference to supporting arbitrary numbers of firmware files per platform. The patch also adds support for having GuC firmware files that are named by major version only (because the major version indicates backwards breaking changes that affect the KMD) and for having HuC firmware files with no version number at all (because the KMD has no interface requirements with the HuC). For GuC, the driver will report via dmesg if the found file is older than expected. For HuC, the KMD will no longer require updating for any new HuC release so will not be able to report what the latest expected version is. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906230147.479945-1-daniele.ceraolospurio@intel.com	2022-09-07 09:54:00 -07:00
Matthew Auld	54c204c522	Revert "drm/i915/guc: Add delay to disable scheduling after pin count goes to zero" This reverts commit `6a07990384`. Everything in CI using GuC is now timing out[1], and killing the machine with this change (perhaps a deadlock?). CI was recently on fire due to some changes coming in from -rc1, so likely the pre-merge CI results for this series were invalid? For now just revert, unless GuC experts already have a fix in mind. [1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html? Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com	2022-08-20 09:41:56 -07:00
Matthew Brost	6a07990384	drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Add a delay, configurable via debugfs (default 34ms), to disable scheduling of a context after the pin count goes to zero. Disable scheduling is a costly operation as it requires synchronizing with the GuC. So the idea is that a delay allows the user to resubmit something before doing this operation. This delay is only done if the context isn't closed and less than a given threshold (default is 3/4) of the guc_ids are in use. As temporary WA disable this feature for the selftests. Selftests are very timing sensitive and any change in timing can cause failure. A follow up patch will fixup the selftests to understand this delay. Alan Previn: Matt Brost first introduced this series back in Oct 2021. However no real world workload with measured performance impact was available to prove the intended results. Today, this series is being republished in response to a real world workload that benefited greatly from it along with measured performance improvement. Workload description: 36 containers were created on a DG2 device where each container was performing a combination of 720p 3d game rendering and 30fps video encoding. The workload density was configured in a way that guaranteed each container to ALWAYS be able to render and encode no less than 30fps with a predefined maximum render + encode latency time. That means the totality of all 36 containers and their workloads were not saturating the engines to their max (in order to maintain just enough headrooom to meet the min fps and max latencies of incoming container submissions). Problem statement: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles was determined to be processing a specific G2H IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in response to i915 KMD sending H2G requests: INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent whenever a context goes idle so that we can unpin the context from GuC. The high CPU utilization % symptom was limiting density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing a thrashing of unpinning contexts from GuC at one moment, followed quickly by repinning them due to incoming workload the very next moment. These event-pairs were being triggered across multiple contexts per container, across all containers at the rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com	2022-08-18 15:23:47 -07:00
Daniele Ceraolo Spurio	f922fbb0f2	drm/i915/guc: clear stalled request after a reset If the GuC CTs are full and we need to stall the request submission while waiting for space, we save the stalled request and where the stall occurred; when the CTs have space again we pick up the request submission from where we left off. If a full GT reset occurs, the state of all contexts is cleared and all non-guilty requests are unsubmitted, therefore we need to restart the stalled request submission from scratch. To make sure that we do so, clear the saved request after a reset. Fixes note: the patch that introduced the bug is in 5.15, but no officially supported platform had GuC submission enabled by default in that kernel, so the backport to that particular version (and only that one) can potentially be skipped. Fixes: `925dc1cf58` ("drm/i915/guc: Implement GuC submission tasklet") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com	2022-08-17 13:53:55 -07:00
Daniele Ceraolo Spurio	6c82c75230	drm/i915/guc: Don't send policy update for child contexts. The GuC FW applies the parent context policy to all the children, so individual updates to the children are not supported and we should not send them. Note that sending the message did not have any functional consequences, because the GuC just drops it and logs an error; since we were trying to set the child policy to match the parent anyway the message being dropped was not a problem. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728003339.2361010-1-daniele.ceraolospurio@intel.com	2022-08-01 13:56:26 -07:00
Rahul Kumar Singh	69142c0a5f	drm/i915/guc: Add selftest for a hung GuC Add a test to check that the hangcheck will recover from a submission hang in the GuC. Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728182616.2417491-1-John.C.Harrison@Intel.com	2022-07-29 10:35:55 -07:00
Michał Winiarski	9fb3473732	drm/i915/guc: Route semaphores to GuC for Gen12+ In GuC submission mode, there is an option to use auto-switch out semaphores and have GuC auto-switch in a waiting context. This requires routing the semaphore interrupt to GuC. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-2-John.C.Harrison@Intel.com	2022-07-29 10:35:53 -07:00
Daniele Ceraolo Spurio	774ce1510e	drm/i915/guc: support v69 in parallel to v70 This patch re-introduces support for GuC v69 in parallel to v70. As this is a quick fix, v69 has been re-introduced as the single "fallback" guc version in case v70 is not available on disk and only for platforms that are out of force_probe and require the GuC by default. All v69 specific code has been labeled as such for easy identification, and the same was done for all v70 functions for which there is a separate v69 version, to avoid accidentally calling the wrong version via the unlabeled name. When the fallback mode kicks in, a drm_notice message is printed in dmesg to inform the user of the required update. The existing logging of the fetch function has also been updated so that we no longer complain immediately if we can't find a fw and we only throw an error if the fetch of both the base and fallback blobs fails. The plan is to follow this up with a more complex rework to allow for multiple different GuC versions to be supported at the same time. v2: reduce the fallback to platform that require it, switch to firmware_request_nowarn(), improve logs. Fixes: `2584b3549f` ("drm/i915/guc: Update to GuC version 70.1.1") Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.html Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com	2022-07-19 14:39:20 -07:00
Umesh Nerlige Ramappa	0667429ce6	drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend For execlists backend, current implementation of Wa_22011802037 is to stop the CS before doing a reset of the engine. This WA was further extended to wait for any pending MI FORCE WAKEUPs before issuing a reset. Add the extended steps in the execlist path of reset. In addition, extend the WA to gen11. v2: (Tvrtko) - Clarify comments, commit message, fix typos - Use IS_GRAPHICS_VER for gen 11/12 checks v3: (Daneile) - Drop changes to intel_ring_submission since WA does not apply to it - Log an error if MSG IDLE is not defined for an engine Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `f6aa0d713c` ("drm/i915: Add Wa_22011802037 force cs halt") Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220621192105.2100585-1-umesh.nerlige.ramappa@intel.com	2022-06-27 12:21:22 -07:00
Alan Previn	59bcdb564b	drm/i915/guc: Don't update engine busyness stats too frequently Using two different types of workoads, it was observed that guc_update_engine_gt_clks was being called too frequently and/or causing a CPU-to-lmem bandwidth hit over PCIE. Details on the workloads and numbers are in the notes below. Background: At the moment, guc_update_engine_gt_clks can be invoked via one of 3 ways. #1 and #2 are infrequent under normal operating conditions: 1.When a predefined "ping_delay" timer expires so that GuC- busyness can sample the GTPM clock counter to ensure it doesn't miss a wrap-around of the 32-bits of the HW counter. (The ping_delay is calculated based on 1/8th the time taken for the counter go from 0x0 to 0xffffffff based on the GT frequency. This comes to about once every 28 seconds at a GT frequency of 19.2Mhz). 2.In preparation for a gt reset. 3.In response to __gt_park events (as the gt power management puts the gt into a lower power state when there is no work being done). Root-cause: For both the workloads described farther below, it was observed that when user space calls IOCTLs that unparks the gt momentarily and repeats such calls many times in quick succession, it triggers calling guc_update_engine_gt_clks as many times. However, the primary purpose of guc_update_engine_gt_clks is to ensure we don't miss the wraparound while the counter is ticking. Thus, the solution is to ensure we skip that check if gt_park is calling this function earlier than necessary. Solution: Snapshot jiffies when we do actually update the busyness stats. Then get the new jiffies every time intel_guc_busyness_park is called and bail if we are being called too soon. Use half of the ping_delay as a safe threshold. NOTE1: Workload1: IGTs' gem_create was modified to create a file handle, allocate memory with sizes that range from a min of 4K to the max supported (in power of two step-sizes). Its maps, modifies and reads back the memory. Allocations and modification is repeated until total memory allocation reaches the max. Then the file handle is closed. With this workload, guc_update_engine_gt_clks was called over 188 thousand times in the span of 15 seconds while this test ran three times. With this patch, the number of calls reduced to 14. NOTE2: Workload2: 30 transcode sessions are created in quick succession. While these sessions are created, pcm-iio tool was used to measure I/O read operation bandwidth consumption sampled at 100 milisecond intervals over the course of 20 seconds. The total bandwidth consumed over 20 seconds without this patch was measured at average at 311KBps per sample. With this patch, the number went down to about 175Kbps which is about a 43% savings. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220623023157.211650-2-alan.previn.teres.alexis@intel.com	2022-06-27 11:35:54 -07:00
Tvrtko Ursulin	45c64ecf97	drm/i915: Improve user experience and driver robustness under SIGINT or similar We have long standing customer complaints that pressing Ctrl-C (or to the effect of) causes engine resets with otherwise well behaving programs. Not only is logging engine resets during normal operation not desirable since it creates support incidents, but more fundamentally we should avoid going the engine reset path when we can since any engine reset introduces a chance of harming an innocent context. Reason for this undesirable behaviour is that the driver currently does not distinguish between banned contexts and non-persistent contexts which have been closed. To fix this we add the distinction between the two reasons for revoking contexts, which then allows the strict timeout only be applied to banned, while innocent contexts (well behaving) can preempt cleanly and exit without triggering the engine reset path. Note that the added context exiting category applies both to closed non- persistent context, and any exiting context when hangcheck has been disabled by the user. At the same time we rename the backend operation from 'ban' to 'revoke' which more accurately describes the actual semantics. (There is no ban at the backend level since banning is a concept driven by the scheduling frontend. Backends are simply able to revoke a running context so that is the more appropriate name chosen.) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com	2022-06-17 09:03:11 +01:00
Umesh Nerlige Ramappa	303760aa91	i915/guc/reset: Make __guc_reset_context aware of guilty engines There are 2 ways an engine can get reset in i915 and the method of reset affects how KMD labels a context as guilty/innocent. (1) GuC initiated engine-reset: GuC resets a hung engine and notifies KMD. The context that hung on the engine is marked guilty and all other contexts are innocent. The innocent contexts are resubmitted. (2) GT based reset: When an engine heartbeat fails to tick, KMD initiates a gt/chip reset. All active contexts are marked as guilty and discarded. In order to correctly mark the contexts as guilty/innocent, pass a mask of engines that were reset to __guc_reset_context. Fixes: `eb5e7da736` ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220426003045.3929439-1-umesh.nerlige.ramappa@intel.com	2022-05-12 11:42:37 -07:00
Matthew Brost	a5c89f7c43	drm/i915/guc: Support programming the EU priority in the GuC descriptor In GuC submission mode the EU priority must be updated by the GuC rather than the driver as the GuC owns the programming of the context descriptor. Given that the GuC code uses the GuC priorities, we can't use a generic function using i915 priorities for both execlists and GuC submission. The existing function has therefore been pushed to the execlists back-end while a new one has been added for GuC. v2: correctly use the GuC prio. Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220504234636.2119794-1-daniele.ceraolospurio@intel.com	2022-05-05 15:46:18 -07:00
Chris Wilson	166c44e694	drm/i915/gt: Clear SET_PREDICATE_RESULT prior to executing the ring Userspace may leave predication enabled upon return from the batch buffer, which has the consequent of preventing all operation from the ring from being executed, including all the synchronisation, coherency control, arbitration and user signaling. This is more than just a local gpu hang in one client, as the user has the ability to prevent the kernel from applying critical workarounds and can cause a full GT reset. We could simply execute MI_SET_PREDICATE upon return from the user batch, but this has the repercussion of modifying the user's context state. Instead, we opt to execute a fixup batch which by mixing predicated operations can determine the state of the SET_PREDICATE_RESULT register and restore it prior to the next userspace batch. This allows us to protect the kernel's ring without changing the uABI. Suggested-by: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220425152317.4275-4-ramalingam.c@intel.com	2022-05-02 15:18:09 +05:30
Umesh Nerlige Ramappa	ad6ade8e34	drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Use intel_uncore_read64_2x32 to read upper and lower fields of the GPM timestamp. v2: Fix compile error Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220427003515.3944267-1-umesh.nerlige.ramappa@intel.com	2022-04-28 12:30:32 -07:00
Matthew Brost	717f9bad5d	drm/i915/dg2: Enable Wa_14014475959 - RCS / CCS context exit There is bug in DG2 where if the CCS contexts switches out while the RCS is running it can cause memory corruption. To workaround this add an atomic to a memory address with a value 1 and semaphore wait to the same address for a value of 0. The GuC firmware is responsible for writing 0 to the memory address when it is safe for the context to switch out. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-6-umesh.nerlige.ramappa@intel.com	2022-04-19 11:33:47 -07:00
Umesh Nerlige Ramappa	dac3838109	drm/i915/guc: Enable Wa_22011802037 for gen12 GuC based platforms Initiating a reset when the command streamer is not idle or in the middle of executing an MI_FORCE_WAKE can result in a hang. Multiple command streamers can be part of a single reset domain, so resetting one would mean resetting all command streamers in that domain. To workaround this, before initiating a reset, ensure that all command streamers within that reset domain are either IDLE or are not executing a MI_FORCE_WAKE. Enable GuC PRE_PARSER WA bit so that GuC follows the WA sequence when initiating engine-resets. For gt-resets, ensure that i915 applies the WA sequence. Opens to address in future patches: - The part of the WA to wait for pending forcewakes is also applicable to execlists backend. - The WA also needs to be applied for gen11 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-3-umesh.nerlige.ramappa@intel.com	2022-04-19 11:33:45 -07:00
John Harrison	2584b3549f	drm/i915/guc: Update to GuC version 70.1.1 The latest GuC firmware drops the context descriptor pool in favour of passing all creation data in the create H2G. It also greatly simplifies the work queue and removes the process descriptor used for multi-LRC submission. So, remove all mention of LRC and process descriptors and update the registration code accordingly. Unfortunately, the new API also removes the ability to set default values for the scheduling policies at context registration time. Instead, a follow up H2G must be sent. The individual scheduling policy update H2G commands are also dropped in favour of a single KLV based H2G. So, change the update wrappers accordingly and call this during context registration.. Of course, this second H2G per registration might fail due to being backed up. The registration code has a complicated state machine to cope with the actual registration call failing. However, if that works then there is no support for unwinding if a further call should fail. Unwinding would require sending a H2G to de-register - but that can't be done because the CTB is already backed up. So instead, add a new flag to say whether the context has a pending policy update. This is set if the policy H2G fails at registration time. The submission code checks for this flag and retries the policy update if set. If that call fails, the submission path early exists with a retry error. This is something that is already supported for other reasons. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220412225955.1802543-2-John.C.Harrison@Intel.com	2022-04-14 17:00:35 -07:00
Alan Previn	a0f1f7b4f7	drm/i915/guc: Print the GuC error capture output register list. Print the GuC captured error state register list (string names and values) when gpu_coredump_state printout is invoked via the i915 debugfs for flushing the gpu error-state that was captured prior. Since GuC could have reported multiple engine register dumps in a single notification event, parse the captured data (appearing as a stream of structures) to identify each dump as a different 'engine-capture-group-output'. Finally, for each 'engine-capture-group-output' that is found, verify if the engine register dump corresponds to the engine_coredump content that was previously populated by the i915_gpu_coredump function. That function would have copied the context's vma's including the bacth buffer during the G2H-context-reset notification that occurred earlier. Perform this verification check by comparing guc_id, lrca and engine- instance obtained from the 'engine-capture-group-output' vs a copy of that same info taken during i915_gpu_coredump. If they match, then print those vma's as well (such as the batch buffers). NOTE: the output format was verified using the gem_exec_capture IGT test. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-14-alan.previn.teres.alexis@intel.com	2022-03-22 10:33:31 -07:00
Alan Previn	a6f0f9cf33	drm/i915/guc: Plumb GuC-capture into gpu_coredump Add a flags parameter through all of the coredump creation functions. Add a bitmask flag to indicate if the top level gpu_coredump event is triggered in response to a GuC context reset notification. Using that flag, ensure all coredump functions that read or print mmio-register values related to work submission or command-streamer engines are skipped and replaced with a calls guc-capture module equivalent functions to retrieve or print the register dump. While here, split out display related register reading and printing into its own function that is called agnostic to whether GuC had triggered the reset. For now, introduce an empty printing function that can filled in on a subsequent patch just to handle formatting. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com	2022-03-22 10:33:31 -07:00
Alan Previn	f5718a7265	drm/i915/guc: Extract GuC error capture lists on G2H notification. - Upon the G2H Notify-Err-Capture event, parse through the GuC Log Buffer (error-capture-subregion) and generate one or more capture-nodes. A single node represents a single "engine- instance-capture-dump" and contains at least 3 register lists: global, engine-class and engine-instance. An internal link list is maintained to store one or more nodes. - Because the link-list node generation happen before the call to i915_gpu_codedump, duplicate global and engine-class register lists for each engine-instance register dump if we find dependent-engine resets in a engine-capture-group. - When i915_gpu_coredump calls into capture_engine, (in a subsequent patch) we detach the matching node (guc-id, LRCA, etc) from the link list above and attach it to i915_gpu_coredump's intel_engine_coredump structure when have matching LRCA/guc-id/engine-instance. Additional notes to be aware of: - GuC generates the error capture dump into the GuC log buffer but this buffer is one big log buffer with 3 independent subregions within it. Each subregion is populated with different content and used in different ways and timings but all regions operate behave as independent ring buffers. Each guc-log subregion (general-logs, crash-dump and error- capture) has it's own guc_log_buffer_state that contain independent read and write pointers. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-11-alan.previn.teres.alexis@intel.com	2022-03-22 10:33:30 -07:00
Michael Cheng	61c5ed946d	drm/i915/gt: replace cache_clflush_range Replace all occurrence of cache_clflush_range with drm_clflush_virt_range. This will prevent compile errors on non-x86 platforms. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321223819.72833-6-michael.cheng@intel.com	2022-03-22 10:10:53 -07:00
Matt Roper	f9576e36c6	drm/i915/xehp: Support platforms with CCS engines but no RCS In the past we've always assumed that an RCS engine is present on every platform. However now that we have compute engines there may be platforms that have CCS engines but no RCS, or platforms that are designed to have both, but have the RCS engine fused off. Various engine-centric initialization that only needs to be done a single time for the group of RCS+CCS engines can't rely on being setup with the RCS now; instead we add a I915_ENGINE_FIRST_RENDER_COMPUTE flag that will be assigned to a single engine in the group; whichever engine has this flag will be responsible for some of the general setup (RCU_MODE programming, initialization of certain workarounds, etc.). Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220303223435.2793124-1-matthew.d.roper@intel.com	2022-03-04 08:02:15 -08:00
John Harrison	e1dd871442	drm/i915/guc: Fix potential invalid pointer dereferences when decoding G2Hs Some G2H handlers were reading the context id field from the payload before checking the payload met the minimum length required. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-9-John.C.Harrison@Intel.com	2022-03-03 15:03:12 -08:00
John Harrison	77dcbffbb5	drm/i915/guc: Rename desc_idx to ctx_id The LRC descriptor pool is going away. So, stop naming context ids as descriptor pool indecies. While at it, add a bunch of missing line feeds to some error messages. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-7-John.C.Harrison@Intel.com	2022-03-03 15:03:10 -08:00
John Harrison	8e2e9c435e	drm/i915/guc: Move lrc desc setup to where it is needed The LRC descriptor was being initialised early on in the context registration sequence. It could then be determined that the actual registration needs to be delayed and the descriptor would be wiped out. This is inefficient, so move the setup to later in the process after the point of no return. v2: Move some split changes into the split patch (and do them correctly). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-6-John.C.Harrison@Intel.com	2022-03-03 15:03:09 -08:00
John Harrison	58ea7d620c	drm/i915/guc: Split guc_lrc_desc_pin apart The LRC descriptor pool is going away. Further, the function that was populating it was also doing a bunch of logic about the context registration sequence. So, split that code apart into separate state setup and try to register functions. Note that some of those 'try to register' code paths actually undo the state setup and leave it to be redone again later (with potentially different values). This is inefficient. The next patch will correct this. Also, move a comment about ignoring return values to the place where the return values are actually ignored. v2: Move some more splitting from a later patch (and do it correctly). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-5-John.C.Harrison@Intel.com	2022-03-03 15:03:08 -08:00
John Harrison	d124902242	drm/i915/guc: Better name for context id limit The LRC descriptor pool is going away. So, stop using it as the limit for how many context ids are available. Instead, size the pool according to the number of contexts allowed. Note that this is just a naming change, the actual limit is identical in value. While at it, also update a kzalloc(sizeof()*count) to be a kcalloc(count,size). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-4-John.C.Harrison@Intel.com	2022-03-03 15:03:06 -08:00
John Harrison	09570c5010	drm/i915/guc: Add an explicit 'submission_initialized' flag The LRC descriptor pool is going away. So, stop using it as a check for whether submission has been initialised or not. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-3-John.C.Harrison@Intel.com	2022-03-03 15:03:04 -08:00
John Harrison	02942b4213	drm/i915/guc: Do not conflate lrc_desc with GuC id for registration The LRC descriptor pool is going away. So, stop using it as a check for context registration, use the GuC id instead (being the thing that actually gets registered with the GuC). Also, rename the set/clear/query helper functions for context id mappings to better reflect their purpose and to differentiate from other registration related helper functions. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-2-John.C.Harrison@Intel.com	2022-03-03 15:03:03 -08:00
Matt Roper	87cb6d80f2	drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE We have to specify in the Render Control Unit Mode register when CCS is enabled. v2: - Move RCU_MODE programming to a helper function. (Tvrtko) - Clean up and clarify comments. (Tvrtko) - Add RCU_MODE to the GuC save/restore list. (Daniele) v3: - Move this patch before the GuC ADS update to enable compute engines; the definition of RCU_MODE and its insertion into the save/restore list moves to this patch. (Daniele) v4: - Call xehp_enable_ccs_engines() directly in guc_resume() and execlists_resume() rather than adding an extra layer of wrapping to the engine->resume() vfunc. (Umesh) Bspec: 46034 Original-author: Michel Thierry Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302001554.1836066-1-matthew.d.roper@intel.com	2022-03-02 06:45:21 -08:00
Matt Roper	c674c5b934	drm/i915/xehp: CCS should use RCS setup functions The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines. The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register. In order to avoid having multiple RCS && CCS checks, add the following engine flag: - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx. BSpec: 46260 Original-author: Michel Thierry Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-6-matthew.d.roper@intel.com	2022-03-02 06:45:19 -08:00
John Harrison	e2a1e7abae	drm/i915/guc: Do not complain about stale reset notifications It is possible for reset notifications to arrive for a context that is in the process of being banned. So don't flag these as an error, just report it as informational (because it is still useful to know that resets are happening even if they are being ignored). v2: Better wording for the message (review feedback from Tvrtko). v3: Fix rebase issue (review feedback from Daniele). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225015232.1939497-1-John.C.Harrison@Intel.com	2022-03-01 14:14:20 -08:00
Daniele Ceraolo Spurio	e068ef3fd5	drm/i915/guc: Initialize GuC submission locks and queues early Move initialization of submission-related spinlock, lists and workers to init_early. This fixes an issue where if the GuC init fails we might still try to get the lock in the context cleanup code. Note that it is safe to call the GuC context cleanup code even if the init failed because all contexts are initialized with an invalid GuC ID, which will cause the GuC side of the cleanup to be skipped, so it is easier to just make sure the variables are initialized than to special case the cleanup to handle the case when they're not. References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220215011123.734572-1-daniele.ceraolospurio@intel.com	2022-03-01 10:33:51 -08:00

1 2 3 4 5

204 Commits