Now that the tasklet completely controls scheduling of the requests, and
we postpone scheduling out the old requests, we can keep a hanging
virtual request bound to the engine on which it hung, and remove it from
te queue. On release, it will be returned to the same engine and remain
in its queue until it is scheduled; after which point it will become
eligible for transfer to a sibling. Instead, we could opt to resubmit the
request along the virtual engine on unhold, making it eligible for load
balancing immediately -- but that seems like a pointless optimisation
for a hanging context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-8-chris@chris-wilson.co.uk
Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.
This improves our resilence in virtual engine scheduling, and should
eliminate the rare cases of gem_exec_balance failing.
By keeping the unwind order intact on the local engine, we can preserve
data dependency ordering while doing a preempt-to-busy pass until we
have determined the new ELSP. This means that if we try to timeslice
between a virtual engine and a data-dependent ordinary request, the pair
will maintain their relative ordering and we will avoid the
resubmission, cancelling the timeslicing until further change.
The dilemma though is that we then may end up in a situation where the
'demotion' of the virtual request to an ordinary request in the engine
queue results in filling the ELSP[] with virtual requests instead of
spreading the load across the engines. To compensate for this, we mark
each virtual request and refuse to resubmit a virtual request in the
secondary ELSP slots, thus forcing subsequent virtual requests to be
scheduled out after timeslicing. By delaying the decision until we
schedule out, we will avoid unnecessary resubmission.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2079
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-7-chris@chris-wilson.co.uk
The issue with stale virtual breadcrumbs remain. Now we have the problem
that if the irq-signaler is still referencing the stale breadcrumb as we
transfer it to a new sibling, the list becomes spaghetti. This is a very
small window, but that doesn't stop it being hit infrequently. To
prevent the lists being tangled (the iterator starting on one engine's
b->signalers but walking onto another list), always decouple the virtual
breadcrumb on schedule-out and make sure that the walker has stepped out
of the lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-5-chris@chris-wilson.co.uk
Inside schedule_out, we do extra work upon idling the context, such as
updating the runtime, kicking off retires, kicking virtual engines.
However, if we are in a series of processing single requests per
contexts, we may find ourselves scheduling out the context, only to
immediately schedule it back in during dequeue. This is just extra work
that we can avoid if we keep the context marked as inflight across the
dequeue. This becomes more significant later on for minimising virtual
engine misses.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-4-chris@chris-wilson.co.uk
Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We can not rebind
the context to a new sibling while it is inflight as the context save
will conflict, hence we wait. As we cannot then use any other sibliing
while the context is inflight, only kick the bound sibling while it
inflight and upon scheduling out the kick the rest (so that we can swap
engines on timeslicing if the previously bound engine becomes
oversubscribed).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-3-chris@chris-wilson.co.uk
Rather than having special case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and most importantly greatly simplifying the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.
The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock
v2: Use bh kicking, see commit 3c53776e29 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
We assume that both timestamps are driven off the same clock [reported
to userspace as I915_PARAM_CS_TIMESTAMP_FREQUENCY]. Verify that this is
so by reading the timestamp registers around a busywait (on an otherwise
idle engine so there should be no preemptions).
v2: Icelake (not ehl, nor tgl) seems to be using a fixed 80ns interval
for, and only for, CTX_TIMESTAMP -- or it may be GPU frequency and the
test is always running at maximum frequency?. As far as I can tell, this
isolated change in behaviour is undocumented.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201223122359.22562-1-chris@chris-wilson.co.uk
Adjust a jump target so that a bit of exception handling can be better
reused at the end of this function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* Return directly after a call of the function “kzalloc” failed
at the beginning.
* Delete a label which became unnecessary with this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We can have the same src ids for different client ids so make sure to
check both the client id and the source id when handling interrupts.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to take into account the client id otherwise we'll end
up sending generic events for any src id that is registered.
We only support irq domains on pre-soc15 parts so client is
always legacy.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is supported by Sienna Cichlid, Navy Flounder and Dimgrey
Cavefish. For these ASICs, the target voltage calculation can be
illustrated by "voltage = voltage calculated from v/f curve +
overdrive vddgfx offset".
V2: limit the smu_version check for Sienna Cichlid only
Here are some sample usages about this new OD setting:
1. Check current vddgfx offset setting by
cat /sys/class/drm/card0/device/pp_od_clk_voltage
...
...
OD_VDDGFX_OFFSET:
0mV
...
...
2. Set new vddgfx offset by
echo "vo 10" > /sys/class/drm/card0/device/pp_od_clk_voltage
cat /sys/class/drm/card0/device/pp_od_clk_voltage
...
...
OD_VDDGFX_OFFSET:
10mV
...
...
3. Commit the new setting by
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When user specifies a reduced WGP(CU) config via disalbe_cu module
parameter, this does not disable the clocks which uses additional
power. This interface send active WGP number to SMU and SMU will
cooperate with RLC to power off relative WGPs.
v2: Add request active WGPs in Vangogh smu post init.
Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com>
Reviewed-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For all the ASICs that integrate psp v11, vega20
doesn't support ih reroute. arcturus and later will
allow kernel driver to program ih_cfg_index/data
through mmio directly. navi1x and onwards will only
support grb_ih_set command in sriov configuration.
psp_v11_0_reroute_ih is not needed any more.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
IH_CHICKEN.MC_SPACE_FBPA_ENABLE field is only
valid when IH_RB_CNTL.MC_SPACE is programed to 0x3,
frame buffer physical address. For both bus address
and gpu virtual address, don't program MC_SPACE_FBPA_ENABLE
field
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>