In order to avoid the DSB keeping the DEwake permanently
asserted we must clear DSB_PMCTRL_2.DSB_FORCE_DEWAKE once
we are done. For good measure do the same for
DSB_PMCTRL.DSB_ENABLE_DEWAKE.
Experimentally this doens't seem to be actually necessary
(unlike with DSB_FORCE_DEWAKE). That is, the DSB_ENABLE_DEWAKE
doesn't seem to do anything whenever the DSB is not active.
But I'd hate to waste a ton of power in case there I'm wrong
and there is some way DEwake could remaing asserted. One extra
register write is a small price to pay for some peace of mind.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-13-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Allow intel_dsb_chain() to start the chained DSB
at start of the undelaye vblank. This is slightly
more involved than simply setting the bit as we
must use the DEwake mechanism to eliminate pkgC
latency.
And DSB_ENABLE_DEWAKE itself is problematic in that
it allows us to configure just a single scanline,
and if the current scanline is already past that
DSB_ENABLE_DEWAKE won't do anything, rendering the
whole thing moot.
The current workaround involves checking the pipe's current
scanline with the CPU, and if it looks like we're about to
miss the configured DEwake scanline we set DSB_FORCE_DEWAKE
to immediately assert DEwake. This is somewhat racy since the
hardware is making progress all the while we're checking it on
the CPU.
We can make things less racy by chaining two DSBs and handling
the DSB_FORCE_DEWAKE stuff entirely without CPU involvement:
1. CPU starts the first DSB immediately
2. First DSB configures the second DSB, including its dewake_scanline
3. First DSB starts the second w/ DSB_WAIT_FOR_VBLANK
4. First DSB asserts DSB_FORCE_DEWAKE
5. First DSB waits until we're outside the dewake_scanline-vblank_start
window
6. First DSB deasserts DSB_FORCE_DEWAKE
That will guarantee that the we are fully awake when the second
DSB starts to actually execute.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-12-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Add functions to emit a DSB scanline window wait instructions.
We can either wait for the scanline to be IN the window
or OUT of the window.
The hardware doesn't handle wraparound so we must manually
deal with it by swapping the IN range to the inverse OUT
range, or vice versa.
Also add a bit of paranoia to catch the edge case of waiting
for the entire frame. That doesn't make sense since an IN
wait would be a nop, and an OUT wait would imply waiting
forever. Most of the time this also results in both scanline
ranges (original and inverted) to have lower=upper+1
which is nonsense from the hw POV.
For now we are only handling the case where the scanline wait
happens prior to latching the double buffered registers during
the commit (which might change the timings due to LRR/VRR/etc.)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-10-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
When determining various scanlines for DSB use we should take into
account whether VRR is active at the time when the DSB uses said
scanline information. For now all DSB scanline usage occurs prior
to the actual commit, so we only need to care about the state of
VRR at that time.
I've decided to move intel_crtc_scanline_to_hw() in its entirety
to the DSB code as it will also need to know the actual state
of VRR in order to do its job 100% correctly.
TODO: figure out how much of this could be moved to some
more generic place and perhaps be shared with the CPU
vblank evasion code/etc...
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-8-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Currently we calculate the DEwake scanline based on
the delayed vblank start, while in reality it should be computed
based on the undelayed vblank start (as that is where the DSB
actually starts). Currently it doesn't really matter as we
don't have any vblank delay configured, but that may change
in the future so let's be accurate in what we do.
We can also remove the max() as intel_crtc_scanline_to_hw()
can deal with negative numbers, which there really shouldn't
be anyway.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-7-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Currently we switch from out software idea of a scanline
to the hw's idea of a scanline during the commit phase in
_intel_dsb_commit(). While that is slightly easier due to
fastsets fiddling with the timings, we'll also need to
generate proper hw scanline numbers already when emitting
DSB scanline wait instructions. So this approach won't
do in the future. Switch to hw scanline numbers earlier.
Also intel_dsb_dewake_scanline() itself already makes
some assumptions about VRR that don't take into account
VRR toggling during fastsets, so technically delaying
the sw->hw conversion doesn't even help us.
The other reason for delaying the conversion was that we
are using intel_get_crtc_scanline() during intel_dsb_commit()
which gives us the current sw scanline. But this is pretty
low level stuff anyway so just using raw PIPEDSL reads seems
fine here, and that of course gives us the hw scanline
directly, reducing the need to do so many conversions.
v2: Return the non-hw scanline from intel_dsb_dewake_scanline()
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240624191032.27333-5-ville.syrjala@linux.intel.com
Looks like the undelayed vblank gets signalled exactly when
the active period ends. That is a problem for DSB+VRR when
we are already in vblank and expect DSB to start executing
as soon as we send the push. Instead of starting, the DSB
just keeps on waiting for the undelayed vblank which won't
signal until the end of the next frame's active period,
which is far too late.
The end result is that DSB won't have even started
executing by the time the flips/etc. have completed.
We then wait for an extra 1ms, after which we terminate
the DSB and report a timeout:
[drm] *ERROR* [CRTC:80:pipe A] DSB 0 timed out waiting for idle (current head=0xfedf4000, head=0x0, tail=0x1080)
To fix this let's configure DSB to use the so called VRR
"safe window" instead of the undelayed vblank to trigger
the DSB vblank logic, when VRR is enabled.
Cc: stable@vger.kernel.org
Fixes: 34d8311f4a ("drm/i915/dsb: Re-instate DSB for LUT updates")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9927
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306040806.21697-3-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
If fixed refresh rate program the PKGC_LATENCY register
with the highest latency from level 1 and above LP registers
and program ADDED_WAKE_TIME = DSB execution time.
else program PKGC_LATENCY with all 1's and ADDED_WAKE_TIME as 0.
This is used to improve package C residency by sending the highest
latency tolerance requirement (LTR) when the planes are done with the
frame until the next frame programming window (set context latency,
window 2) starts.
Bspec: 68986
--v2
-Fix indentation [Chaitanya]
--v3
-Take into account if fixed refrersh rate or not [Vinod]
-Added wake time dependengt on DSB execution time [Vinod]
-Use REG_FIELD_PREP [Jani]
-Call program_pkgc_latency from appropriate place [Jani]
-no need for the ~0 while setting max latency [Jani]
-change commit message to add the new changes made in.
--v4
-Remove extra blank line [Vinod]
-move the vrr.enable check to previous loop [Vinod]
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219063638.1467114-1-suraj.kandpal@intel.com
The display engine does not snoop the caches so we should mark
the DSB command buffer as I915_CACHE_NONE.
i915_gem_object_create_internal() always gives us I915_CACHE_LLC
on LLC platforms. And to make things 100% correct we should also
clflush at the end, if necessary.
Note that currently this is a non-issue as we always write the
command buffer through a WC mapping, so a cache flush is not actually
needed. But we might actually want to consider a WB mapping since
we also end up reading from the command buffer (in the indexed
reg write handling). Either that or we should do something else
to avoid those reads (might actually be even more sensible on DGFX
since we end up reading over PCIe). But we should measure the overhead
first...
Anyways, no real harm in adding the belts and suspenders here so
that the code will work correctly regardless of how we map the
buffer. If we do get a WC mapping (as we request)
i915_gem_object_flush_map() will be a nop. Well, apart form
a wmb() which may just flush the WC buffer a bit earlier
than would otherwise happen (at the latest the mmio accesses
would trigger the WC flush).
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231009132204.15098-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Using system memory for the DSB command buffer doesn't appear to work.
On DG2 it seems like the hardware internally replaces the actual memory
reads with zeroes, and so we end up executing a bunch of NOOPs instead
of whatever commands we put in the buffer. To determine that I measured
the time it takes to execute the instructions, and the results are
always more or less consistent with executing a buffer full of NOOPs
from local memory.
Another theory I considered was some kind of cache coherency issue.
Looks like i915_gem_object_pin_map_unlocked() will in fact give you a
WB mapping for system memory on DGFX regardless of what mapping mode
was requested (WC in case of the DSB code). But clflush did not
change the behaviour at all, so that theory seems moot.
On DG1 it looks like the hardware might actually be fetching data from
system memory as the logs indicate that we just get underruns. But that
is equally bad, so doesn't look like we can really use system memory on
DG1 either.
Thus always allocate the DSB command buffer from local memory on
discrete GPUs.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231009132204.15098-1-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Normally we could be in a deep PkgC state all the way up to the
point when DSB starts its execution at the transcoders undelayed
vblank. The DSB will then have to wait for the hardware to
wake up before it can execute anything. This will waste a huge
chunk of the vblank time just waiting, and risks the DSB execution
spilling into the vertical active period. That will be very bad,
especially when programming the LUTs as the anti-collision logic
will cause DSB to corrupt LUT writes during vertical active.
To avoid these problems we can instruct the DSB to pre-wake the
display engine on a specific scanline so that everything will
be 100% ready to go when we hit the transcoder's undelayed vblank.
One annoyance is that the scanline is specified as just that,
a single scanline. So if we happen to start the DSB execution
after passing said scanline no DEwake will happen and we may drop
back into some PkgC state before reaching the transcoder's undelayed
vblank. To prevent that we'll use the "force DEwake" bit to manually
force the display engine to stay awake. We'll then have to clear
the force bit again after the DSB is done (the force bit remains
effective even when the DSB is otherwise disabled).
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606191504.18099-18-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
i915_gem_object_create_internal() does not hand out zeroed
memory. Thus we may confuse whatever stale garbage is in
there as a previous register write and mistakenly handle the
first actual register write as an indexed write. This can
end up corrupting the instruction sufficiently well to lose
the entire register write.
Make sure we've actually emitted a previous instruction before
attemting indexed register write merging.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606191504.18099-7-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
The caller should more or less know how many DSB commands it
wants to emit into the command buffer, so allow it to specify
the size of the command buffer rather than having the low level
DSB code guess it.
Technically we can emit as many as 134+1033 (for adl+ degamma +
10bit gamma) register writes but thanks to the DSB indexed register
write command we get significant space savings so the current size
estimate of 8KiB (~1024 DSB commands) is sufficient for now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221216003810.13338-11-ville.syrjala@linux.intel.com
Reviewed-by: Animesh Manna <animesh.manna@intel.com>