I've seen this trigger twice now, where the i915_gem_object_to_ggtt()
call in intel_unpin_fb_obj() returns NULL, resulting in an oops
immediately afterwards as the (inlined) call to i915_vma_unpin_fence()
tries to dereference it.
It seems to be some race condition where the object is going away at
shutdown time, since both times happened when shutting down the X
server. The call chains were different:
- VT ioctl(KDSETMODE, KD_TEXT):
intel_cleanup_plane_fb+0x5b/0xa0 [i915]
drm_atomic_helper_cleanup_planes+0x6f/0x90 [drm_kms_helper]
intel_atomic_commit_tail+0x749/0xfe0 [i915]
intel_atomic_commit+0x3cb/0x4f0 [i915]
drm_atomic_commit+0x4b/0x50 [drm]
restore_fbdev_mode+0x14c/0x2a0 [drm_kms_helper]
drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
intel_fbdev_set_par+0x18/0x70 [i915]
fb_set_var+0x236/0x460
fbcon_blank+0x30f/0x350
do_unblank_screen+0xd2/0x1a0
vt_ioctl+0x507/0x12a0
tty_ioctl+0x355/0xc30
do_vfs_ioctl+0xa3/0x5e0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x13/0x94
- i915 unpin_work workqueue:
intel_unpin_work_fn+0x58/0x140 [i915]
process_one_work+0x1f1/0x480
worker_thread+0x48/0x4d0
kthread+0x101/0x140
and this patch purely papers over the issue by adding a NULL pointer
check and a WARN_ON_ONCE() to avoid the oops that would then generally
make the machine unresponsive.
Other callers of i915_gem_object_to_ggtt() seem to also check for the
returned pointer being NULL and warn about it, so this clearly has
happened before in other places.
[ Reported it originally to the i915 developers on Jan 8, applying the
ugly workaround on my own now after triggering the problem for the
second time with no feedback.
This is likely to be the same bug reported as
https://bugs.freedesktop.org/show_bug.cgi?id=98829https://bugs.freedesktop.org/show_bug.cgi?id=99134
which has a patch for the underlying problem, but it hasn't gotten to
me, so I'm applying the workaround. ]
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First set of i915 fixes for code in next.
* tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: skip the first 4k of stolen memory on everything >= gen8
drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
drm/i915: Fix use after free in logical_render_ring_init
drm/i915: disable PSR by default on HSW/BDW
drm/i915: Fix setting of boost freq tunable
drm/i915: tune down the fast link training vs boot fail
drm/i915: Reorder phys backing storage release
drm/i915/gen9: Fix PCODE polling during SAGV disabling
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
drm/i915: drop the struct_mutex when wedged or trying to reset
commit 848496e590
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Wed Jul 13 16:32:03 2016 +0300
drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
increased the timeout to match the spec, but we still see a timeout on
at least one SKL. A CDCLK change request following the failed one will
succeed nevertheless.
I could reproduce this problem easily by running kms_pipe_crc_basic in a
loop. In all failure cases _wait_for() was pre-empted for >3ms and so in
the worst case - when the pre-emption happened right after calculating
timeout__ in _wait_for() - we called skl_cdclk_wait_for_pcu_ready() only
once which failed and so _wait_for() timed out. As opposed to this the
spec says to keep retrying the request for at most a 3ms period.
To fix this send the first request explicitly to guarantee that there is
3ms between the first and last request. Though this matches the spec, I
noticed that in rare cases this can still time out if we sent only a few
requests (in the worst case 2) _and_ PCODE is busy for some reason even
after a previous request and a 3ms delay. To work around this retry the
polling with pre-emption disabled to maximize the number of requests.
Also increase the timeout to 10ms to account for interrupts that could
reduce the number of requests. With this change I couldn't trigger
the problem.
v2:
- Use 1ms poll period instead of 10us. (Chris)
v3:
- Poll with pre-emption disabled to increase the number of request
attempts. (Ville, Chris)
- Factor out a helper to poll, it's also needed by the next patch.
v4:
- Pass reply_mask, reply to skl_pcode_request(), instead of assuming the
reply is generic. (Ville)
v5:
- List the request specific timeout values as code comment. (Ville)
v6:
- Try the poll first with preemption enabled.
- Add code comment about first request being queued by PCODE. (Art)
- Add timeout_base_ms argument. (Ville)
v7:
- Clarify code comment about first queued request. (Chris)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Cc: <stable@vger.kernel.org> # v4.2- : 3b2c171 : drm/i915: Wait up to 3ms
Cc: <stable@vger.kernel.org> # v4.2-
Fixes: 5d96d8afcf ("drm/i915/skl: Deinit/init the display at suspend/resume")
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=97929
Testcase: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-B
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480955258-26311-1-git-send-email-imre.deak@intel.com
(cherry picked from commit a0b8a1fe34)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
first set of fixes for -next.
* tag 'drm-intel-next-fixes-2016-12-07' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Move priority bumping for flips earlier
drm/i915: Hold a reference on the request for its fence chain
drm/i915/audio: fix hdmi audio noise issue
drm/i915/debugfs: Increment return value of gt.next_seqno
drm/i915/debugfs: Drop i915_hws_info
drm/i915: Initialize dev_priv->atomic_cdclk_freq at init time
drm/i915: Fix cdclk vs. dev_cdclk mess when not recomputing things
drm/i915: Make skl_write_{plane,cursor}_wm() static
drm/i915: Complete requests in nop_submit_request
drm/i915/gvt: fix lock not released bug for dispatch_workload() err path
drm/i915/gvt: fix getting 64bit bar size error
drm/i915/gvt: fix missing init param.primary
Big thing is that drm-misc is now officially a group maintainer/committer
model thing, with MAINTAINERS suitably updated. Otherwise just the usual
pile of misc things all over, nothing that stands out this time around.
* tag 'drm-misc-next-2016-11-29' of git://anongit.freedesktop.org/git/drm-misc: (33 commits)
drm: Introduce drm_framebuffer_assign()
drm/bridge: adv7511: Enable the audio data and clock pads on adv7533
drm/bridge: adv7511: Add Audio support
drm/edid: Consider alternate cea timings to be the same VIC
drm/atomic: Constify drm_atomic_crtc_needs_modeset()
drm: bridge: dw-hdmi: add ASoC dependency
drm: Fix shift operations for drm_fb_helper::drm_target_preferred()
drm: Avoid NULL dereference for DRM_LEGACY debug message
drm: Use u64_to_user_ptr() helper for blob ioctls
drm: Fix conflicting macro parameter in drm_mm_for_each_node_in_range()
drm: Fixup kernel doc for driver->gem_create_object
drm/hisilicon/hibmc: mark PM functions __maybe_unused
drm/hisilicon/hibmc: Checking for NULL instead of IS_ERR()
drm: bridge: add DesignWare HDMI I2S audio support
drm: Check against color expansion in drm_mm_reserve_node()
drm: Define drm_mm_for_each_node_in_range()
drm/doc: Fix links in drm_property.c
MAINTAINERS: Add link to drm-misc documentation
vgaarb: use valid dev pointer in vgaarb_info()
drm/atomic: Unconfuse the old_state mess in commmit_tail
...
Currently we only clflush the scanout if it is in the CPU domain. Also
flush if we have a pending CPU clflush. We also want to treat the
dirtyfb path similar, and flush any pending writes there as well.
v2: Only send the fb flush message if flushing the dirt on flip
v3: Make flush-for-flip and dirtyfb look more alike since they serve
similar roles as end-of-frame marker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk
Kernel pointer does not sound like an useful thing to log and
pipe name is already contained in the crtc name.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tvrtko needs
commit b3c11ac267
Author: Eric Engestrom <eric@engestrom.ch>
Date: Sat Nov 12 01:12:56 2016 +0000
drm: move allocation out of drm_get_format_name()
to be able to apply his patches without conflicts.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
It has been suggested that having per-plane modifiers is making life
more difficult for userspace, so let's just retire modifier[1-3] and
use modifier[0] to apply to the entire framebuffer.
Obviosuly this means that if individual planes need different tiling
layouts and whatnot we will need a new modifier for each combination
of planes with different tiling layouts.
For a bit of extra backwards compatilbilty the kernel will allow
non-zero modifier[1+] but it require that they will match modifier[0].
This in case there's existing userspace out there that sets
modifier[1+] to something non-zero with planar formats.
Mostly a cocci job, with a bit of manual stuff mixed in.
@@
struct drm_framebuffer *fb;
expression E;
@@
- fb->modifier[E]
+ fb->modifier
@@
struct drm_framebuffer fb;
expression E;
@@
- fb.modifier[E]
+ fb.modifier
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Cc: dczaplejewicz@collabora.co.uk
Suggested-by: Kristian Høgsberg <hoegsberg@gmail.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1479295996-26246-1-git-send-email-ville.syrjala@linux.intel.com
dev_priv->hw_ddb is only used by skl_update_crtcs, but the ddb
allocation for each pipe is calculated in crtc_state.
We can rid of the global member by looking at crtc_state.
Do this by saving all active old ddb allocations from the old crtc_state
in an array, and then point them to the new allocation every time we update
a crtc.
This will allow us to keep track of the intermediate ddb allocations,
which is what hw_ddb was previously used for. With hw_ddb gone all
SKL-style watermark values are properly maintained only in crtc_state.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-5-git-send-email-maarten.lankhorst@linux.intel.com
[mlankhorst: Reword commit message.]
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
This is the last bit required for making nonblocking modesets work
correctly. The state in intel_crtc->hw_ddb is updated in the
nonblocking part of a nonblocking commit.
This means that even attempting a commit before a nonblocking modeset
completes will fail, because intel_crtc->hw_ddb still has stale values.
The stale values are 0 if the crtc is being enabled resulting in a
failure during atomic check, but it may also result in double use of
ddb allocations.
Fix this by explicitly copying the ddb allocation from the old state.
This has to be done explicitly, because a modeset that doesn't change
active pipes, or a modeset converted to a fastset will will clear the
current state.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-4-git-send-email-maarten.lankhorst@linux.intel.com
[mlankhorst: Reword commit message.]
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
The watermark updates for SKL style watermarks are no longer done
in the plane callbacks, but are now called in a separate watermark
update function that's called during the same vblank evasion,
before the plane updates.
This also gets rid of the global skl_results, which was required for
keeping track of the current atomic commit.
Changes since v1:
- Move line unwrap to correct patch. (Lyude)
- Make sure we don't regress ILK watermarks. (Matt)
- Rephrase commit message. (Matt)
Changes since v2:
- Fix disable watermark check to use the correct way to determine single
step watermark support.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lyude <cpaul@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-3-git-send-email-maarten.lankhorst@linux.intel.com
[mlankhorst: Small whitespace fix in skl_initial_wm]
Allow the driver to write watermarks during atomic evasion.
This will make it possible to write the watermarks in a cleaner
way on gen9+.
intel_atomic_state is not used here yet, but will be used when
we program all watermarks as a separate step during evasion.
This also writes linetime all the time, while before it was only
done during plane updates. This looks like this could be a bugfix,
but I'm not sure what it affects.
Changes since v1:
- Add comment about atomic evasion to commit message.
- Unwrap I915_WRITE call. (Lyude)
Changes since v2:
- Rename atomic_evade_watermarks to atomic_update_watermarks. (Ville)
- Add line wraps where appropriate, fix grammar in commit message. (Matt)
Changes since v3:
- Actually fix commit message. (Matt)
- Line wrap calls to watermark update functions. (Matt)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-2-git-send-email-maarten.lankhorst@linux.intel.com
The function's behaviour was changed in 90844f0004, without changing
its signature, causing people to keep using it the old way without
realising they were now leaking memory.
Rob Clark also noticed it was also allocating GFP_KERNEL memory in
atomic contexts, breaking them.
Instead of having to allocate GFP_ATOMIC memory and fixing the callers
to make them cleanup the memory afterwards, let's change the function's
signature by having the caller take care of the memory and passing it to
the function.
The new parameter is a single-field struct in order to enforce the size
of its buffer and help callers to correctly manage their memory.
Fixes: 90844f0004 ("drm: make drm_get_format_name thread-safe")
Cc: Rob Clark <robdclark@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Sinclair Yeh <syeh@vmware.com> (vmwgfx)
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161112011309.9799-1-eric@engestrom.ch
After this patch only conversion of INTEL_INFO(p)->gen to
INTEL_GEN(dev_priv) remains before the __I915__ macro can
be removed.
v2: Tidy vlv_compute_wm. (David Weinehall)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>