linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-11 04:20:06 +00:00

Author	SHA1	Message	Date
Philipp Stanner	9df13c356d	drm/sched: Clarify docu concerning drm_sched_job_arm() The documentation for drm_sched_job_arm() and especially drm_sched_job_cleanup() does not make it very clear why drm_sched_job_arm() is a point of no return, which it indeed is. Make the nature of drm_sched_job_arm() in the docu as clear as possible. Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250313093053.65001-2-phasta@kernel.org	2025-03-14 10:02:49 +01:00
Christian König	c67c0fef5d	drm/sched: revert "drm_sched_job_cleanup(): correct false doc" This reverts commit `44d2f310f0`. The function drm_sched_job_arm() is indeed the point of no return. The background is that it is nearly impossible for the driver to correctly retract the fence and signal it in the order enforced by the dma_fence framework. The code in drm_sched_job_cleanup() is for the purpose to cleanup after the job was armed through drm_sched_job_arm() and processed by the scheduler. We can certainly improve the documentation, but removing the warning is clearly not a good idea. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250312134400.2176393-1-christian.koenig@amd.com	2025-03-12 15:02:12 +01:00
Philipp Stanner	72ebc18b34	drm/sched: Document run_job() refcount hazard drm_sched_backend_ops.run_job() returns a dma_fence for the scheduler. That fence is signalled by the driver once the hardware completed the associated job. The scheduler does not increment the reference count on that fence, but implicitly expects to inherit this fence from run_job(). This is relatively subtle and prone to misunderstandings. This implies that, to keep a reference for itself, a driver needs to call dma_fence_get() in addition to dma_fence_init() in that callback. It's further complicated by the fact that the scheduler even decrements the refcount in drm_sched_run_job_work() since it created a new reference in drm_sched_fence_scheduled(). It does, however, still use its pointer to the fence after calling dma_fence_put() - which is safe because of the aforementioned new reference, but actually still violates the refcounting rules. Move the call to dma_fence_put() to the position behind the last usage of the fence. Suggested-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250305130551.136682-4-phasta@kernel.org	2025-03-06 16:36:22 +01:00
Philipp Stanner	44d2f310f0	drm/sched: drm_sched_job_cleanup(): correct false doc drm_sched_job_cleanup()'s documentation claims that calling drm_sched_job_arm() is a "point of no return", implying that afterwards a job cannot be cancelled anymore. This is not correct, as proven by the function's code itself, which takes a previous call to drm_sched_job_arm() into account. In truth, the decisive factors are whether fences have been shared (e.g., with other processes) and if the job has been submitted to an entity already. Correct the wrong docstring. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250304141346.102683-2-phasta@kernel.org	2025-03-05 14:13:59 +01:00
Jani Nikula	e5f3081291	drm/sched: stop passing non struct drm_device to drm_err() and friends The expectation is that the struct drm_device based logging helpers get passed an actual struct drm_device pointer rather than some random struct pointer where you can dereference the ->dev member. Convert drm_err(sched, ...) to dev_err(sched->dev, ...) and similar. This matches current usage, as struct drm_device is not available, but drops "[drm]" or "[drm] ERROR" prefix from logging. Unfortunately, there's no dev_WARN_ON(), so the conversion is not exactly the same. Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fe441dd1469d2b03e6b2ff247078bdde2011c6e3.1737644530.git.jani.nikula@intel.com	2025-03-04 17:03:43 +02:00
Tvrtko Ursulin	b6eb664d89	drm/sched: Add internal job peek/pop API Idea is to add helpers for peeking and popping jobs from entities with the goal of decoupling the hidden assumption in the code that queue_node is the first element in struct drm_sched_job. That assumption usually comes in the form of: while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) Which breaks if the queue_node is re-positioned due to_drm_sched_job being implemented with a container_of. This also allows us to remove duplicate definitions of to_drm_sched_job. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-2-tvrtko.ursulin@igalia.com	2025-02-24 10:17:39 +01:00
Philipp Stanner	796a9f55a8	drm/sched: Use struct for drm_sched_init() params drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one missnaming, addressed in: commit `6f1cacf4eb` ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org	2025-02-12 11:59:52 +01:00
Tvrtko Ursulin	51678bb9a7	drm/sched: Add helper to check job dependencies Lets isolate scheduler internals from drivers such as pvr which currently walks the dependency array to look for fences. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250113103341.43914-1-tvrtko.ursulin@igalia.com	2025-01-20 09:20:21 +01:00
Tvrtko Ursulin	440aaf479c	drm/sched: Remove weak paused submission checks There is no need to check the boolean in the work item's prologues since the boolean can be set at any later time anyway. The helper which pauses submission sets it and synchronously cancels the work and helpers which queue the work check for the flag so all should be good. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250114105942.64832-1-tvrtko.ursulin@igalia.com	2025-01-16 11:17:21 +01:00
Tvrtko Ursulin	573b73e5ac	drm/sched: Delete unused update_job_credits No driver is using the update_job_credits() schduler vfunc so lets remove it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110111301.76909-1-tvrtko.ursulin@igalia.com	2025-01-13 10:35:44 +01:00
Bagas Sanjaya	314d44bc8e	drm/sched: Fix drm_sched_fini() docu generation Commit `baf4afc583` ("drm/sched: Improve teardown documentation") documents problems of drm_sched_fini() in form of a list. The checklist triggers htmldocs warning (but renders correctly in htmldocs output): Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation. Separate the list from the preceding paragraph by a blank line to fix the warning. While at it, also end the aforementioned paragraph by a colon. Fixes: `baf4afc583` ("drm/sched: Improve teardown documentation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> [phasta: Adjust commit message] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com	2024-12-19 15:57:30 +01:00
Philipp Stanner	baf4afc583	drm/sched: Improve teardown documentation If jobs are still enqueued in struct drm_gpu_scheduler.pending_list when drm_sched_fini() gets called, those jobs will be leaked since that function stops both job-submission and (automatic) job-cleanup. It is, thus, up to the driver to take care of preventing leaks. The related function drm_sched_wqueue_stop() also prevents automatic job cleanup. Those pitfals are not reflected in the documentation, currently. Explicitly inform about the leak problem in the docstring of drm_sched_fini(). Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and hint at the consequences for automatic cleanup. Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105143137.71893-2-pstanner@redhat.com	2024-11-07 10:05:54 +01:00
Dave Airlie	30169bb645	Backmerge v6.12-rc6 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-11-04 14:25:33 +10:00
Philipp Stanner	2e0757012c	drm/sched: Document purpose of drm_sched_{start,stop} drm_sched_start()'s and drm_sched_stop()'s names suggest that those functions might be intended for actively starting and stopping the scheduler on initialization and teardown. They are, however, only used on timeout handling (reset recovery). The docstrings should reflect that to prevent confusion. Document those functions' purpose. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029133819.78696-2-pstanner@redhat.com	2024-10-31 12:48:49 +01:00
Matthew Brost	746ae46c11	drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler work queue with WQ_MEM_RECLAIM to ensure forward progress during reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward progress during reclaim. v2: - Fixes tags (Philipp) - Reword commit message (Philipp) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Philipp Stanner <pstanner@redhat.com> Cc: stable@vger.kernel.org Fixes: `34f50cc644` ("drm/sched: Use drm sched lockdep map for submit_wq") Fixes: `a6149f0393` ("drm/sched: Convert drm scheduler to use a work queue rather than kthread") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023235917.1836428-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-10-28 14:12:56 -04:00
Philipp Stanner	3ae80b3757	drm/sched: warn about drm_sched_job_init()'s partial init drm_sched_job_init()'s name suggests that after the function succeeded, parameter "job" will be fully initialized. This is not the case; some members are only later set, notably drm_sched_job.sched by drm_sched_job_arm(). Document that drm_sched_job_init() does not set all struct members. Document the lifetime of drm_sched_job.sched. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023141530.113370-2-pstanner@redhat.com	2024-10-25 18:02:04 +02:00
Philipp Stanner	2320c9e6a7	drm/sched: memset() 'job' in drm_sched_job_init() drm_sched_job_init() has no control over how users allocate struct drm_sched_job. Unfortunately, the function can also not set some struct members such as job->sched. This could theoretically lead to UB by users dereferencing the struct's pointer members too early. It is easier to debug such issues if these pointers are initialized to NULL, so dereferencing them causes a NULL pointer exception. Accordingly, drm_sched_entity_init() does precisely that and initializes its struct with memset(). Initialize parameter "job" to 0 in drm_sched_job_init(). Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021105028.19794-2-pstanner@redhat.com Reviewed-by: Christian König <christian.koenig@amd.com>	2024-10-22 16:08:41 +02:00
Tvrtko Ursulin	134e71bd1e	drm/sched: Further optimise drm_sched_entity_push_job Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-6-tursulin@igalia.com	2024-10-17 12:20:06 +02:00
Tvrtko Ursulin	f93126f5d5	drm/sched: Re-group and rename the entity run-queue lock When writing to a drm_sched_entity's run-queue, writers are protected through the lock drm_sched_entity.rq_lock. This naming, however, frequently collides with the separate internal lock of struct drm_sched_rq, resulting in uses like this: spin_lock(&entity->rq_lock); spin_lock(&entity->rq->lock); Rename drm_sched_entity.rq_lock to improve readability. While at it, re-order that struct's members to make it more obvious what the lock protects. v2: * Rename some rq_lock straddlers in kerneldoc, improve commit text. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> [pstanner: Fix typo in docstring] Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-5-tursulin@igalia.com	2024-10-17 12:19:16 +02:00
Tvrtko Ursulin	6a313579ea	drm/sched: Stop setting current entity in FIFO mode It does not seem there is a need to set the current entity in FIFO mode since ot only serves as being a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-3-tursulin@igalia.com	2024-10-17 12:15:11 +02:00
Tvrtko Ursulin	d42a254633	drm/sched: Optimise drm_sched_entity_push_job In FIFO mode (which is the default), both drm_sched_entity_push_job() and drm_sched_rq_update_fifo(), where the latter calls the former, are currently taking and releasing the same entity->rq_lock. We can avoid that design inelegance, and also have a miniscule efficiency improvement on the submit from idle path, by introducing a new drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking to its callers. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) v3: * Improved commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-2-tursulin@igalia.com	2024-10-17 12:15:11 +02:00
Dave Airlie	54bc1d3255	Merge tag 'drm-misc-next-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: UAPI Changes: - panthor: Add realtime group priority and priority query. Cross-subsystem Changes: - Add Vivek Kasireddy as udmabuf maintainer. - Assorted udmabuf changes. - Device tree binding updates. - dmabuf documentation fixes. - Move drm_rect to drm core module from kms helper. Core Changes: - Update scheduler documentation and concurrency fixes. - drm/ci updates. - Add memory-agnostic fbdev client and client-agnostic setup helper. - Huge driver conversion for using the above. Driver Changes: - Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms, host1x. - Add panel quirks for AYA NEO panels. - Make module autoloading work for bridge/it6505 and mcde. - Add huge page support to v3d using a custom shmfs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com	2024-10-09 11:58:39 +10:00
Dave Airlie	7fefa1edc2	Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: - Add panthor/DEV_QUERY_TIMESTAMP_INFO query. Cross-subsystem Changes: - Updated dt bindings. - Add documentation explaining default errnos for fences. - Mark dma-buf heaps creation functions as __init. Core Changes: - Split DSC helpers from DP helpers. - Clang build fixes for drm/mm test. - Remove simple pipeline support for gem-vram, no longer any users left after converting bochs. - Add erno to drm_sched_start to distinguish between GPU and queue reset. - Add drm_framebuffer testcases. - Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n. - Use read_trylock instead of read_lock in dma_fence_begin_signalling to quiesce lockdep. Driver Changes: - Assorted small fixes and updates for tegra, host1x, imagination, nouveau, panfrost, panthor, panel/ili9341, mali, exynos, panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a, bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050, panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d. - Add bridge/TI TDP158. - Assorted documentation updates. - Convert bochs from simple drm to gem shmem, and check modes against available memory. - Many VC4 fixes, most related to scaling and YUV support. - Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS. - Rockchip 4k@60 support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com	2024-10-09 09:03:46 +10:00
Matthew Brost	34f50cc644	drm/sched: Use drm sched lockdep map for submit_wq Avoid leaking a lockdep map on each drm sched creation and destruction by using a single lockdep map for all drm sched allocated submit_wq. v2: - Use alloc_ordered_workqueue_lockdep_map (Tejun) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241002131639.3425022-2-matthew.brost@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2024-10-02 17:53:45 +02:00
Dave Airlie	43102a2012	Merge tag 'drm-misc-fixes-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size panthor: - Fix locking sched: - Update maintainers - Fix race condition whne queueing up jobs sysfb: - Disable sysfb if framebuffer parent device is unknown vbox: - Fix VLA handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240926121045.GA561653@localhost.localdomain	2024-10-01 08:15:55 +10:00
Shuicheng Lin	1e436f4fff	drm/scheduler: Improve documentation Function drm_sched_entity_push_job() doesn't have a return value, remove the return value description for it. Correct several other typo errors. v2 (Philipp): - more correction with related comments. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240917144732.2758572-1-shuicheng.lin@intel.com	2024-09-24 12:02:37 +02:00
Rob Clark	440d52b370	drm/sched: Fix dynamic job-flow control race Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609 The whole premise of lockless access to a single-producer-single- consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq). This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition. Suggested-by: Asahi Lina <lina@asahilina.net> Fixes: `a78422e9df` ("drm/sched: implement dynamic job-flow control") Closes: https://github.com/AsahiLinux/linux/issues/309 Cc: stable@vger.kernel.org Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240913202301.16772-1-robdclark@gmail.com	2024-09-24 01:14:16 +02:00
Christian König	b2ef808786	drm/sched: add optional errno to drm_sched_start() The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com	2024-09-06 18:05:52 +02:00
Christian König	83b501c179	drm/scheduler: remove full_recover from drm_sched_start This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com	2024-07-25 14:05:12 +02:00
Daniel Vetter	f112b68f27	Merge v6.8-rc6 into drm-next Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches, there's a few same-area-changed conflicts (xe and amdgpu mostly) that are getting a bit too annoying. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-02-26 11:41:07 +01:00
Matthew Brost	d0399da9fb	drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL Rather then loop over entities until one with a ready job is found, re-queue the run job worker when drm_sched_entity_pop_job() returns NULL. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: `66dbd9004a` ("drm/sched: Drain all entities in DRM sched run job worker") Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130030413.2031009-1-matthew.brost@intel.com	2024-02-06 12:47:43 +10:00
Dave Airlie	f8e4806e0d	Merge tag 'drm-misc-next-2024-01-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.9: UAPI Changes: virtio: - add Venus capset defines Cross-subsystem Changes: Core Changes: - fix drm_fixp2int_ceil() - documentation fixes - clean ups - allow DRM_MM_DEBUG with DRM=m - build fixes for debugfs support - EDID cleanups - sched: error-handling fixes - ttm: add tests Driver Changes: bridge: - ite-6505: fix DP link-training bug - samsung-dsim: fix error checking in probe - tc358767: fix regmap usage efifb: - use copy of global screen_info state hisilicon: - fix EDID includes mgag200: - improve ioremap usage - convert to struct drm_edid nouveau: - disp: use kmemdup() - fix EDID includes - documentation fixes panel: - ltk050h3146w: error-handling fixes - panel-edp: support delay between power-on and enable; use put_sync in unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0, BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings - panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings - panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings qaic: - fixes to BO handling - make use of DRM managed release - fix order of remove operations rockchip: - analogix_dp: get encoder port from DT - inno_hdmi: support HDMI for RK3128 - lvds: error-handling fixes simplefb: - fix logging ssd130x: - support SSD133x plus DT bindings tegra: - fix error handling tilcdc: - make use of DRM managed release v3d: - show memory stats in debugfs vc4: - fix error handling in plane prepare_fb - fix framebuffer test in plane helpers vesafb: - use copy of global screen_info state virtio: - cleanups vkms: - fix OOB access when programming the LUT - Kconfig improvements vmwgfx: - unmap surface before changing plane state - fix memory leak in error handling - documentation fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240111154902.GA8448@linux-uq9g	2024-02-05 13:50:15 +10:00
Matthew Brost	66dbd9004a	drm/sched: Drain all entities in DRM sched run job worker All entities must be drained in the DRM scheduler run job worker to avoid the following case. An entity found that is ready, no job found ready on entity, and run job worker goes idle with other entities + jobs ready. Draining all ready entities (i.e. loop over all ready entities) in the run job worker ensures all job that are ready will be scheduled. Cc: Thorsten Leemhuis <regressions@leemhuis.info> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@mail.gmail.com/ Reported-and-tested-by: Mario Limonciello <mario.limonciello@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124 Link: https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@amd.com/ Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> Closes: https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd159a8@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124210811.1639040-1-matthew.brost@intel.com	2024-01-26 12:46:36 +10:00
Markus Elfring	26a4591b31	drm/sched: Return an error code only as a constant in drm_sched_init() Return an error code without storing it in an intermediate variable. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patchwork.freedesktop.org/patch/msgid/85f8004e-f0c9-42d9-8c59-30f1b4e0b89e@web.de Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2024-01-07 22:37:25 -05:00
Markus Elfring	3bb4561806	drm/sched: One function call less in drm_sched_init() after error detection The kfree() function was called in one case by the drm_sched_init() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software. Thus adjust a jump target. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patchwork.freedesktop.org/patch/msgid/85066512-983d-480c-a44d-32405ab1b80e@web.de Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2024-01-07 22:37:25 -05:00
Bert Karwatzki	f92a39ae47	drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()" Commit `f3123c2590`, in combination with the use of work queues by the GPU scheduler, leads to random lock-ups of the GUI. This is a partial revert of of commit `f3123c2590` since drm_sched_wakeup() still needs its entity argument to pass it to drm_sched_can_queue(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994 Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html Signed-off-by: Bert Karwatzki <spasswolf@web.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231127160955.87879-1-spasswolf@web.de Link: https://lore.kernel.org/r/36bece178ff5dc705065e53d1e5e41f6db6d87e4.camel@web.de Fixes: `f3123c2590` ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()") Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-28 14:16:56 -05:00
Luben Tuikov	38f922a563	drm/sched: Reverse run-queue priority enumeration Reverse run-queue priority enumeration such that the higest priority is now 0, and for each consecutive integer the prioirty diminishes. Run-queues correspond to priorities. To an external observer a scheduler created with a single run-queue, and another created with DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule sched->sched_rq[0] with the same "priority", as that index run-queue exists in both schedulers, i.e. a scheduler with one run-queue or many. This patch makes it so. In other words, the "priority" of sched->sched_rq[n], n >= 0, is the same for any scheduler created with any allowable number of run-queues (priorities), 0 to DRM_SCHED_PRIORITY_COUNT. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com	2023-11-24 23:03:53 -05:00
Luben Tuikov	fe375c7480	drm/sched: Rename priority MIN to LOW Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW. This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities in ascending order, DRM_SCHED_PRIORITY_LOW, DRM_SCHED_PRIORITY_NORMAL, DRM_SCHED_PRIORITY_HIGH, DRM_SCHED_PRIORITY_KERNEL. Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com	2023-11-24 23:03:53 -05:00
Danilo Krummrich	a78422e9df	drm/sched: implement dynamic job-flow control Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring run dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com	2023-11-10 02:54:29 +01:00
Luben Tuikov	f3123c2590	drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready() Don't "wake up" the GPU scheduler unless the entity is ready, as well as we can queue to the scheduler, i.e. there is no point in waking up the scheduler for the entity unless the entity is ready. Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Fixes: `bc8d6a9df9` ("drm/sched: Don't disturb the entity when in RR-mode scheduling") Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com	2023-11-09 19:05:35 -05:00
Luben Tuikov	bc8d6a9df9	drm/sched: Don't disturb the entity when in RR-mode scheduling Don't call drm_sched_select_entity() in drm_sched_run_job_queue(). In fact, rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let it do just that, schedule the work item for execution. The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity() to determine if the scheduler has an entity ready in one of its run-queues, and in the case of the Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity which is ready, sets up the run-queue and completion and returns that entity. The FIFO scheduling algorithm is unaffected. Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then in the case of RR scheduling, that would result in drm_sched_select_entity() having been called twice, which may result in skipping a ready entity if more than one entity is ready. This commit fixes this by eliminating the call to drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only in drm_sched_run_job_work(). v2: Rebased on top of Tvrtko's renames series of patches. (Luben) Add fixes-tag. (Tvrtko) Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Fixes: `f7fe64ad0f` ("drm/sched: Split free_job into own work item") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107041020.10035-2-ltuikov89@gmail.com	2023-11-07 23:16:39 -05:00
Tvrtko Ursulin	f12af4c461	drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue Because a) helper is exported to other parts of the scheduler and b) there isn't a plain drm_sched_wakeup to begin with, I think we can drop the suffix and by doing so separate the intimiate knowledge between the scheduler components a bit better. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-6-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-04 21:18:47 -04:00
Tvrtko Ursulin	35a4279d42	drm/sched: Rename drm_sched_run_job_queue_if_ready and clarify kerneldoc "If ready" is not immediately clear what it means - is the scheduler ready or something else? Drop the suffix, clarify kerneldoc, and employ the same naming scheme as in drm_sched_run_free_queue: - drm_sched_run_job_queue - enqueues if there is something to enqueue and scheduler is ready (can queue) - __drm_sched_run_job_queue - low-level helper to simply queue the job Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-5-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-04 21:18:35 -04:00
Tvrtko Ursulin	67dd1d8c9f	drm/sched: Rename drm_sched_free_job_queue to be more descriptive The current name makes it sound like helper will free a queue, while what it does is it enqueues the free job worker. Rename it to drm_sched_run_free_queue to align with existing drm_sched_run_job_queue. Despite that creating an illusion there are two queues, while in reality there is only one, at least it creates a consistent naming for the two enqueuing helpers. At the same time simplify the "if done" helper by dropping the suffix and adding a double underscore prefix to the one which just enqueues. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-4-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-04 21:18:26 -04:00
Tvrtko Ursulin	e608d9f7ac	drm/sched: Move free worker re-queuing out of the if block Whether or not there are more jobs to clean up does not depend on the existance of the current job, given both drm_sched_get_finished_job and drm_sched_free_job_queue_if_done take and drop the job list lock. Therefore it is confusing to make it read like there is a dependency. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-3-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-04 21:18:08 -04:00
Tvrtko Ursulin	7abbbe2694	drm/sched: Rename drm_sched_get_cleanup_job to be more descriptive "Get cleanup job" makes it sound like helper is returning a job which will execute some cleanup, or something, while the kerneldoc itself accurately says "fetch the next _finished_ job". So lets rename the helper to be self documenting. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-2-tvrtko.ursulin@linux.intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-04 21:17:42 -04:00
Matthew Brost	3c6c7ca450	drm/sched: Add a helper to queue TDR immediately Add a helper whereby a driver can invoke TDR immediately. v2: - Drop timeout args, rename function, use mod delayed work (Luben) v3: - s/XE/Xe (Luben) - present tense in commit message (Luben) - Adjust comment for drm_sched_tdr_queue_imm (Luben) v4: - Adjust commit message (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-6-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:23 -04:00
Matthew Brost	7a36dcfa16	drm/sched: Add drm_sched_start_timeout_unlocked helper Also add a lockdep assert to drm_sched_start_timeout. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-5-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:22 -04:00
Matthew Brost	f7fe64ad0f	drm/sched: Split free_job into own work item Rather than call free_job and run_job in same work item have a dedicated work item for each. This aligns with the design and intended use of work queues. v2: - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting timestamp in free_job() work item (Danilo) v3: - Drop forward dec of drm_sched_select_entity (Boris) - Return in drm_sched_run_job_work if entity NULL (Boris) v4: - Replace dequeue with peek and invert logic (Luben) - Wrap to 100 lines (Luben) - Update comments for _queue / _queue_if_ready functions (Luben) v5: - Drop peek argument, blindly reinit idle (Luben) - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben) - Update work_run_job & work_free_job kernel doc (Luben) v6: - Do not move drm_sched_select_entity in file (Luben) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.com Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:21 -04:00
Matthew Brost	a6149f0393	drm/sched: Convert drm scheduler to use a work queue rather than kthread In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In Xe the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in Xe we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In Xe submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue v6: - (Luben / checkpatch) fix alignment in msm_ringbuffer.c - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue - (Luben) Update comment for drm_sched_wqueue_enqueue - (Luben) Positive check for submit_wq in drm_sched_init - (Luben) s/alloc_submit_wq/own_submit_wq v7: - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue v8: - (Luben) Adjust var names / comments Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:21 -04:00

1 2 3 4

166 Commits