[ 114.987980][ T5313] WARNING: CPU: 6 PID: 5313 at io_uring/io_uring.c:872 io_req_post_cqe+0x12e/0x4f0
[ 114.991597][ T5313] RIP: 0010:io_req_post_cqe+0x12e/0x4f0
[ 115.001880][ T5313] Call Trace:
[ 115.002222][ T5313] <TASK>
[ 115.007813][ T5313] io_send+0x4fe/0x10f0
[ 115.009317][ T5313] io_issue_sqe+0x1a6/0x1740
[ 115.012094][ T5313] io_wq_submit_work+0x38b/0xed0
[ 115.013223][ T5313] io_worker_handle_work+0x62a/0x1600
[ 115.013876][ T5313] io_wq_worker+0x34f/0xdf0
As the comment states, io_req_post_cqe() should only be used by
multishot requests, i.e. REQ_F_APOLL_MULTISHOT, which bundled sends are
not. Add a flag signifying whether a request wants to post multiple
CQEs. Eventually REQ_F_APOLL_MULTISHOT should imply the new flag, but
that's left out for simplicity.
Cc: stable@vger.kernel.org
Fixes: a05d1f625c ("io_uring/net: support bundles for send")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8b611dbb54d1cd47a88681f5d38c84d0c02bc563.1743067183.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring updates from Jens Axboe:
"This is the first of the io_uring pull requests for the 6.15 merge
window, there will be others once the net tree has gone in. This
contains:
- Cleanup and unification of cancelation handling across various
request types.
- Improvement for bundles, supporting them both for incrementally
consumed buffers, and for non-multishot requests.
- Enable toggling of using iowait while waiting on io_uring events or
not. Unfortunately this is still tied with CPU frequency boosting
on short waits, as the scheduler side has not been very receptive
to splitting the (useless) iowait stat from the cpufreq implied
boost.
- Add support for kbuf nodes, enabling zero-copy support for the ublk
block driver.
- Various cleanups for resource node handling.
- Series greatly cleaning up the legacy provided (non-ring based)
buffers. For years, we've been pushing the ring provided buffers as
the way to go, and that is what people have been using. Reduce the
complexity and code associated with legacy provided buffers.
- Series cleaning up the compat handling.
- Series improving and cleaning up the recvmsg/sendmsg iovec and msg
handling.
- Series of cleanups for io-wq.
- Start adding a bunch of selftests. The liburing repository
generally carries feature and regression tests for everything, but
at least for ublk initially, we'll try and go the route of having
it in selftests as well. We'll see how this goes, might decide to
migrate more tests this way in the future.
- Various little cleanups and fixes"
* tag 'for-6.15/io_uring-20250322' of git://git.kernel.dk/linux: (108 commits)
selftests: ublk: add stripe target
selftests: ublk: simplify loop io completion
selftests: ublk: enable zero copy for null target
selftests: ublk: prepare for supporting stripe target
selftests: ublk: move common code into common.c
selftests: ublk: increase max buffer size to 1MB
selftests: ublk: add single sqe allocator helper
selftests: ublk: add generic_01 for verifying sequential IO order
selftests: ublk: fix starting ublk device
io_uring: enable toggle of iowait usage when waiting on CQEs
selftests: ublk: fix write cache implementation
selftests: ublk: add variable for user to not show test result
selftests: ublk: don't show `modprobe` failure
selftests: ublk: add one dependency header
io_uring/kbuf: enable bundles for incrementally consumed buffers
Revert "io_uring/rsrc: simplify the bvec iter count calculation"
selftests: ublk: improve test usability
selftests: ublk: add stress test for covering IO vs. killing ublk server
selftests: ublk: add one stress test for covering IO vs. removing device
selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests
...
Pull lsm updates from Paul Moore:
- Various minor updates to the LSM Rust bindings
Changes include marking trivial Rust bindings as inlines and comment
tweaks to better reflect the LSM hooks.
- Add LSM/SELinux access controls to io_uring_allowed()
Similar to the io_uring_disabled sysctl, add a LSM hook to
io_uring_allowed() to enable LSMs a simple way to enforce security
policy on the use of io_uring. This pull request includes SELinux
support for this new control using the io_uring/allowed permission.
- Remove an unused parameter from the security_perf_event_open() hook
The perf_event_attr struct parameter was not used by any currently
supported LSMs, remove it from the hook.
- Add an explicit MAINTAINERS entry for the credentials code
We've seen problems in the past where patches to the credentials code
sent by non-maintainers would often languish on the lists for
multiple months as there was no one explicitly tasked with the
responsibility of reviewing and/or merging credentials related code.
Considering that most of the code under security/ has a vested
interest in ensuring that the credentials code is well maintained,
I'm volunteering to look after the credentials code and Serge Hallyn
has also volunteered to step up as an official reviewer. I posted the
MAINTAINERS update as a RFC to LKML in hopes that someone else would
jump up with an "I'll do it!", but beyond Serge it was all crickets.
- Update Stephen Smalley's old email address to prevent confusion
This includes a corresponding update to the mailmap file.
* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
mailmap: map Stephen Smalley's old email addresses
lsm: remove old email address for Stephen Smalley
MAINTAINERS: add Serge Hallyn as a credentials reviewer
MAINTAINERS: add an explicit credentials entry
cred,rust: mark Credential methods inline
lsm,rust: reword "destroy" -> "release" in SecurityCtx
lsm,rust: mark SecurityCtx methods inline
perf: Remove unnecessary parameter of security check
lsm: fix a missing security_uring_allowed() prototype
io_uring,lsm,selinux: add LSM hooks for io_uring_setup()
io_uring: refactor io_uring_allowed()
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
Add iou_vec to commands and wire caching for it, but don't expose it to
users just yet. We need the vec cleared on initial alloc, but since
we can't place it at the beginning at the moment, zero the entire
async_data. It's cached, and the performance effects only the initial
allocation, and it might be not a bad idea since we're exposing those
bits to outside drivers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c0f2145b75791bc6106eb4e72add2cf6a2c72a7a.1742579999.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By default, io_uring marks a waiting task as being in iowait, if it's
sleeping waiting on events and there are pending requests. This isn't
necessarily always useful, and may be confusing on non-storage setups
where iowait isn't expected. It can also cause extra power usage, by
preventing the CPU from entering lower sleep states.
This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then
io_uring will not account the sleeping task as being in iowait. If the
kernel supports this feature, then it will be marked by having the
IORING_FEAT_NO_IOWAIT feature flag set.
As the kernel currently does not support separating the iowait
accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT
controls both of these at the same time. In the future, if those do end
up being split, then it'd be possible to control them separately.
However, it seems more likely that the kernel will decouple iowait and
CPU frequency boosting anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Multishot errors can be mapped 1:1 to normal errors, but there are not
identical. It leads to a peculiar situation where all multishot requests
has to check in what context they're run and return different codes.
Unify them starting with EAGAIN / IOU_ISSUE_SKIP_COMPLETE(EIOCBQUEUED)
pair, which mean that core io_uring still owns the request and it should
be retried. In case of multishot it's naturally just continues to poll,
otherwise it might poll, use iowq or do any other kind of allowed
blocking. Introduce IOU_RETRY aliased to -EAGAIN for that.
Apart from obvious upsides, multishot can now also check for misuse of
IOU_ISSUE_SKIP_COMPLETE.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/da117b79ce72ecc3ab488c744e29fae9ba54e23b.1741453534.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a helper function io_cache_free() that returns an allocation to a
io_alloc_cache, falling back on kfree() if the io_alloc_cache is full.
This is the inverse of io_cache_alloc(), which takes an allocation from
an io_alloc_cache and falls back on kmalloc() if the cache is empty.
Convert 4 callers to use the helper.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Suggested-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20250304194814.2346705-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO.
User space must register an empty fixed buffer table with io_uring in
order for the kernel to make use of it.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge mainline fixes into 6.15 branch, as upcoming patches depend on
fixes that went into the 6.14 mainline branch.
* io_uring-6.14:
io_uring/net: save msg_control for compat
io_uring/rw: clean up mshot forced sync mode
io_uring/rw: move ki_complete init into prep
io_uring/rw: don't directly use ki_complete
io_uring/rw: forbid multishot async reads
io_uring/rsrc: remove unused constants
io_uring: fix spelling error in uapi io_uring.h
io_uring: prevent opcode speculation
io-wq: backoff when retrying worker creation
io_poll_issue() forwards the call to io_issue_sqe() and thus inherits
some of the handling. That's not particularly failure resistant, as for
example returning an innocently looking IOU_OK from a multishot issue
will lead to severe bugs.
Reimplement io_poll_issue() without io_issue_sqe()'s request completion
logic. Remove extra checks as we know that req->file is already set,
linked timeout are armed, and iopoll is not supported. Also cover it
with warnings for now.
The patch should be useful by itself, but it's also preparing the
codebase for other future clean ups.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3096d7b1026d9a52426a598bdfc8d9d324555545.1740331076.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring fixes from Jens Axboe:
- Series fixing an issue with multishot read on pollable files that may
return -EIOCBQUEUED from ->read_iter(). Four small patches for that,
the first one deliberately done in such a way that it'd be easy to
backport
- Remove some dead constant definitions
- Use array_index_nospec() for opcode indexing
- Work-around for worker creation retries in the presence of signals
* tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux:
io_uring/rw: clean up mshot forced sync mode
io_uring/rw: move ki_complete init into prep
io_uring/rw: don't directly use ki_complete
io_uring/rw: forbid multishot async reads
io_uring/rsrc: remove unused constants
io_uring: fix spelling error in uapi io_uring.h
io_uring: prevent opcode speculation
io-wq: backoff when retrying worker creation
Add a new object called an interface queue (ifq) that represents a net
rx queue that has been configured for zero copy. Each ifq is registered
using a new registration opcode IORING_REGISTER_ZCRX_IFQ.
The refill queue is allocated by the kernel and mapped by userspace
using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main
SQ/CQ. It is used by userspace to return buffers that it is done with,
which will then be re-used by the netdev again.
The main CQ ring is used to notify userspace of received data by using
the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each
entry contains the offset + len to the data.
For now, each io_uring instance only has a single ifq.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20250215000947.789731-2-dw@davidwei.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8e5b3b89ec ("io_uring: remove struct io_tw_state::locked") removed the
only field of io_tw_state but kept it as a task work callback argument
to "forc[e] users not to invoke them carelessly out of a wrong context".
Passing the struct io_tw_state * argument adds a few instructions to all
callers that can't inline the functions and see the argument is unused.
So pass struct io_tw_state by value instead. Since it's a 0-sized value,
it can be passed without any instructions needed to initialize it.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for changing how io_tw_state is passed, introduce a type
alias io_tw_token_t for struct io_tw_state *. This allows for changing
the representation in one place, without having to update the many
functions that just forward their struct io_tw_state * argument.
Also add a comment to struct io_tw_state to explain its purpose.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250217022511.1150145-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is desirable to allow LSM to configure accessibility to io_uring
because it is a coarse yet very simple way to restrict access to it. So,
add an LSM for io_uring_allowed() to guard access to io_uring.
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
[PM: merge fuzz due to changes in preceding patches, subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Have io_uring_allowed() return an error code directly instead of
true/false. This is needed for follow-up work to guard io_uring_setup()
with LSM.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
[PM: goto-to-return conversion as discussed on-list]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull more io_uring updates from Jens Axboe:
- Series cleaning up the alloc cache changes from this merge window,
and then another series on top making it better yet.
This also solves an issue with KASAN_EXTRA_INFO, by making io_uring
resilient to KASAN using parts of the freed struct for storage
- Cleanups and simplications to buffer cloning and io resource node
management
- Fix an issue introduced in this merge window where READ/WRITE_ONCE
was used on an atomic_t, which made some archs complain
- Fix for an errant connect retry when the socket has been shut down
- Fix for multishot and provided buffers
* tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux:
io_uring/net: don't retry connect operation on EPOLLERR
io_uring/rw: simplify io_rw_recycle()
io_uring: remove !KASAN guards from cache free
io_uring/net: extract io_send_select_buffer()
io_uring/net: clean io_msg_copy_hdr()
io_uring/net: make io_net_vec_assign() return void
io_uring: add alloc_cache.c
io_uring: dont ifdef io_alloc_cache_kasan()
io_uring: include all deps for alloc_cache.h
io_uring: fix multishots with selected buffers
io_uring/register: use atomic_read/write for sq_flags migration
io_uring/alloc_cache: get rid of _nocache() helper
io_uring: get rid of alloc cache init_once handling
io_uring/uring_cmd: cleanup struct io_uring_cmd_data layout
io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock()
io_uring/msg_ring: don't leave potentially dangling ->tctx pointer
io_uring/rsrc: Move lockdep assert from io_free_rsrc_node() to caller
io_uring/rsrc: remove unused parameter ctx for io_rsrc_node_alloc()
io_uring: clean up io_uring_register_get_file()
io_uring/rsrc: Simplify buffer cloning by locking both rings
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
init_once is called when an object doesn't come from the cache, and
hence needs initial clearing of certain members. While the whole
struct could get cleared by memset() in that case, a few of the cache
members are large enough that this may cause unnecessary overhead if
the caches used aren't large enough to satisfy the workload. For those
cases, some churn of kmalloc+kfree is to be expected.
Ensure that the 3 users that need clearing put the members they need
cleared at the start of the struct, and wrap the rest of the struct in
a struct group so the offset is known.
While at it, improve the interaction with KASAN such that when/if
KASAN writes to members inside the struct that should be retained over
caching, it won't trip over itself. For rw and net, the retaining of
the iovec over caching is disabled if KASAN is enabled. A helper will
free and clear those members in that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring updates from Jens Axboe:
"Not a lot in terms of features this time around, mostly just cleanups
and code consolidation:
- Support for PI meta data read/write via io_uring, with NVMe and
SCSI covered
- Cleanup the per-op structure caching, making it consistent across
various command types
- Consolidate the various user mapped features into a concept called
regions, making the various users of that consistent
- Various cleanups and fixes"
* tag 'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux: (56 commits)
io_uring/fdinfo: fix io_uring_show_fdinfo() misuse of ->d_iname
io_uring: reuse io_should_terminate_tw() for cmds
io_uring: Factor out a function to parse restrictions
io_uring/rsrc: require cloned buffers to share accounting contexts
io_uring: simplify the SQPOLL thread check when cancelling requests
io_uring: expose read/write attribute capability
io_uring/rw: don't gate retry on completion context
io_uring/rw: handle -EAGAIN retry at IO completion time
io_uring/rw: use io_rw_recycle() from cleanup path
io_uring/rsrc: simplify the bvec iter count calculation
io_uring: ensure io_queue_deferred() is out-of-line
io_uring/rw: always clear ->bytes_done on io_async_rw setup
io_uring/rw: use NULL for rw->free_iovec assigment
io_uring/rw: don't mask in f_iocb_flags
io_uring/msg_ring: Drop custom destructor
io_uring: Move old async data allocation helper to header
io_uring/rw: Allocate async data through helper
io_uring/net: Allocate msghdr async data through helper
io_uring/uring_cmd: Allocate async data through generic helper
io_uring/poll: Allocate apoll with generic alloc_cache helper
...
In io_uring_try_cancel_requests, we check whether sq_data->thread ==
current to determine if the function is called by the SQPOLL thread to do
iopoll when IORING_SETUP_SQPOLL is set. This check can race with the SQPOLL
thread termination.
io_uring_cancel_generic is used in 2 places: io_uring_cancel_generic and
io_ring_exit_work. In io_uring_cancel_generic, we have the information
whether the current is SQPOLL thread already. And the SQPOLL thread never
reaches io_ring_exit_work.
So to avoid the racy check, this commit adds a boolean flag to
io_uring_try_cancel_requests to determine if the caller is SQPOLL thread.
Reported-by: syzbot+3c750be01dab672c513d@syzkaller.appspotmail.com
Reported-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20250113160331.44057-1-minhquangbui99@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring fixes from Jens Axboe:
- Fix for multishot timeout updates only using the updated value for
the first invocation, not subsequent ones
- Silence a false positive lockdep warning
- Fix the eventfd signaling and putting RCU logic
- Fix fault injected SQPOLL setup not clearing the task pointer in the
error path
- Fix local task_work looking at the SQPOLL thread rather than just
signaling the safe variant. Again one of those theoretical issues,
which should be closed up none the less.
* tag 'io_uring-6.13-20250111' of git://git.kernel.dk/linux:
io_uring: don't touch sqd->thread off tw add
io_uring/sqpoll: zero sqd->thread on tctx errors
io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period
io_uring: silence false positive warnings
io_uring/timeout: fix multishot updates
With IORING_SETUP_SQPOLL all requests are created by the SQPOLL task,
which means that req->task should always match sqd->thread. Since
accesses to sqd->thread should be separately protected, use req->task
in io_req_normal_work_add() instead.
Note, in the eyes of io_req_normal_work_add(), the SQPOLL task struct
is always pinned and alive, and sqd->thread can either be the task or
NULL. It's only problematic if the compiler decides to reload the value
after the null check, which is not so likely.
Cc: stable@vger.kernel.org
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Reported-by: lizetao <lizetao1@huawei.com>
Fixes: 78f9b61bd8 ("io_uring: wake SQPOLL task when task_work is added to an empty queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1cbbe72cf32c45a8fee96026463024cd8564a7d7.1736541357.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull vfs fixes from Christian Brauner:
"afs:
- Fix the maximum cell name length
- Fix merge preference rule failure condition
fuse:
- Fix fuse_get_user_pages() so it doesn't risk misleading the caller
to think pages have been allocated when they actually haven't
- Fix direct-io folio offset and length calculation
netfs:
- Fix async direct-io handling
- Fix read-retry for filesystems that don't provide a
->prepare_read() method
vfs:
- Prevent truncating 64-bit offsets to 32-bits in iomap
- Fix memory barrier interactions when polling
- Remove MNT_ONRB to fix concurrent modification of @mnt->mnt_flags
leading to MNT_ONRB to not be raised and invalid access to a list
member"
* tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
poll: kill poll_does_not_wait()
sock_poll_wait: kill the no longer necessary barrier after poll_wait()
io_uring_poll: kill the no longer necessary barrier after poll_wait()
poll_wait: kill the obsolete wait_address check
poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
afs: Fix merge preference rule failure condition
netfs: Fix read-retry for fs with no ->prepare_read()
netfs: Fix kernel async DIO
fs: kill MNT_ONRB
iomap: avoid avoid truncating 64-bit offset to 32 bits
afs: Fix the maximum cell name length
fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure
fuse: fix direct io folio offset and length calculation
Rather than try and have io_read/io_write turn REQ_F_REISSUE into
-EAGAIN, catch the REQ_F_REISSUE when the request is otherwise
considered as done. This is saner as we know this isn't happening
during an actual submission, and it removes the need to randomly
check REQ_F_REISSUE after read/write submission.
If REQ_F_REISSUE is set, __io_submit_flush_completions() will skip over
this request in terms of posting a CQE, and the regular request
cleaning will ensure that it gets reissued via io-wq.
Signed-off-by: Jens Axboe <axboe@kernel.dk>