Dylan Yudaken
ed5ccb3bee
io_uring: remove priority tw list optimisation
...
This optimisation has some built in assumptions that make it easy to
introduce bugs. It also does not have clear wins that make it worth keeping.
Signed-off-by: Dylan Yudaken <dylany@fb.com >
Link: https://lore.kernel.org/r/20220622134028.2013417-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:15 -06:00
Pavel Begunkov
9da070b142
io_uring: consistent naming for inline completion
...
Improve naming of the inline/deferred completion helper so it's
consistent with its *_post counterpart. Add some comments and extra
lockdeps to ensure the locking is done right.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/797c619943dac06529e9d3fcb16e4c3cde6ad1a3.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:15 -06:00
Pavel Begunkov
46929b0868
io_uring: add io_commit_cqring_flush()
...
Since __io_commit_cqring_flush users moved to different files, introduce
io_commit_cqring_flush() helper and encapsulate all flags testing details
inside.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/0da03887435dd9869ffe46dcd3962bf104afcca3.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:15 -06:00
Pavel Begunkov
253993210b
io_uring: introduce locking helpers for CQE posting
...
spin_lock(&ctx->completion_lock);
/* post CQEs */
io_commit_cqring(ctx);
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
We have many places repeating this sequence, and the three-function
unlock section is not perfect from the maintenance perspective and also
makes it harder to add new locking/sync tricks.
Introduce two helpers. io_cq_lock(), which is simple and only grabs
->completion_lock, and io_cq_unlock_post() encapsulating the three call
section.
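A minimal userspace sketch of the two helpers, assuming a toy context in which the lock, the commit step and the wakeup are modelled as plain fields. All toy_* names are hypothetical and only illustrate how the three-call unlock section folds into one helper; this is not the kernel implementation.

```c
#include <stdbool.h>

/* Toy stand-in for the ring context; every field here is illustrative. */
struct toy_ctx {
    bool locked;              /* models ->completion_lock being held */
    unsigned cached_tail;     /* CQEs filled but not yet published */
    unsigned published_tail;  /* tail as seen by userspace */
    unsigned wakeups;         /* counts io_cqring_ev_posted()-style calls */
};

/* Counterpart of io_cq_lock(): only grabs the lock. */
static void toy_cq_lock(struct toy_ctx *ctx)
{
    ctx->locked = true;
}

/* Counterpart of io_cq_unlock_post(): commit, unlock, notify. */
static void toy_cq_unlock_post(struct toy_ctx *ctx)
{
    ctx->published_tail = ctx->cached_tail;  /* io_commit_cqring() */
    ctx->locked = false;                     /* spin_unlock() */
    ctx->wakeups++;                          /* io_cqring_ev_posted() */
}

/* A caller now needs two calls around the "post CQEs" step. */
static void toy_post_cqe(struct toy_ctx *ctx)
{
    toy_cq_lock(ctx);
    ctx->cached_tail++;  /* post CQEs */
    toy_cq_unlock_post(ctx);
}
```

Keeping the commit/unlock/notify order in one place means a future change to that order only has to touch the helper.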
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/fe0c682bf7f7b55d9be55b0d034be9c1949277dc.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
305bef9887
io_uring: hide eventfd assumptions in eventfd paths
...
Some io_uring-eventfd users assume that there won't be spurious wakeups.
That assumption has to be honoured by all io_cqring_ev_posted() callers,
which is inconvenient and from time to time leads to problems but should
be maintained to not break the userspace.
Instead of making the callers track whether a CQE was posted or not, hide
it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time
and triggers the eventfd only when ->cached_cq_tail changed since then.
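The tail comparison described above can be sketched as follows. This is a simplified model, assuming a hypothetical evfd_last_cq_tail field to hold the tail seen at the last signal; the real function also has to deal with the eventfd itself, which is omitted here.

```c
#include <stdbool.h>

/* Toy context; field names other than cached_cq_tail are hypothetical. */
struct toy_evfd_ctx {
    unsigned cached_cq_tail;     /* advanced whenever a CQE is filled */
    unsigned evfd_last_cq_tail;  /* tail observed at the last signal */
    unsigned signals;            /* how often the eventfd was triggered */
};

/*
 * Signal only if the tail moved since the previous call, so callers
 * no longer need to track whether they actually posted a CQE.
 * Returns true when a signal was sent.
 */
static bool toy_eventfd_signal(struct toy_evfd_ctx *ctx)
{
    if (ctx->cached_cq_tail == ctx->evfd_last_cq_tail)
        return false;  /* nothing new: would be a spurious wakeup */

    ctx->evfd_last_cq_tail = ctx->cached_cq_tail;
    ctx->signals++;
    return true;
}
```

Calling toy_eventfd_signal() twice in a row without posting a CQE in between signals at most once.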
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
affa87db90
io_uring: fix multi ctx cancellation
...
io_uring_try_cancel_requests() loops until there is nothing left to do
with the ring, however there might be several rings and they might have
dependencies between them, e.g. via poll requests.
Instead of cancelling rings one by one, try to cancel them all and only
then loop over if we still potentially have some work to do.
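A toy model of the "one pass over all rings, then repeat" loop. It assumes each ring has a count of pending requests and may be blocked on another ring; the structures and names are hypothetical and exist only to show why looping on a single ring until it is idle is not enough.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring model: pending work plus an optional dependency. */
struct toy_ring {
    int pending;                  /* requests still to cancel */
    struct toy_ring *blocked_on;  /* ring that must drain first, or NULL */
};

/* One cancellation attempt; returns true if it made progress. */
static bool toy_try_cancel(struct toy_ring *r)
{
    if (r->pending == 0)
        return false;
    if (r->blocked_on && r->blocked_on->pending > 0)
        return false;  /* depends on another ring, cannot progress yet */
    r->pending--;
    return true;
}

/* Try every ring once per pass; repeat while any ring made progress. */
static int toy_cancel_all(struct toy_ring **rings, size_t n)
{
    int passes = 0;
    bool again;

    do {
        again = false;
        for (size_t i = 0; i < n; i++)
            again |= toy_try_cancel(rings[i]);
        passes++;
    } while (again);

    return passes;
}
```

With ring A blocked on ring B, a per-ring loop over A alone would see no progress and stop early, while the combined pass drains B first and then A.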
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/8d491fe02d8ac4c77ff38061cf86b9a827e8845c.1655684496.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d9dee4302a
io_uring: remove ->flush_cqes optimisation
...
It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often
the ->flush_cqes flag prevents completions from being flushed. Sometimes a
high level of concurrency enables it for at least one CQE, but
sometimes it doesn't save much because nobody is waiting on the CQ.
Remove the ->flush_cqes flag and the optimisation; this should benefit the
normal use case. Note that there is no spurious eventfd problem with
that as checks for spuriousness were incorporated into
io_eventfd_signal().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com
[axboe: remove now dead state->flush_cqes variable]
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
a830ffd287
io_uring: move io_eventfd_signal()
...
Move io_eventfd_signal() in the sources without any changes and kill its
forward declaration.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/9ebebb3f6f56f5a5448a621e0b6a537720c43334.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d142c3ec8d
io_uring: remove extra io_commit_cqring()
...
We don't post events in __io_commit_cqring_flush() anymore but send all
requests to tw, so no need to do io_commit_cqring() there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/f2481e32375e749be89c42e4804268b608722cef.1655637157.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
48863ffd3e
io_uring: clean up tracing events
...
We have lots of trace events accepting an io_uring request and wanting
to print some of its fields like user_data, opcode, flags and so on.
However, as trace points were unaware of io_uring structures, we had to
pass all the fields as arguments. Teach trace/events/io_uring.h about
struct io_kiocb and stop the misery of passing a horde of arguments to
trace helpers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
27a9d66fec
io_uring: kill extra io_uring_types.h includes
...
io_uring/io_uring.h already includes io_uring_types.h, no need to
include it every time. Kill it in a bunch of places, it prepares us for
following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
b3659a65be
io_uring: change ->cqe_cached invariant for CQE32
...
With IORING_SETUP_CQE32 ->cqe_cached doesn't store a real address but
rather an implicit offset into cqes. Store the real cqe pointer and
increment it accordingly if CQE32.
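A small sketch of the new invariant, assuming a 16-byte toy CQE and a sentinel pointer marking the end of the usable range. The struct layout and the toy_* names are assumptions for illustration; only the idea of advancing a real pointer by one or two slots comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy 16-byte CQE; a CQE32 entry occupies two consecutive slots. */
struct toy_cqe {
    unsigned long long user_data;
    int res;
    unsigned flags;
};

struct toy_cq {
    struct toy_cqe *cqe_cached;    /* real pointer to the next free slot */
    struct toy_cqe *cqe_sentinel;  /* one past the last usable slot */
    bool cqe32;                    /* models IORING_SETUP_CQE32 */
};

/* Hand out the next CQE and advance the cached pointer by 1 or 2 slots. */
static struct toy_cqe *toy_get_cqe(struct toy_cq *cq)
{
    struct toy_cqe *cqe = cq->cqe_cached;
    size_t step = cq->cqe32 ? 2 : 1;

    if ((size_t)(cq->cqe_sentinel - cqe) < step)
        return NULL;  /* not enough room left in the cached range */

    cq->cqe_cached += step;
    return cqe;
}
```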
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/1ee1838cba16bed96381a006950b36ba640d998c.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
68494a65d0
io_uring: introduce io_req_cqe_overflow()
...
__io_fill_cqe_req() is hot and inlined, so we want it to be as small as
possible. Add io_req_cqe_overflow(), which accepts only a request and does
all overflow accounting, and use it to replace two calls to the 6-argument
io_cqring_event_overflow().
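The shape of the change can be sketched like this, assuming toy request and context structures. The field names and the bookkeeping done in the 6-argument function are hypothetical; the point is only that the hot caller passes a single pointer and the argument unpacking moves into an out-of-line helper.

```c
/* Toy context and request; all fields are illustrative. */
struct toy_ovf_ctx {
    unsigned overflow_count;
    unsigned long long last_user_data;
};

struct toy_ovf_req {
    struct toy_ovf_ctx *ctx;
    unsigned long long user_data;
    int res;
    unsigned cflags;
    unsigned long long extra1, extra2;
};

/* Stand-in for the 6-argument overflow function. */
static int toy_cqring_event_overflow(struct toy_ovf_ctx *ctx,
                                     unsigned long long user_data, int res,
                                     unsigned cflags,
                                     unsigned long long extra1,
                                     unsigned long long extra2)
{
    (void)res; (void)cflags; (void)extra1; (void)extra2;
    ctx->overflow_count++;
    ctx->last_user_data = user_data;
    return 1;
}

/* Single-argument wrapper: the hot path only has to pass the request. */
static int toy_req_cqe_overflow(struct toy_ovf_req *req)
{
    return toy_cqring_event_overflow(req->ctx, req->user_data, req->res,
                                     req->cflags, req->extra1, req->extra2);
}
```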
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/048b9fbcce56814d77a1a540409c98c3d383edcb.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
faf88dde06
io_uring: don't inline __io_get_cqe()
...
__io_get_cqe() is not as hot as io_get_cqe(), so there is no need to inline
it. Un-inlining it sheds ~500B from the binary.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/c1ac829198a881b7af8710926f99a3559b9f24c0.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
d245bca637
io_uring: don't expose io_fill_cqe_aux()
...
Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
9ca9fb24d5
io_uring: mutex locked poll hashing
...
Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.
On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. Because of latency concerns about
needing to hold the mutex on the completion side, use the new table
only in the following cases:
1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
is little to no contention and so the main tw handler will almost
always end up grabbing it before calling callbacks.
2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
a major user of ->uring_lock.
3) apoll: we normally grab the lock on the completion side anyway to
execute the request, so it's free.
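The three cases above can be read as a single predicate. The sketch below is an assumption about how such a decision could be written; the toy flag values and the function name are hypothetical and do not reproduce the kernel's constants or code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the setup flags named in the list above. */
#define TOY_SETUP_SQPOLL         (1u << 1)
#define TOY_SETUP_SINGLE_ISSUER  (1u << 12)

/*
 * Decide whether a poll request goes into the ->uring_lock protected
 * table (true) or the spinlock protected one (false).
 */
static bool toy_use_locked_table(unsigned setup_flags, bool is_apoll)
{
    /* Case 3: apoll takes the mutex on completion anyway. */
    if (is_apoll)
        return true;

    /* Cases 1 and 2: a single task is the main user of the mutex. */
    return (setup_flags &
            (TOY_SETUP_SINGLE_ISSUER | TOY_SETUP_SQPOLL)) != 0;
}
```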
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/1bbad9c78c454b7b92f100bbf46730a37df7194f.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:14 -06:00
Pavel Begunkov
e6f89be614
io_uring: introduce a struct for hash table
...
Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/d65bc3faba537ec2aca9eabf334394936d44bd28.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
97bbdc06a4
io_uring: add IORING_SETUP_SINGLE_ISSUER
...
Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations on submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.
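A sketch of the flag check implied by the message, assuming illustrative flag values and assuming that the rejection is reported as -EINVAL (the message does not say which error is returned). The toy_* names are hypothetical.

```c
#include <errno.h>

/* Illustrative flag values; not taken from the commit itself. */
#define TOY_SETUP_IOPOLL         (1u << 0)
#define TOY_SETUP_SINGLE_ISSUER  (1u << 12)

/* Reject the SINGLE_ISSUER + IOPOLL combination at ring setup time. */
static int toy_validate_setup_flags(unsigned flags)
{
    if ((flags & TOY_SETUP_SINGLE_ISSUER) && (flags & TOY_SETUP_IOPOLL))
        return -EINVAL;  /* assumed error code */
    return 0;
}
```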
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/4bcc41ee467fdf04c8aab8baf6ce3ba21858c3d4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
8b1dfd343a
io_uring: clean up io_ring_ctx_alloc
...
Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
which makes it more readable.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/993926ed0d614ba9a76b2a85bebae2babcb13983.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
4a07723fb4
io_uring: limit the number of cancellation buckets
...
Don't allocate too many hash/cancellation buckets. There might be too
many, so clamp the count to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.
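The arithmetic can be checked with a short sketch. It assumes the bucket count starts from the base-2 logarithm of the ring size, which is a hypothetical starting point; only the 8-bit cap and the 64-byte bucket size come from the message.

```c
#include <stddef.h>

#define TOY_MAX_HASH_BITS  8    /* cap from the commit message */
#define TOY_BUCKET_BYTES   64   /* per-bucket size from the commit message */

/* Integer log2, rounding down; returns 0 for inputs 0 and 1. */
static unsigned toy_ilog2(unsigned v)
{
    unsigned bits = 0;

    while (v >>= 1)
        bits++;
    return bits;
}

/* Derive the number of hash bits and clamp it to [1, 8]. */
static unsigned toy_hash_bits(unsigned cq_entries)
{
    unsigned bits = toy_ilog2(cq_entries);  /* assumed starting point */

    if (bits > TOY_MAX_HASH_BITS)
        bits = TOY_MAX_HASH_BITS;
    if (bits < 1)
        bits = 1;
    return bits;
}

/* Memory used by a table with 2^bits buckets. */
static size_t toy_table_bytes(unsigned bits)
{
    return ((size_t)1 << bits) * TOY_BUCKET_BYTES;
}
```

At the cap, toy_table_bytes(8) is 256 * 64 = 16384 bytes, matching the 16KB figure.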
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/b9620c8072ba61a2d50eba894b89bd93a94a9abd.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Hao Xu
38513c464d
io_uring: switch cancel_hash to use per entry spinlock
...
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has a separate spinlock. Use a per-entry lock for cancel_hash; this removes
some completion lock invocations and removes contention between different
cancel_hash entries.
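A userspace sketch of a hash table with one lock per bucket, using a C11 atomic_flag as a stand-in for a kernel spinlock. The table size, the hash function and all toy_* names are assumptions; the kernel structure is only named in the message, not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NR_BUCKETS 4

struct toy_hreq {
    unsigned long long user_data;
    struct toy_hreq *next;
};

/* Each bucket carries its own lock, so buckets do not contend. */
struct toy_hash_bucket {
    atomic_flag lock;
    struct toy_hreq *head;
};

static struct toy_hash_bucket toy_table[TOY_NR_BUCKETS];

static void toy_table_init(void)
{
    for (size_t i = 0; i < TOY_NR_BUCKETS; i++) {
        atomic_flag_clear(&toy_table[i].lock);
        toy_table[i].head = NULL;
    }
}

/* Spin until this bucket's lock is taken; other buckets stay free. */
static struct toy_hash_bucket *toy_bucket_lock(unsigned long long user_data)
{
    struct toy_hash_bucket *hb = &toy_table[user_data % TOY_NR_BUCKETS];

    while (atomic_flag_test_and_set_explicit(&hb->lock, memory_order_acquire))
        ;
    return hb;
}

static void toy_bucket_unlock(struct toy_hash_bucket *hb)
{
    atomic_flag_clear_explicit(&hb->lock, memory_order_release);
}

static void toy_hash_add(struct toy_hreq *req)
{
    struct toy_hash_bucket *hb = toy_bucket_lock(req->user_data);

    req->next = hb->head;
    hb->head = req;
    toy_bucket_unlock(hb);
}

static struct toy_hreq *toy_hash_find(unsigned long long user_data)
{
    struct toy_hash_bucket *hb = toy_bucket_lock(user_data);
    struct toy_hreq *req = hb->head;

    while (req && req->user_data != user_data)
        req = req->next;
    toy_bucket_unlock(hb);
    return req;
}
```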
Signed-off-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
7012c81593
io_uring: refactor io_req_task_complete()
...
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
75d7b3aec1
io_uring: kill REQ_F_COMPLETE_INLINE
...
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com >
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Jens Axboe
bb8f870031
io_uring: remove unused IO_REQ_CACHE_SIZE defined
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
c65f5279ba
io_uring: don't set REQ_F_COMPLETE_INLINE in tw
...
io_req_task_complete() enqueues requests for state completion itself, no
need for REQ_F_COMPLETE_INLINE, which only serves the purpose of not
bloating the kernel.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
3a08576b96
io_uring: remove check_cq checking from hot paths
...
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.
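The "common guarding if" pattern can be sketched as below. The flag names, return values and the counter are hypothetical; the sketch only shows the hot path testing the whole word once and leaving the per-flag tests to a slow-path function.

```c
/* Hypothetical check_cq bits. */
enum {
    TOY_CHECK_CQ_OVERFLOW = 1 << 0,
    TOY_CHECK_CQ_DROPPED  = 1 << 1,
};

static int toy_slow_path_calls;  /* instrumentation for the example */

/* Slow path: only here are the individual flags examined. */
static int toy_handle_check_cq(unsigned check_cq)
{
    toy_slow_path_calls++;
    if (check_cq & TOY_CHECK_CQ_DROPPED)
        return -1;  /* illustrative error value */
    if (check_cq & TOY_CHECK_CQ_OVERFLOW)
        return 1;   /* illustrative "flushed overflow" value */
    return 0;
}

/* Hot path: one test of the whole word guards every slow event. */
static int toy_poll_once(unsigned check_cq)
{
    if (check_cq != 0)  /* the kernel would likely mark this unlikely() */
        return toy_handle_check_cq(check_cq);
    return 0;
}
```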
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aeaa72c694
io_uring: never defer-complete multi-apoll
...
Luckily, nobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Pavel Begunkov
aa1e90f64e
io_uring: move small helpers to headers
...
There is a bunch of inline helpers that will be useful not only to the
core of io_uring, move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com >
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:13 -06:00
Jens Axboe
d9b57aa3cf
io_uring: move opcode table to opdef.c
...
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
f3b44f92e5
io_uring: move read/write related opcodes to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
c98817e6cd
io_uring: move remaining file table manipulation to filetable.c
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
7357298448
io_uring: move rsrc related data, core, and commands
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
3b77495a97
io_uring: split provided buffers handling into its own file
...
Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
7aaff708a7
io_uring: move cancelation into its own file
...
This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
329061d3e2
io_uring: move poll handling into its own file
...
Add an io_poll_issue() rather than export the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
cfd22e6b33
io_uring: add opcode name to io_op_defs
...
This kills the last per-op switch.
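A sketch of a table-driven name lookup replacing a per-opcode switch. The opcode set, the table contents and the "INVALID" fallback are assumptions made for the example, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, shortened opcode list. */
enum toy_op {
    TOY_OP_NOP,
    TOY_OP_READV,
    TOY_OP_WRITEV,
    TOY_OP_LAST,
};

/* Per-opcode definition; the name lives next to the other properties. */
struct toy_op_def {
    const char *name;
};

static const struct toy_op_def toy_op_defs[TOY_OP_LAST] = {
    [TOY_OP_NOP]    = { .name = "NOP" },
    [TOY_OP_READV]  = { .name = "READV" },
    [TOY_OP_WRITEV] = { .name = "WRITEV" },
};

/* Bounds-checked table lookup instead of a switch over every opcode. */
static const char *toy_get_opcode(unsigned opcode)
{
    if (opcode < TOY_OP_LAST)
        return toy_op_defs[opcode].name;
    return "INVALID";
}
```

Adding an opcode then means adding one table entry, with no separate switch to keep in sync.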
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
92ac8beaea
io_uring: include and forward-declaration sanitation
...
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
c9f06aa7de
io_uring: move io_uring_task (tctx) helpers into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
a4ad4f748e
io_uring: move fdinfo helpers to its own file
...
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
e5550a1447
io_uring: use io_is_uring_fops() consistently
...
Convert the last spots that check for io_uring_fops to use the provided
helper instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
17437f3114
io_uring: move SQPOLL related handling into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
59915143e8
io_uring: move timeout opcodes and handling into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
e418bbc97b
io_uring: move our reference counting into a header
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
36404b09aa
io_uring: move msg_ring into its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:12 -06:00
Jens Axboe
f9ead18c10
io_uring: split network related opcodes into its own file
...
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00
Jens Axboe
e0da14def1
io_uring: move statx handling to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00
Jens Axboe
a9c210cebe
io_uring: move epoll handler to its own file
...
Would be nice to sort out Kconfig for this and not even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00
Jens Axboe
4cf9049528
io_uring: add a dummy -EOPNOTSUPP prep handler
...
Add it and use it for the epoll handling, if epoll isn't configured.
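A sketch of how a shared dummy prep handler can fill a table slot when a feature is compiled out. TOY_CONFIG_EPOLL, the table layout and the handler names are hypothetical; only the idea of a prep handler that returns -EOPNOTSUPP comes from the message.

```c
#include <errno.h>
#include <stddef.h>

struct toy_prep_req {
    int opcode;
};

typedef int (*toy_prep_fn)(struct toy_prep_req *req);

/* Dummy handler shared by every opcode that is not configured in. */
static int toy_eopnotsupp_prep(struct toy_prep_req *req)
{
    (void)req;
    return -EOPNOTSUPP;
}

static int toy_nop_prep(struct toy_prep_req *req)
{
    (void)req;
    return 0;
}

#ifdef TOY_CONFIG_EPOLL
/* Real handler, only built when the (hypothetical) option is set. */
static int toy_epoll_ctl_prep(struct toy_prep_req *req)
{
    (void)req;
    return 0;
}
#endif

enum { TOY_PREP_NOP, TOY_PREP_EPOLL_CTL, TOY_PREP_LAST };

static const toy_prep_fn toy_prep_table[TOY_PREP_LAST] = {
    [TOY_PREP_NOP] = toy_nop_prep,
#ifdef TOY_CONFIG_EPOLL
    [TOY_PREP_EPOLL_CTL] = toy_epoll_ctl_prep,
#else
    [TOY_PREP_EPOLL_CTL] = toy_eopnotsupp_prep,
#endif
};

/* Dispatch never needs a NULL check: every slot has a handler. */
static int toy_prep(struct toy_prep_req *req)
{
    return toy_prep_table[req->opcode](req);
}
```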
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00
Jens Axboe
99f15d8d61
io_uring: move uring_cmd handling to its own file
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00
Jens Axboe
cd40cae29e
io_uring: split out open/close operations
...
Signed-off-by: Jens Axboe <axboe@kernel.dk >
2022-07-24 18:39:11 -06:00