Avoid races and block request allocation until io-uring
queues are ready.
This is a especially important for background requests,
as bg request completion might cause lock order inversion
of the typical queue->lock and then fc->bg_lock
fuse_request_end
spin_lock(&fc->bg_lock);
flush_bg_queue
fuse_send_one
fuse_uring_queue_fuse_req
spin_lock(&queue->lock);
Signed-off-by: Bernd Schubert <bernd@bsbernd.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On teardown struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks giving warning messages in intervals and unreleased
struct file.
Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.
Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed
b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks if the connection is active and then complete the
io-uring itself with -ENOTCONN. I.e. special handling is not
needed.
This scheme is basically represented by the ring entry state
FRRS_WAIT and FRRS_USERSPACE.
Entries in state:
- FRRS_INIT: No action needed, do not contribute to
ring->queue_refs yet
- All other states: Are currently processed by other tasks,
async teardown is needed and it has to wait for the two
states above. It could be also solved without an async
teardown task, but would require additional if conditions
in hot code paths. Also in my personal opinion the code
looks cleaner with async teardown.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse-over-io-uring uses existing functions to find requests based
on their unique id - make these functions non-static.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make it available for fuse-io-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This change sets up FUSE operations to always have headers in
args.in_args[0], even for opcodes without an actual header.
This step prepares for a clean separation of payload from headers,
initially it is used by fuse-over-io-uring.
For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change
Op codes that already have an op code specific header do not
need modification.
Op codes that have neither payload nor op code headers
are not modified either (FUSE_READLINK and FUSE_DESTROY).
FUSE_BATCH_FORGET already has the header in the right place,
but is not using fuse_copy_args - as -over-uring is currently
not handling forgets it does not matter for now, but header
separation will later need special attention for that op code.
Correct the struct fuse_args->in_args array max size.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This function is needed by fuse_uring.c to clean ring queues,
so make it non static. Especially in non-static mode the function
name 'end_requests' should be prefixed with fuse_
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull fuse updates from Miklos Szeredi:
- Add page -> folio conversions (Joanne Koong, Josef Bacik)
- Allow max size of fuse requests to be configurable with a sysctl
(Joanne Koong)
- Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun)
- Fix large kernel reads (like a module load) in virtio_fs (Hou Tao)
- Fix attribute inconsistency in case readdirplus (and plain lookup in
corner cases) is racing with inode eviction (Zhang Tianci)
- Fix a WARN_ON triggered by virtio_fs (Asahi Lina)
* tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits)
virtiofs: dax: remove ->writepages() callback
fuse: check attributes staleness on fuse_iget()
fuse: remove pages for requests and exclusively use folios
fuse: convert direct io to use folios
mm/writeback: add folio_mark_dirty_lock()
fuse: convert writebacks to use folios
fuse: convert retrieves to use folios
fuse: convert ioctls to use folios
fuse: convert writes (non-writeback) to use folios
fuse: convert reads to use folios
fuse: convert readdir to use folios
fuse: convert readlink to use folios
fuse: convert cuse to use folios
fuse: add support in virtio for requests using folios
fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
fuse: convert fuse_notify_store to use folios
fuse: convert fuse_retrieve to use folios
fuse: use the folio based vmstat helpers
fuse: convert fuse_writepage_need_send to take a folio
fuse: convert fuse_do_readpage to use folios
...
All fuse requests use folios instead of pages for transferring data.
Remove pages from the requests and exclusively use folios.
No functional changes.
[SzM: rename back folio_descs -> descs, etc.]
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This adds support in struct fuse_args_pages and fuse_copy_pages() for
using folios instead of pages for transferring data. Both folios and
pages must be supported right now in struct fuse_args_pages and
fuse_copy_pages() until all request types have been converted to use
folios. Once all have been converted, then
struct fuse_args_pages and fuse_copy_pages() will only support folios.
Right now in fuse, all folios are one page (large folios are not yet
supported). As such, copying folio->page is sufficient for copying
the entire folio in fuse_copy_pages().
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.
[xfs_ioc_commit_range() chunk moved here as well]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This function creates pages in an inode and copies data into them,
update the function to use a folio instead of a page, and use the
appropriate folio helpers.
[SzM: use filemap_grab_folio()]
[Hau Tao: The third argument of folio_zero_range() should be the length to
be zeroed, not the total length. Fix it by using folio_zero_segment()
instead in fuse_notify_store()]
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We're just looking for pages in a mapping, use a folio and the folio
lookup function directly instead of using the page helper.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull fuse updates from Miklos Szeredi:
- Add support for idmapped fuse mounts (Alexander Mikhalitsyn)
- Add optimization when checking for writeback (yangyun)
- Add tracepoints (Josef Bacik)
- Clean up writeback code (Joanne Koong)
- Clean up request queuing (me)
- Misc fixes
* tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits)
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
fuse: clear FR_PENDING if abort is detected when sending request
fs/fuse: convert to use invalid_mnt_idmap
fs/mnt_idmapping: introduce an invalid_mnt_idmap
fs/fuse: introduce and use fuse_simple_idmap_request() helper
fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag
fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN
virtio_fs: allow idmapped mounts
fuse: allow idmapped mounts
fuse: warn if fuse_access is called when idmapped mounts are allowed
fuse: handle idmappings properly in ->write_iter()
fuse: support idmapped ->rename op
fuse: support idmapped ->set_acl
fuse: drop idmap argument from __fuse_get_acl
fuse: support idmapped ->setattr op
fuse: support idmapped ->permission inode op
fuse: support idmapped getattr inode op
fuse: support idmap for mkdir/mknod/symlink/create/tmpfile
fuse: support idmapped FUSE_EXT_GROUPS
fuse: add an idmap argument to fuse_simple_request
...
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
If idmap == NULL *and* filesystem daemon declared idmapped mounts
support, then uid/gid values in a fuse header will be -1.
No functional changes intended.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull fuse fixes from Miklos Szeredi:
- Fix EIO if splice and page stealing are enabled on the fuse device
- Disable problematic combination of passthrough and writeback-cache
- Other bug fixes found by code review
* tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable the combination of passthrough and writeback cache
fuse: update stats for pages in dropped aux writeback list
fuse: clear PG_uptodate when using a stolen page
fuse: fix memory leak in fuse_create_open
fuse: check aborted connection before adding requests to pending list for resending
fuse: use unsigned type for getxattr/listxattr size truncation
I've been timing various fuse operations and it's quite annoying to do
with kprobes. Add two tracepoints for sending and ending fuse requests
to make it easier to debug and time various operations.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Virtiofs has its own queuing mechanism, but still requests are first queued
on fiq->pending to be immediately dequeued and queued onto the virtio
queue.
The queuing on fiq->pending is unnecessary and might even have some
performance impact due to being a contention point.
Forget requests are handled similarly.
Move the queuing of requests and forgets into the fiq->ops->*.
fuse_iqueue_ops are renamed to reflect the new semantics.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Tested-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Reviewed-by: Peter-Jan Gootzen <pgootzen@nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Originally when a stolen page was inserted into fuse's page cache by
fuse_try_move_page(), it would be marked uptodate. Then
fuse_readpages_end() would call SetPageUptodate() again on the already
uptodate page.
Commit 413e8f014c ("fuse: Convert fuse_readpages_end() to use
folio_end_read()") changed that by replacing the SetPageUptodate() +
unlock_page() combination with folio_end_read(), which does mostly the
same, except it sets the uptodate flag with an xor operation, which in the
above scenario resulted in the uptodate flag being cleared, which in turn
resulted in EIO being returned on the read.
Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(),
conforming to the expectation of folio_end_read().
Reported-by: Jürg Billeter <j@bitron.ch>
Debugged-by: Matthew Wilcox <willy@infradead.org>
Fixes: 413e8f014c ("fuse: Convert fuse_readpages_end() to use folio_end_read()")
Cc: <stable@vger.kernel.org> # v6.10
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There is a race condition where inflight requests will not be aborted if
they are in the middle of being re-sent when the connection is aborted.
If fuse_resend has already moved all the requests in the fpq->processing
lists to its private queue ("to_queue") and then the connection starts
and finishes aborting, these requests will be added to the pending queue
and remain on it indefinitely.
Fixes: 760eac73f9 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v6.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page
zeroing (because it can be used to change partial page contents).
So fuse_notify_store() must be more careful to fully initialize page
contents (including parts of the page that are beyond end-of-file)
before marking the page uptodate.
The current code can leave beyond-EOF page contents uninitialized, which
makes these uninitialized page contents visible to userspace via mmap().
This is an information leak, but only affects systems which do not
enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the
corresponding kernel command line parameter).
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574
Cc: stable@kernel.org
Fixes: a1d75f2582 ("fuse: add store request")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following warning was reported by lee bruce:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 8264 at fs/fuse/dev.c:300
fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300
Modules linked in:
CPU: 0 PID: 8264 Comm: ab2 Not tainted 6.9.0-rc7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300
......
Call Trace:
<TASK>
fuse_dev_do_read.constprop.0+0xd36/0x1dd0 fs/fuse/dev.c:1334
fuse_dev_read+0x166/0x200 fs/fuse/dev.c:1367
call_read_iter include/linux/fs.h:2104 [inline]
new_sync_read fs/read_write.c:395 [inline]
vfs_read+0x85b/0xba0 fs/read_write.c:476
ksys_read+0x12f/0x260 fs/read_write.c:619
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xce/0x260 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
......
</TASK>
The warning is due to the FUSE_NOTIFY_RESEND notify sent by the write()
syscall in the reproducer program and it happens as follows:
(1) calls fuse_dev_read() to read the INIT request
The read succeeds. During the read, bit FR_SENT will be set on the
request.
(2) calls fuse_dev_write() to send an USE_NOTIFY_RESEND notify
The resend notify will resend all processing requests, so the INIT
request is moved from processing list to pending list again.
(3) calls fuse_dev_read() with an invalid output address
fuse_dev_read() will try to copy the same INIT request to the output
address, but it will fail due to the invalid address, so the INIT
request is ended and triggers the warning in fuse_request_end().
Fix it by clearing FR_SENT when re-adding requests into pending list.
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/linux-fsdevel/58f13e47-4765-fce4-daf4-dffcc5ae2330@huaweicloud.com/T/#m091614e5ea2af403b259e7cea6a49e51b9ee07a7
Fixes: 760eac73f9 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When fuse_resend() moves the requests from processing lists to pending
list, it uses __set_bit() to set FR_PENDING bit in req->flags.
Using __set_bit() is not safe, because other functions may update
req->flags concurrently (e.g., request_wait_answer() may call
set_bit(FR_INTERRUPTED, &flags)).
Fix it by using set_bit() instead.
Fixes: 760eac73f9 ("fuse: Introduce a new notification type for resend pending requests")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Some FUSE daemons want to know if the received request is a resend
request. The high bit of the fuse request ID is utilized for indicating
this, enabling the receiver to perform appropriate handling.
The init flag "FUSE_HAS_RESEND" is added to indicate this feature.
Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When a FUSE daemon panics and failover, we aim to minimize the impact on
applications by reusing the existing FUSE connection. During this process,
another daemon is employed to preserve the FUSE connection's file
descriptor. The new started FUSE Daemon will takeover the fd and continue
to provide service.
However, it is possible for some inflight requests to be lost and never
returned. As a result, applications awaiting replies would become stuck
forever. To address this, we can resend these pending requests to the
new started FUSE daemon.
This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which
can trigger resending of the pending requests, ensuring they are properly
processed again.
Signed-off-by: Zhao Chen <winters.zc@antgroup.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file
descriptor. If the call succeeds, a backing file identifier is returned.
A later change will be using this backing file id in a reply to OPEN
request with the flag FOPEN_PASSTHROUGH to setup passthrough of file
operations on the open FUSE file to the backing file.
The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the
backing file by its id.
This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH
flag is still in progress, the open may fail if the backing file is
closed before the fuse file was opened.
Setting up backing files requires a server with CAP_SYS_ADMIN privileges.
For the backing file to be successfully setup, the backing file must
implement both read_iter and write_iter file operations.
The limitation on the level of filesystem stacking allowed for the
backing file is enforced before setting up the backing file.
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Use correct function name in kernel-doc notation. (1)
Don't use "/**" to begin non-kernel-doc comments. (3)
Fixes these warnings:
fs/fuse/cuse.c:272: warning: expecting prototype for cuse_parse_dev_info(). Prototype was for cuse_parse_devinfo() instead
fs/fuse/dev.c:212: warning: expecting prototype for A new request is available, wake fiq(). Prototype was for fuse_dev_wake_and_unlock() instead
fs/fuse/dir.c:149: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Mark the attributes as stale due to an atime change. Avoid the invalidate if
fs/fuse/file.c:656: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* In case of short read, the caller sets 'pos' to the position of
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Will need to add supplementary groups to create messages, so add the
general concept of a request extension. A request extension is appended to
the end of the main request. It has a header indicating the size and type
of the extension.
The create security context (fuse_secctx_*) is similar to the generic
request extension, so include that as well in a backward compatible manner.
Add the total extension length to the request header. The offset of the
extension block within the request can be calculated by:
inh->len - inh->total_extlen * 8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull MM updates from Andrew Morton:
- More userfaultfs work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
Commit 8ed1f0e22f ("fs/fuse: fix ioctl type confusion") fixed a type
confusion bug by adding an ->f_op comparison.
Based on some off-list discussion back then, another check was added to
compare the f_cred->user_ns. This is not for security reasons, but was
based on the idea that a FUSE device FD should be using the UID/GID
mappings of its f_cred->user_ns, and those translations are done using
fc->user_ns, which matches the f_cred->user_ns of the initial FUSE device
FD thanks to the check in fuse_fill_super(). See also commit 8cb08329b0
("fuse: Support fuse filesystems outside of init_user_ns").
But FUSE_DEV_IOC_CLONE is, at a higher level, a *cloning* operation that
copies an existing context (with a weird API that involves first opening
/dev/fuse, then tying the resulting new FUSE device FD to an existing FUSE
instance). So if an application is already passing FUSE FDs across userns
boundaries and dealing with the resulting ID mapping complications somehow,
it doesn't make much sense to block this cloning operation.
I've heard that this check is an obstacle for some folks, and I don't see a
good reason to keep it, so remove it.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add a flag to entry expiration that lets the filesystem expire a dentry
without kicking it out from the cache immediately.
This makes a difference for overmounted dentries, where plain invalidation
would detach all submounts before dropping the dentry from the cache. If
only expiry is set on the dentry, then any overmounts are left alone and
until ->d_revalidate() is called.
Note: ->d_revalidate() is not called for the case of following a submount,
so invalidation will only be triggered for the non-overmounted case. The
dentry could also be mounted in a different mount instance, in which case
any submounts will still be detached.
Suggested-by: Jakob Blomer <jblomer@cern.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>