Commit 8d391972ae ("gfs2: Remove __gfs2_writepage()") changed the log
flush code in gfs2_ail1_start_one() to call aops->writepages() instead
of aops->writepage(). For jdata inodes, this means that we will now try
to reserve log space and start a transaction before we can determine
that the pages in question have already been journaled. When this
happens in the context of gfs2_logd(), it can now appear that not enough
log space is available for freeing up log space, and we will lock up.
Fix that by issuing journal writes directly instead of going through
aops->writepages() in the log flush code.
Fixes: 8d391972ae ("gfs2: Remove __gfs2_writepage()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
__gfs2_jdata_write_folio can't return AOP_WRITEPAGE_ACTIVATE, so don't
check for it in gfs2_write_jdata_batch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
There are no remaining callers of gfs2_jdata_writepage() other than
vmscan, which is known to do more harm than good.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
For journalled data, folio migration currently works by writing the folio
back, freeing the folio and faulting the new folio back in. We can
bypass that by telling the migration code to migrate the buffer_heads
attached to our folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull gfs2 updates from Andreas Gruenbacher:
- Add support for non-blocking lookup (MAY_NOT_BLOCK / LOOKUP_RCU)
- Various minor fixes and cleanups
* tag 'gfs2-v6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix freeze consistency check in log_write_header
gfs2: Refcounting fix in gfs2_thaw_super
gfs2: Minor gfs2_{freeze,thaw}_super cleanup
gfs2: Use wait_event_freezable_timeout() for freezable kthread
gfs2: Add missing set_freezable() for freezable kthread
gfs2: Remove use of error flag in journal reads
gfs2: Lift withdraw check out of gfs2_ail1_empty
gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn
gfs2: Mark withdraws as unlikely
gfs2: Minor gfs2_ail1_empty cleanup
gfs2: use is_subdir()
gfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing
gfs2: Use GL_NOBLOCK flag for non-blocking lookups
gfs2: Add GL_NOBLOCK flag
gfs2: rgrp: fix kernel-doc warnings
gfs2: fix kernel BUG in gfs2_quota_cleanup
gfs2: Fix inode_go_instantiate description
gfs2: Fix kernel NULL pointer dereference in gfs2_rgrp_dump
This function checks whether the filesystem has been been marked to be
withdrawn eventually or has been withdrawn already. Rename this
function to avoid confusing code like checking for gfs2_withdrawing()
when gfs2_withdrawn() has already returned true.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Mark the gfs2_withdrawn(), gfs2_withdrawing(), and
gfs2_withdraw_in_prog() inline functions as likely to return %false.
This allows to get rid of likely() and unlikely() annotations at the
call sites of those functions.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull gfs2 updates from Andreas Gruenbacher:
- Don't update inode timestamps for direct writes (performance
regression fix)
- Skip no-op quota records instead of panicing
- Fix a RCU race in gfs2_permission()
- Various other smaller fixes and cleanups all over the place
* tag 'gfs2-v6.6-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (24 commits)
gfs2: don't withdraw if init_threads() got interrupted
gfs2: remove dead code in add_to_queue
gfs2: Fix slab-use-after-free in gfs2_qd_dealloc
gfs2: Silence "suspicious RCU usage in gfs2_permission" warning
gfs2: fs: derive f_fsid from s_uuid
gfs2: No longer use 'extern' in function declarations
gfs2: Rename gfs2_lookup_{ simple => meta }
gfs2: Convert gfs2_internal_read to folios
gfs2: Convert stuffed_readpage to folios
gfs2: Minor gfs2_write_jdata_batch PAGE_SIZE cleanup
gfs2: Get rid of gfs2_alloc_blocks generation parameter
gfs2: Add metapath_dibh helper
gfs2: Clean up quota.c:print_message
gfs2: Clean up gfs2_alloc_parms initializers
gfs2: Two quota=account mode fixes
gfs2: Stop using GFS2_BASIC_BLOCK and GFS2_BASIC_BLOCK_SHIFT
gfs2: setattr_chown: Add missing initialization
gfs2: fix an oops in gfs2_permission
gfs2: ignore negated quota changes
gfs2: Don't update inode timestamps for direct writes
...
In gfs2_write_jdata_batch(), to compute the number of blocks, compute
the total size of the folio batch instead of the number of pages it
contains. Not a functional change.
Note that we don't currently allow mounting filesystems with a block
size bigger than the page size. We could change that after converting
the page cache to folios. The page cache would then only contain
block-size or bigger folios, so rounding wouldn't become an issue here.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Those helpers don't add any clarity and are easy to use wrong. Spell
them out to make more obvious what's happening.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull gfs2 updates from Andreas Gruenbacher:
- Fix a glock state (non-)transition bug when a dlm request times out
and is canceled, and we have locking requests that can now be granted
immediately
- Various fixes and cleanups in how the logd and quotad daemons are
woken up and terminated
- Fix several bugs in the quota data reference counting and shrinking.
Free quota data objects synchronously in put_super() instead of
letting call_rcu() run wild
- Make sure not to deallocate quota data during a withdraw; rather,
defer quota data deallocation to put_super(). Withdraws can happen in
contexts in which callers on the stack are holding quota data
references
- Many minor quota fixes and cleanups by Bob
- Update the the mailing list address for gfs2 and dlm. (It's the same
list for both and we are moving it to gfs2@lists.linux.dev)
- Various other minor cleanups
* tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (51 commits)
MAINTAINERS: Update dlm mailing list
MAINTAINERS: Update gfs2 mailing list
gfs2: change qd_slot_count to qd_slot_ref
gfs2: check for no eligible quota changes
gfs2: Remove useless assignment
gfs2: simplify slot_get
gfs2: Simplify qd2offset
gfs2: introduce qd_bh_get_or_undo
gfs2: Remove quota allocation info from quota file
gfs2: use constant for array size
gfs2: Set qd_sync_gen in do_sync
gfs2: Remove useless err set
gfs2: Small gfs2_quota_lock cleanup
gfs2: move qdsb_put and reduce redundancy
gfs2: improvements to sysfs status
gfs2: Don't try to sync non-changes
gfs2: Simplify function need_sync
gfs2: remove unneeded pg_oflow variable
gfs2: remove unneeded variable done
gfs2: pass sdp to gfs2_write_buf_to_page
...
First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag
to determine if an AIL flush should be forced in low-memory situations.
However, it also immediately clears the flag, and when called repeatedly
as in function gfs2_logd, the flag will be lost. Fix that by pulling
the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.
Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag
whether or not enough pages were written. If enough pages could be
written, flushing the AIL is unnecessary, though.
Third, gfs2_writepages doesn't wake up logd after setting the
SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react.
It would be preferable to wake up logd, but that hurts the performance
of some workloads and we don't quite understand why so far, so don't
wake up logd so far.
Fixes: b066a4eebd ("gfs2: forcibly flush ail to relieve memory pressure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When filesystem blocksize is less than folio size (either with
mapping_large_folio_support() or with blocksize < pagesize) and when the
folio is uptodate in pagecache, then even a byte write can cause
an entire folio to be written to disk during writeback. This happens
because we currently don't have a mechanism to track per-block dirty
state within struct iomap_folio_state. We currently only track uptodate
state.
This patch implements support for tracking per-block dirty state in
iomap_folio_state->state bitmap. This should help improve the filesystem
write performance and help reduce write amplification.
Performance testing of below fio workload reveals ~16x performance
improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
1. <test_randwrite.fio>
[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1
[fio-run]
2. Also our internal performance team reported that this patch improves
their database workload performance by around ~83% (with XFS on Power)
Reported-by: Aravinda Herle <araherle@in.ibm.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull gfs2 updates from Andreas Gruenbacher:
- Move the freeze/thaw logic from glock callback context to process /
worker thread context to prevent deadlocks
- Fix a quota reference couting bug in do_qc()
- Carry on deallocating inodes even when gfs2_rindex_update() fails
- Retry filesystem-internal reads when they are interruped by a signal
- Eliminate kmap_atomic() in favor of kmap_local_page() /
memcpy_{from,to}_page()
- Get rid of noop_direct_IO
- And a few more minor fixes and cleanups
* tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
gfs2: Add quota_change type
gfs2: Use memcpy_{from,to}_page where appropriate
gfs2: Convert remaining kmap_atomic calls to kmap_local_page
gfs2: Replace deprecated kmap_atomic with kmap_local_page
gfs: Get rid of unnucessary locking in inode_go_dump
gfs2: gfs2_freeze_lock_shared cleanup
gfs2: Replace sd_freeze_state with SDF_FROZEN flag
gfs2: Rework freeze / thaw logic
gfs2: Rename SDF_{FS_FROZEN => FREEZE_INITIATOR}
gfs2: Reconfiguring frozen filesystem already rejected
gfs2: Rename gfs2_freeze_lock{ => _shared }
gfs2: Rename the {freeze,thaw}_super callbacks
gfs2: Rename remaining "transaction" glock references
gfs2: retry interrupted internal reads
gfs2: Fix possible data races in gfs2_show_options()
gfs2: Fix duplicate should_fault_in_pages() call
gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
gfs2: Don't remember delete unless it's successful
gfs2: Update rl_unlinked before releasing rgrp lock
gfs2: Fix gfs2_qa_get imbalance in gfs2_quota_hold
...
Replace kmap_local_page() + memcpy() + kunmap_local() sequences with
memcpy_{from,to}_page() where we are not doing anything else with the
mapped page.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}().
Therefore, replace kmap_atomic() with kmap_local_page() in
gfs2_internal_read() and stuffed_readpage().
kmap_atomic() disables page-faults and preemption (the latter only for
!PREEMPT_RT kernels), However, the code within the mapping/un-mapping in
gfs2_internal_read() and stuffed_readpage() does not depend on the
above-mentioned side effects.
Therefore, a mere replacement of the old API with the new one is all that
is required (i.e., there is no need to explicitly add any calls to
pagefault_disable() and/or preempt_disable()).
Signed-off-by: Deepak R Varma <drv@mailo.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The iomap-based read operations done by gfs2 for its system files, such
as rindex, may sometimes be interrupted and return -EINTR. This
confuses some users of gfs2_internal_read(). Fix that by retrying
interrupted reads.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Since commit a2ad63daa8 ("VFS: add FMODE_CAN_ODIRECT file flag"), file
systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
wiring up a dummy direct_IO method to indicate support for direct I/O.
Remove .direct_IO from gfs2_aops and set FMODE_CAN_ODIRECT in
gfs2_open_common for regular files that do not use data journalling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
Convert gfs2_page_add_databufs() to folios and rename it to
gfs2_trans_add_databufs().
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The ->writepage() and ->writepages() operations are supposed to write
entire pages. However, on filesystems with a block size smaller than
PAGE_SIZE, __gfs2_jdata_writepage() only adds the first block to the
current transaction instead of adding the entire page. Fix that.
Fixes: 18ec7d5c3f ("[GFS2] Make journaled data files identical to normal files on disk")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Convert function to use folios throughout. This is in preparation for the
removal of find_get_pgaes_range_tag(). This change removes 8 calls to
compound_head().
Also had to modify and rename gfs2_write_jdata_pagevec() to take in and
utilize folio_batch rather than pagevec and use folios rather than pages.
gfs2_write_jdata_batch() now supports large folios.
Link: https://lkml.kernel.org/r/20230104211448.4804-18-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Check if the inode size of stuffed (inline) inodes is within the allowed
range when reading inodes from disk (gfs2_dinode_in()). This prevents
us from on-disk corruption.
The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just
truncate inline data to the maximum allowed size don't actually make
sense, and they can be removed now as well.
Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull more iomap updates from Darrick Wong:
"In the past 10 days or so I've not heard any ZOMG STOP style
complaints about removing ->writepage support from gfs2 or zonefs, so
here's the pull request removing them (and the underlying fs iomap
support) from the kernel:
- Remove iomap_writepage and all callers, since the mm apparently
never called the zonefs or gfs2 writepage functions"
* tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: remove iomap_writepage
zonefs: remove ->writepage
gfs2: remove ->writepage
gfs2: stop using generic_writepages in gfs2_ail1_start_one
There is nothing iomap-specific about iomap_migratepage(), and it fits
a pattern used by several other filesystems, so move it to mm/migrate.c,
convert it to be filemap_migrate_folio() and convert the iomap filesystems
to use it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
->writepage is only used for single page writeback from memory reclaim,
and not called at all for cgroup writeback. Follow the lead of XFS
and remove ->writepage and rely entirely on ->writepages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
All but two of the callers already have a folio; pass a folio into
try_to_free_buffers(). This removes the last user of cancel_dirty_page()
so remove that wrapper function too.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Change all the filesystems which used iomap_releasepage to use the
new function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
By making filler_t the same as read_folio, we can use the same function
for both in gfs2. We can push the use of folios down one more level
in jffs2 and nfs. We also increase type safety for future users of the
various read_cache_page() family of functions by forcing the parameter
to be a pointer to struct file (or NULL).
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
mpage_readpage still works in terms of pages, and has not been audited
for correctness with large folios, so include an assertion that the
filesystem is not passing it large folios. Convert all the filesystems
to call mpage_read_folio() instead of mpage_readpage().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>