Commit Graph

366 Commits

Author SHA1 Message Date
Anshuman Khandual
92868577d0 selftests/mm: fix compiler -Wmaybe-uninitialized warning
Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

  CC       cow
cow.c: In function `do_test_vmsplice_in_parent':
cow.c:365:61: warning: `transferred' may be used uninitialized [-Wmaybe-uninitialized]
  365 |                 cur = read(fds[0], new + total, transferred - total);
      |                                                 ~~~~~~~~~~~~^~~~~~~
cow.c:296:29: note: `transferred' was declared here
  296 |         ssize_t cur, total, transferred;
      |                             ^~~~~~~~~~~
  CC       compaction_test
  CC       gup_longterm

Link: https://lkml.kernel.org/r/20250409095006.1422620-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:41 -07:00
Mark Brown
9c02223e2d selftests/mm: generate a temporary mountpoint for cgroup filesystem
Currently if the filesystem for the cgroups version it wants to use is not
mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests
will attempt to mount it on the hard coded path /dev/cgroup/memory,
deleting that directory when the test finishes.  This will fail if there
is not a preexisting directory at that path, and since the directory is
deleted subsequent runs of the test will fail.  Instead of relying on this
hard coded directory name use mktemp to generate a temporary directory to
use as a mountpoint, fixing both the assumption and the disruption caused
by deleting a preexisting directory.

This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed.  I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not an
unreasonable requirement.

Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org
Fixes: 209376ed2a ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11 17:32:37 -07:00
Li Wang
983e760bcd selftest/mm: va_high_addr_switch: add ppc64 support check
Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing
check_supported_ppc64().  The function verifies:

  - 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config
  - Radix MMU (required for PPC64 5-level translation)
  - HugePages availability (needed for some tests)

If any check fails, the test is skipped (ksft_skip).  This ensures
compatibility with Power9/Power10 systems running in Radix MMU mode.

Avoid failures on 4-level paging system:

  # mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED
  # mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED

Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Linus Torvalds
eb0ece1602 Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Linus Torvalds
2d09a9449e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "Nothing major this time around.

  Apart from the usual perf/PMU updates, some page table cleanups, the
  notable features are average CPU frequency based on the AMUv1
  counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
  the uaccess routines.

  Perf and PMUs:

   - Support for the 'Rainier' CPU PMU from Arm

   - Preparatory driver changes and cleanups that pave the way for BRBE
     support

   - Support for partial virtualisation of the Apple-M1 PMU

   - Support for the second event filter in Arm CSPMU designs

   - Minor fixes and cleanups (CMN and DWC PMUs)

   - Enable EL2 requirements for FEAT_PMUv3p9

  Power, CPU topology:

   - Support for AMUv1-based average CPU frequency

   - Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
     adds a generic topology_is_primary_thread() function overridden by
     x86 and powerpc

  New(ish) features:

   - MOPS (memcpy/memset) support for the uaccess routines

  Security/confidential compute:

   - Fix the DMA address for devices used in Realms with Arm CCA. The
     CCA architecture uses the address bit to differentiate between
     shared and private addresses

   - Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
     default

  Memory management clean-ups:

   - Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs

   - Some minor page table accessor clean-ups

   - PIE/POE (permission indirection/overlay) helpers clean-up

  Kselftests:

   - MTE: skip hugetlb tests if MTE is not supported on such mappings
     and user correct naming for sync/async tag checking modes

  Miscellaneous:

   - Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
     request)

   - Sysreg updates for new register fields

   - CPU type info for some Qualcomm Kryo cores"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  arm64: mm: Don't use %pK through printk
  perf/arm_cspmu: Fix missing io.h include
  arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
  arm64: cputype: Add MIDR_CORTEX_A76AE
  arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
  arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
  arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
  arm64/sysreg: Enforce whole word match for open/close tokens
  arm64/sysreg: Fix unbalanced closing block
  arm64: Kconfig: Enable HOTPLUG_SMT
  arm64: topology: Support SMT control on ACPI based system
  arch_topology: Support SMT control for OF based system
  cpu/SMT: Provide a default topology_is_primary_thread()
  arm64/mm: Define PTDESC_ORDER
  perf/arm_cspmu: Add PMEVFILT2R support
  perf/arm_cspmu: Generalise event filtering
  perf/arm_cspmu: Move register definitons to header
  arm64/kernel: Always use level 2 or higher for early mappings
  arm64/mm: Drop PXD_TABLE_BIT
  arm64/mm: Check pmd_table() in pmd_trans_huge()
  ...
2025-03-25 13:16:16 -07:00
Linus Torvalds
df00ded23a Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:

 - Allow retrieving exit information after a process has been reaped
   through pidfds via the new PIDFD_INTO_EXIT extension for the
   PIDFD_GET_INFO ioctl. Various tools need access to information about
   a process/task even after it has already been reaped.

   Pidfd polling allows waiting on either task exit or for a task to
   have been reaped. The contract for PIDFD_INFO_EXIT is simply that
   EPOLLHUP must be observed before exit information can be retrieved,
   i.e., exit information is only provided once the task has been reaped
   and then can be retrieved as long as the pidfd is open.

 - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
   forgo allocating a file descriptor for their own process. This is
   useful in scenarios where users want to act on their own process
   through pidfds and is akin to AT_FDCWD.

 - Improve premature thread-group leader and subthread exec behavior
   when polling on pidfds:

   (1) During a multi-threaded exec by a subthread, i.e.,
       non-thread-group leader thread, all other threads in the
       thread-group including the thread-group leader are killed and the
       struct pid of the thread-group leader will be taken over by the
       subthread that called exec. IOW, two tasks change their TIDs.

   (2) A premature thread-group leader exit means that the thread-group
       leader exited before all of the other subthreads in the
       thread-group have exited.

   Both cases lead to inconsistencies for pidfd polling with
   PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
   current thread-group leader may or may not see an exit notification
   on the file descriptor depending on when poll is performed. If the
   poll is performed before the exec of the subthread has concluded an
   exit notification is generated for the old thread-group leader. If
   the poll is performed after the exec of the subthread has concluded
   no exit notification is generated for the old thread-group leader.

   The correct behavior is to simply not generate an exit notification
   on the struct pid of a subhthread exec because the struct pid is
   taken over by the subthread and thus remains alive.

   But this is difficult to handle because a thread-group may exit
   premature as mentioned in (2). In that case an exit notification is
   reliably generated but the subthreads may continue to run for an
   indeterminate amount of time and thus also may exec at some point.

   After this pull no exit notifications will be generated for a
   PIDFD_THREAD pidfd for a thread-group leader until all subthreads
   have been reaped. If a subthread should exec before no exit
   notification will be generated until that task exits or it creates
   subthreads and repeates the cycle.

   This means an exit notification indicates the ability for the father
   to reap the child.

* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  selftests/pidfd: third test for multi-threaded exec polling
  selftests/pidfd: second test for multi-threaded exec polling
  selftests/pidfd: first test for multi-threaded exec polling
  pidfs: improve multi-threaded exec and premature thread-group leader exit polling
  pidfs: ensure that PIDFS_INFO_EXIT is available
  selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
  selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
  selftests/pidfd: add third PIDFD_INFO_EXIT selftest
  selftests/pidfd: add second PIDFD_INFO_EXIT selftest
  selftests/pidfd: add first PIDFD_INFO_EXIT selftest
  selftests/pidfd: expand common pidfd header
  pidfs/selftests: ensure correct headers for ioctl handling
  selftests/pidfd: fix header inclusion
  pidfs: allow to retrieve exit information
  pidfs: record exit code and cgroupid at exit
  pidfs: use private inode slab cache
  pidfs: move setting flags into pidfs_alloc_file()
  pidfd: rely on automatic cleanup in __pidfd_prepare()
  ...
2025-03-24 10:16:37 -07:00
Ryan Roberts
a2c6f9c3ca selftests/mm: speed up split_huge_page_test
create_pagecache_thp_and_fd() was previously writing a file sized at twice
the PMD size by making a per-byte write syscall.  This was quite slow when
the PMD size is 4M, but completely intolerable for 32M (PMD size for
arm64's 16K page size), and 512M (PMD size for arm64's 64K page size).

The byte pattern has a 256 byte period, so let's create a 1K buffer and
fill it with exactly 4 periods.  Then we can write the buffer as many
times as is required to fill the file.  This makes things much more
tolerable.

The test now passes for 16K page size.  It still fails for 64K page size
because MAX_PAGECACHE_ORDER is too small for 512M folio size (I think).

Link: https://lkml.kernel.org/r/20250318174343.243631-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Ryan Roberts
735b3f7e77 selftests/mm: uffd-unit-tests support for hugepages > 2M
uffd-unit-tests uses a memory area with a fixed 32M size.  Then it
calculates the number of pages by dividing by page_size, which itself is
either the base page size or the PMD huge page size depending on the test
config.  For the latter, we end up with nr_pages=1 for arm64 16K base
pages, and nr_pages=0 for 64K base pages.  This doesn't end well.

So let's make the 32M size a floor and also ensure that we have at least 2
pages given the PMD size.  With this change, the tests pass on arm64 64K
base page size configuration.

Link: https://lkml.kernel.org/r/20250318174343.243631-2-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:15 -07:00
Brendan Jackman
d8a866c766 selftests/mm: add commentary about 9pfs bugs
As discussed here:

https://lore.kernel.org/lkml/Z9RRkL1hom48z3Tt@google.com/

This code could benefit from some more commentary.

To avoid needing to comment the same thing in multiple places (I guess
more of these SKIPs will need to be added over time, for now I am only
like 20% of the way through Project Run run_vmtests.sh Successfully), add
a dummy "skip tests for this specific reason" function that basically just
serves as a hook to hang comments on.

Link: https://lkml.kernel.org/r/20250317-9pfs-comments-v1-1-9ac96043e146@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:14 -07:00
Cyan Yang
f841ad9ca5 selftests/mm/cow: fix the incorrect error handling
Error handling doesn't check the correct return value.  This patch will
fix it.

Link: https://lkml.kernel.org/r/20250312043840.71799-1-cyan.yang@sifive.com
Fixes: f4b5fd6946 ("selftests/vm: anon_cow: THP tests")
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:07:06 -07:00
Zi Yan
80a5c494c8 selftests/mm: add tests for folio_split(), buddy allocator like split
It splits page cache folios to orders from 0 to 8 at different in-folio
offset.

Link: https://lkml.kernel.org/r/20250307174001.242794-9-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 22:07:00 -07:00
Lorenzo Stoakes
f3b92176f4 tools/selftests: add guard region test for /proc/$pid/pagemap
Add a test to the guard region self tests to assert that the
/proc/$pid/pagemap information now made availabile to the user correctly
identifies and reports guard regions.

As a part of this change, update vm_util.h to add the new bit (note there
is no header file in the kernel where this is exposed, the user is
expected to provide their own mask) and utilise the helper functions there
for pagemap functionality.

[lorenzo.stoakes@oracle.com: fixup define name]
  Link: https://lkml.kernel.org/r/32e83941-e6f5-42ee-9292-a44c16463cf1@lucifer.local
Link: https://lkml.kernel.org/r/164feb0a43ae72650e6b20c3910213f469566311.1740139449.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:41 -07:00
Brendan Jackman
1ddae9d67e selftests/mm/mlock: print error on failure
It's not really possible to start diagnosing this without knowing the
actual error.

Also update the mlock2 helper to behave like libc would by setting errno
and returning -1.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-12-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:40 -07:00
Brendan Jackman
5d2146a335 selftests/mm: skip mlock tests if nobody user can't read it
If running from a directory that can't be read by unprivileged users,
executing on-fault-test via the nobody user will fail.

The kselftest build does give the file the correct permissions, but after
being installed it might be in a directory without global execute
permissions.

Since the script can't safely fix that, just skip if it happens.  Note
that the stderr of the `ls` command is unfiltered meaning the user sees a
"permission denied" error that can help inform them why the test was
skipped.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-11-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:40 -07:00
Brendan Jackman
f896c6de83 selftests/mm: ensure uffd-wp-mremap gets pages of each size
This test allocates a page of every available size and doesn't have any
SKIP logic if the allocation fails.  So, ensure it's available and skip
the test if we can't do so.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-10-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman
e9269b2cc4 selftests/mm: drop unnecessary sudo usage
This script must be run as root anyway (see all the writing to privileged
files in /proc etc).

Remove the unnecessary use of sudo to avoid breaking on single-user
systems that don't have sudo.  This also avoids confusing readers.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-9-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman
32b42970e8 selftests/mm: skip gup_longterm tests on weird filesystems
Some filesystems don't support ftruncate()ing unlinked files.  They return
ENOENT.  In that case, skip the test.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-8-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman
571a4b62ed selftests/mm: skip map_populate on weird filesystems
It seems that 9pfs does not allow truncating unlinked files, Mark Brown
has noted that NFS may also behave this way.

It doesn't seem quite right to call this a "bug" but it's probably a
special enough case that it makes sense for the test to just SKIP if it
happens.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-7-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:39 -07:00
Brendan Jackman
bf6d575e24 selftests/mm: don't fail uffd-stress if too many CPUs
This calculation divides a fixed parameter by an environment-dependent
parameter i.e.  the number of CPUs.

The simple way to avoid machine-specific failures here is to just put a
cap on the max value of the latter.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-6-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman
db0f1c138f selftests/mm: print some details when uffd-stress gets bad params
So this can be debugged more easily.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-5-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman
f3b5535abc selftests/mm/uffd: rename nr_cpus -> nr_parallel
A later commit will bound this variable so it no longer necessarily
matches the number of CPUs.  Rename it appropriately.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-4-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman
f4b3e6c7f1 selftests/mm: skip uffd-wp-mremap if userfaultfd not available
It's obvious that this should fail in that case, but still, save the
reader the effort of figuring out that they've run into this by just
SKIPping

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-3-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:38 -07:00
Brendan Jackman
0046dbed80 selftests/mm: skip uffd-stress if userfaultfd not available
It's pretty obvious that the test wouldn't work if you don't have the
feature enabled.  But, it's still useful to SKIP instead of failing so the
reader can immediately tell that this is the reason why.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-2-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:37 -07:00
Brendan Jackman
800ddf3cd7 selftests/mm: report errno when things fail in gup_longterm
Patch series "selftests/mm: Some cleanups from trying to run them", v4.

I never had much luck running mm selftests so I spent a few hours digging
into why.

Looks like most of the reason is missing SKIP checks, so this series is
just adding a bunch of those that I found.  I did not do anything like all
of them, just the ones I spotted in gup_longterm, gup_test, mmap,
userfaultfd and memfd_secret.

It's a bit unfortunate to have to skip those tests when ftruncate() fails,
but I don't have time to dig deep enough into it to actually make them
pass.  I have observed the issue on 9pfs and heard rumours that NFS has a
similar problem.

I'm now able to run these test groups successfully:

- mmap
- gup_test
- compaction
- migration
- page_frag
- userfaultfd
- mlock

I've never gone past "Waiting for hugetlb memory to get depleted", in the
hugetlb tests.  I don't know if they are stuck or if they would eventually
work if I was patient enough (testing on a 1G machine).  I have not
investigated further.

I had some issues with mlock tests failing due to -ENOSRCH from mlock2(),
I can no longer reproduce that though, things work OK now.

Of the remaining tests there may be others that work fine, but there's no
convenient way to survey the whole output of run_vmtests.sh so I'm just
going test by test here.

In my spare moments I am slowly chipping away at a setup to run these
tests continuously in a reasonably hermetic QEMU environment via
virtme-ng:

5fad4b9c59/README.md

Hopefully that will eventually offer a way to provide a "canned"
environment where the tests are known to work, which can be fairly easily
reproduced by any developer.


This patch (of 12):

Just reporting failure doesn't tell you what went wrong.  This can fail in
different ways so report errno to help the reader get started debugging.

Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-0-dec210a658f5@google.com
Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-1-dec210a658f5@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:37 -07:00
Ujwal Kundur
af3b45aac5 selftests/mm: fix spelling
Fix misspelling flagged by codespell.

Link: https://lkml.kernel.org/r/20250215081803.1793-1-ujwal.kundur@gmail.com
Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:23 -07:00
Lorenzo Stoakes
0b6d4853d1 tools/selftests: add file/shmem-backed mapping guard region tests
Extend the guard region self tests to explicitly assert that guard regions
work correctly for functionality specific to file-backed and shmem
mappings.

In addition to testing all of the existing guard region functionality that
is currently tested against anonymous mappings against file-backed and
shmem mappings (except those which are exclusive to anonymous mapping), we
now also:

* Test that MADV_SEQUENTIAL does not cause unexpected readahead behaviour.
* Test that MAP_PRIVATE behaves as expected with guard regions installed in
  both a shared and private mapping of an fd.
* Test that a read-only file can correctly establish guard regions.
* Test a probable fault-around case does not interfere with guard regions
  (or vice-versa).
* Test that truncation does not eliminate guard regions.
* Test that hole punching functions as expected in the presence of guard
  regions.
* Test that a read-only mapping of a memfd write sealed mapping can have
  guard regions established within it and function correctly without
  violation of the seal.
* Test that guard regions installed into a mapping of the anonymous zero
  page function correctly.

Link: https://lkml.kernel.org/r/90c16bec5fcaafcd1700dfa3e9988c3e1aa9ac1d.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Lorenzo Stoakes
272f37d3e9 tools/selftests: expand all guard region tests to file-backed
Extend the guard region tests to allow for test fixture variants for anon,
shmem, and local file files.

This allows us to assert that each of the expected behaviours of anonymous
memory also applies correctly to file-backed (both shmem and an a file
created locally in the current working directory) and thus asserts the
same correctness guarantees as all the remaining tests do.

The fixture teardown is now performed in the parent process rather than
child forked ones, meaning cleanup is always performed, including
unlinking any generated temporary files.

Additionally the variant fixture data type now contains an enum value
indicating the type of backing store and the mmap() invocation is
abstracted to allow for the mapping of whichever backing store the variant
is testing.

We adjust tests as necessary to account for the fact they may now
reference files rather than anonymous memory.

Link: https://lkml.kernel.org/r/ab42228d2bd9b8aa18e9faebcd5c88732a7e5820.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Lorenzo Stoakes
ce1c0824fc selftests/mm: rename guard-pages to guard-regions
The feature formerly referred to as guard pages is more correctly referred
to as 'guard regions', as in fact no pages are ever allocated in the
process of installing the regions.

To avoid confusion, rename the tests accordingly.

[lorenzo.stoakes@oracle.com: fix guard regions invocation]
  Link: https://lkml.kernel.org/r/13426c71-d069-4407-9340-b227ff8b8736@lucifer.local
Link: https://lkml.kernel.org/r/1c3cd04a3f69b5756b94bda701ac88325a9be18b.1739469950.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:15 -07:00
Mark Brown
85968b6a20 selftests/mm: allow tests to run with no huge pages support
Currently the mm selftests refuse to run if huge pages are not available
in the current system but this is an optional feature and not all the
tests actually require them.  Change the test during startup to be
non-fatal and skip or omit tests which actually rely on having huge pages,
allowing the other tests to be run.

The gup_test does support using madvise() to configure huge pages but it
ignores the error code so we just let it run.

Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-2-44702f538522@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nico Pache <npache@redhat.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:14 -07:00
Eric Salem
3fae696393 selftests: mm: fix typo
Fix misspelling.

Link: https://lkml.kernel.org/r/77e0e915-36c3-4c95-84b8-0b73aaa17951@gmail.com
Signed-off-by: Eric Salem <ericsalem@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:11 -07:00
Mark Brown
023fff71d8 selftests/mm: fix thuge-gen test name uniqueness
The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a
variety of sizes but always report the result of their test with the same
name, meaning that automated sysetms running the tests are unable to
distinguish between the various tests.  Add the supplied sizes to the
logged test names to distinguish between runs.

My test automation was getting pretty confused about what was going on
- the test names are a pretty important external interface.

Link: https://lkml.kernel.org/r/20250204-kselftest-mm-fix-dups-v1-1-6afe417ef4bb@kernel.org
Fixes: b38bd9b2c4 ("selftests/mm: thuge-gen: conform to TAP format output")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:03 -07:00
Zi Yan
ad4c9bb541 selftests/mm: test splitting file-backed THP to any lower order
Now split_huge_page*() supports shmem THP split to any lower order.  Test
it.

The test now reads file content out after split to check if the split
corrupts the file data.

Link: https://lkml.kernel.org/r/20250122161928.1240637-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:05:56 -07:00
Zi Yan
035a112e5f selftests/mm: make file-backed THP split work by writing PMD size data
Commit acd7ccb284 ("mm: shmem: add large folio support for tmpfs")
changes huge=always to allocate THP/mTHP based on write size and
split_huge_page_test does not write PMD size data, so file-back THP is not
created during the test.  Fix it by writing PMD size data.

Link: https://lkml.kernel.org/r/20250122161928.1240637-1-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:05:56 -07:00
Rafael Aquini
67a2f86846 selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:

  ...
  # ------------------------------------
  # running ./uffd-stress hugetlb 128 32
  # ------------------------------------
  # ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
  ...
  # [FAIL]
  not ok 3 uffd-stress hugetlb 128 32 # exit=1
  ...

The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress.  The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down.  This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.

This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.

Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com
Fixes: 2e47a445d7 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
Signed-off-by: Rafael Aquini <raquini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 17:40:25 -07:00
John Hubbard
0a7565ee6e Revert "selftests/mm: remove local __NR_* definitions"
This reverts commit a5c6bc5900.

The general approach described in commit e076eaca59 ("selftests: break
the dependency upon local header files") was taken one step too far here:
it should not have been extended to include the syscall numbers.  This is
because doing so would require per-arch support in tools/include/uapi, and
no such support exists.

This revert fixes two separate reports of test failures, from Dave
Hansen[1], and Li Wang[2].  An excerpt of Dave's report:

Before this commit (a5c6bc5900) things are
fine.  But after, I get:

	running PKEY tests for unsupported CPU/OS

An excerpt of Li's report:

    I just found that mlock2_() return a wrong value in mlock2-test

[1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com
[2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com

Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com
Fixes: a5c6bc5900 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Li Wang <liwang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:12 -08:00
Mark Brown
5dcf52e2ce selftests/mm: fix check for running THP tests
When testing if we should try to compact memory or drop caches before we
run the THP or HugeTLB tests we use | as an or operator.  This doesn't
work since run_vmtests.sh is written in shell where this is used to pipe
the output of the first argument into the second.  Instead use the shell's
-o operator.

Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-1-44702f538522@kernel.org
Fixes: b433ffa8db ("selftests: mm: perform some system cleanup before using hugepages")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nico Pache <npache@redhat.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17 22:40:04 -08:00
Yury Khrustalev
3809cefe93 selftests/mm: Use PKEY_UNRESTRICTED macro
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions
are used in mm selftests for memory protection keys.

Signed-off-by: Yury Khrustalev <yury.khrustalev@arm.com>
Suggested-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250113170619.484698-3-yury.khrustalev@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17 18:16:36 +00:00
Lorenzo Stoakes
62d648c442 selftests/mm: use PIDFD_SELF in guard pages test
Now we have PIDFD_SELF available for process_madvise(), make use of it in
the guard pages test.

This is both more convenient and asserts that PIDFD_SELF works as expected.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/69fbbe088d3424de9983e145228459cb05a8f13d.1738268370.git.lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-05 15:14:55 +01:00
liuye
7882d8fc8f selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
Release memory before exception branch returns to prevent memory leaks

Checking tools/testing/selftests/mm/mkdirty.c ...
tools/testing/selftests/mm/mkdirty.c:283:3: error: Memory leak: src [memleak]
  return;
  ^

Link: https://lkml.kernel.org/r/20250114023838.48589-1-liuye@kylinos.cn
Signed-off-by: liuye <liuye@kylinos.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:45 -08:00
Thomas Weißschuh
3bd6137220 selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.  However not all mappings are valid to be
arbitrarily accessed.

For example the vvar data used for virtual clocks on x86 [vvar_vclock] can
only be accessed if 1) the kernel configuration enables virtual clocks and
2) the hypervisor provided the data for it.  Only the VDSO itself has the
necessary information to know this.  Since commit e93d2521b2 ("x86/vdso:
Split virtual clock pages into dedicated mapping") the virtual clock data
was split out into its own mapping, leading to EFAULT from read() during
the validation.

Check for the VM_IO flag as a proxy.  It is present for the VVAR mappings
and MMIO ranges can be dangerous to access arbitrarily.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-4-6fd7269934a5@linutronix.de
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Fixes: e93d2521b2 ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 0104096498 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/e97c2a5d-c815-4936-a767-ac42a3220a90@redhat.com/
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
3c479b5dc6 selftests/mm: vm_util: split up /proc/self/smaps parsing
Upcoming changes want to reuse the /proc/self/smaps parsing logic to parse
the VmFlags field.

As that works differently from the currently parsed HugePage counters,
split up the logic so common functionality can be shared.

While reworking this code, also use the correct sscanf placeholder for the
"uint64_t thp" variable.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-3-6fd7269934a5@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
b2a79f6213 selftests/mm: virtual_address_range: unmap chunks after validation
For each accessed chunk a PTE is created.  More than 1GiB of PTEs is used
in this way.  Remove each PTE after validating a chunk to reduce peak
memory usage.

It is important to only unmap memory that previously mmap()ed, as
unmapping other mappings like the stack, heap or executable mappings will
crash the process.

The mappings read from /proc/self/maps and the return values from mmap()
don't allow a simple correlation due to merging and no guaranteed order. 
To correlate the pointers and mappings use prctl(PR_SET_VMA_ANON_NAME). 
While it introduces a test dependency, other alternatives would introduce
runtime or development overhead.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-2-6fd7269934a5@linutronix.de
Fixes: 0104096498 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Thomas Weißschuh
a005145b9c selftests/mm: virtual_address_range: mmap() without PROT_WRITE
Patch series "selftests/mm: virtual_address_range: Reduce memory", v4.

The selftest started failing since commit e93d2521b2 ("x86/vdso: Split
virtual clock pages into dedicated mapping") was merged.  While debugging
I stumbled upon some memory usage optimizations.

With these test now runs on a VM with only 60MiB of memory.


This patch (of 4):

When mapping a larger chunk than physical memory is available with
PROT_WRITE and overcommit is disabled, the mapping will fail.  This will
prevent the test from running on systems with less then ~1GiB of memory
and triggering an inscrutinable test failure.  As the mappings are never
written to anyways, the flag can be removed.

Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-0-6fd7269934a5@linutronix.de
Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-1-6fd7269934a5@linutronix.de
Fixes: 4e5ce33ceb ("selftests/vm: add a test for virtual address range mapping")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dev Jain <dev.jain@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:44 -08:00
Hao Ge
bf069012df selftests/mm/cow: modify the incorrect checking parameters
In run_with_memfd_hugetlb(), some error handle have passed incorrect
parameters.  It should be "smem", but it was mistakenly written as "mem".

Let's fix it.

[gehao@kylinos.cn: fix other errant sites, per Anshuman]
  Link: https://lkml.kernel.org/r/20250113050908.93638-1-hao.ge@linux.dev
Link: https://lkml.kernel.org/r/20250113032858.63670-1-hao.ge@linux.dev
Fixes: f8664f3c4a ("selftests/vm: cow: basic COW tests for non-anonymous pages")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:41 -08:00
Zi Yan
dfe61db4a1 selftests/mm: add tests for splitting pmd THPs to all lower orders
Kernel already supports splitting a folio to any lower order. Test it.

[ziy@nvidia.com: no need to test splitting to order-1]
 Link: https://lkml.kernel.org/r/DDA202EA-4664-4F50-A7FD-B00CBB7A624B@nvidia.com
Link: https://lkml.kernel.org/r/20250110235028.96824-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
Zi Yan
136c5b40e0 selftests/mm: use selftests framework to print test result
Otherwise the number of tests does not match the reality.

Link: https://lkml.kernel.org/r/20250110235028.96824-1-ziy@nvidia.com
Fixes: 391e869711 ("mm: selftest to verify zero-filled pages are mapped to zeropage")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Alexander Zhu <alexlzhu@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:40 -08:00
Ryan Roberts
b2466bb3b4 selftests/mm: introduce uffd-wp-mremap regression test
Introduce a test that registers a range of memory for
UFFDIO_WRITEPROTECT_MODE_WP without UFFD_FEATURE_EVENT_REMAP.  First check
that the uffd-wp bit is set for every PTE in the range.  Then mremap() the
range to a new location and check that the uffd-wp bit is clear for every
PTE in the range.

Run the test for small folios, all supported THP sizes and all supported
hugetlb sizes, and for swapped out memory, shared and private.

There was previously a bug in the kernel where the uffd-wp bits remained
set in all PTEs for this case, after fixing the kernel, the tests all
pass.

Link: https://lkml.kernel.org/r/20250107144755.1871363-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:31 -08:00
Donet Tom
901083d8f5 selftests/mm: add new test cases to the migration test
Added three new test cases to the migration tests:

1. Shared anon THP migration test
This test will mmap shared anon memory, madvise it to
MADV_HUGEPAGE, then do migration entry testing. One thread
will move pages back and forth between nodes whilst other
threads try and access them.

2. Private anon hugetlb migration test
This test will mmap private anon hugetlb memory and then
do the migration entry testing.

3. Shared anon hugetlb migration test
This test will mmap shared anon hugetlb memory and then
do the migration entry testing.

Test results
============
 # ./tools/testing/selftests/mm/migration
 TAP version 13
 1..6
 # Starting 6 tests from 1 test cases.
 #  RUN           migration.private_anon ...
 #            OK  migration.private_anon
 ok 1 migration.private_anon
 #  RUN           migration.shared_anon ...
 #            OK  migration.shared_anon
 ok 2 migration.shared_anon
 #  RUN           migration.private_anon_thp ...
 #            OK  migration.private_anon_thp
 ok 3 migration.private_anon_thp
 #  RUN           migration.shared_anon_thp ...
 #            OK  migration.shared_anon_thp
 ok 4 migration.shared_anon_thp
 #  RUN           migration.private_anon_htlb ...
 #            OK  migration.private_anon_htlb
 ok 5 migration.private_anon_htlb
 #  RUN           migration.shared_anon_htlb ...
 #            OK  migration.shared_anon_htlb
 ok 6 migration.shared_anon_htlb
 # PASSED: 6 / 6 tests passed.
 # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
 #

Link: https://lkml.kernel.org/r/20241219102720.4487-1-donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:21 -08:00
Kevin Brodsky
08cc4c398e selftests/mm: remove X permission from sigaltstack mapping
There is no reason why the alternate signal stack should be mapped as RWX.
Map it as RW instead.

Link: https://lkml.kernel.org/r/20241209095019.1732120-15-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13 22:40:56 -08:00
Kevin Brodsky
1c6b1d4889 selftests/mm: skip pkey_sighandler_tests if support is missing
The pkey_sighandler_tests are bound to fail if either the kernel or CPU
doesn't support pkeys.  Skip the tests if pkeys support is missing.

Link: https://lkml.kernel.org/r/20241209095019.1732120-14-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13 22:40:56 -08:00