linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-06 01:49:46 +00:00

Author	SHA1	Message	Date
Chris Li	bba717bbc4	MAINTAINERS: add Chris and Kairui as the swap maintainer We have been collaborating on a systematic effort to clean up and improve the Linux swap system, and might as well take responsibility for it. Link: https://lkml.kernel.org/r/20251102-swap-m-v1-1-582f275d5bce@kernel.org Signed-off-by: Chris Li <chrisl@kernel.org> Acked-by: Kairui Song <kasong@tencent.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:46 -08:00
Lance Yang	6f86d0534f	mm/secretmem: fix use-after-free race in fault handler When a page fault occurs in a secret memory file created with `memfd_secret(2)`, the kernel will allocate a new folio for it, mark the underlying page as not-present in the direct map, and add it to the file mapping. If two tasks cause a fault in the same page concurrently, both could end up allocating a folio and removing the page from the direct map, but only one would succeed in adding the folio to the file mapping. The task that failed undoes the effects of its attempt by (a) freeing the folio again and (b) putting the page back into the direct map. However, by doing these two operations in this order, the page becomes available to the allocator again before it is placed back in the direct mapping. If another task attempts to allocate the page between (a) and (b), and the kernel tries to access it via the direct map, it would result in a supervisor not-present page fault. Fix the ordering to restore the direct map before the folio is freed. Link: https://lkml.kernel.org/r/20251031120955.92116-1-lance.yang@linux.dev Fixes: `1507f51255` ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Lance Yang <lance.yang@linux.dev> Reported-by: Google Big Sleep <big-sleep-vuln-reports@google.com> Closes: https://lore.kernel.org/linux-mm/CAEXGt5QeDpiHTu3K9tvjUTPqo+d-=wuCNYPa+6sWKrdQJ-ATdg@mail.gmail.com/ Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:46 -08:00
Catalin Marinas	adfb6609c6	mm/huge_memory: initialise the tags of the huge zero folio On arm64 with MTE enabled, a page mapped as Normal Tagged (PROT_MTE) in user space will need to have its allocation tags initialised. This is normally done in the arm64 set_pte_at() after checking the memory attributes. Such page is also marked with the PG_mte_tagged flag to avoid subsequent clearing. Since this relies on having a struct page, pte_special() mappings are ignored. Commit `d82d09e482` ("mm/huge_memory: mark PMD mappings of the huge zero folio special") maps the huge zero folio special and the arm64 set_pmd_at() will no longer zero the tags. There is no guarantee that the tags are zero, especially if parts of this huge page have been previously tagged. It's fairly easy to detect this by regularly dropping the caches to force the reallocation of the huge zero folio. Allocate the huge zero folio with the __GFP_ZEROTAGS flag. In addition, do not warn in the arm64 __access_remote_tags() when reading tags from the huge zero page. I bundled the arm64 change in here as well since they are both related to the commit mapping the huge zero folio as special. [catalin.marinas@arm.com: handle arch mte_zero_clear_page_tags() code issuing MTE instructions] Link: https://lkml.kernel.org/r/aQi8dA_QpXM8XqrE@arm.com Link: https://lkml.kernel.org/r/20251031170133.280742-1-catalin.marinas@arm.com Fixes: `d82d09e482` ("mm/huge_memory: mark PMD mappings of the huge zero folio special") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Beleswar Padhi <b-padhi@ti.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:46 -08:00
Edward Adam Davis	9a6b60cb14	nilfs2: avoid having an active sc_timer before freeing sci Because kthread_stop did not stop sc_task properly and returned -EINTR, the sc_timer was not properly closed, ultimately causing the problem [1] reported by syzbot when freeing sci due to the sc_timer not being closed. Because the thread sc_task main function nilfs_segctor_thread() returns 0 when it succeeds, when the return value of kthread_stop() is not 0 in nilfs_segctor_destroy(), we believe that it has not properly closed sc_timer. We use timer_shutdown_sync() to sync wait for sc_timer to shutdown, and set the value of sc_task to NULL under the protection of lock sc_state_lock, so as to avoid the issue caused by sc_timer not being properly shutdowned. [1] ODEBUG: free active (active state 0) object: 00000000dacb411a object type: timer_list hint: nilfs_construction_timeout Call trace: nilfs_segctor_destroy fs/nilfs2/segment.c:2811 [inline] nilfs_detach_log_writer+0x668/0x8cc fs/nilfs2/segment.c:2877 nilfs_put_super+0x4c/0x12c fs/nilfs2/super.c:509 Link: https://lkml.kernel.org/r/20251029225226.16044-1-konishi.ryusuke@gmail.com Fixes: `3f66cc261c` ("nilfs2: use kthread_create and kthread_stop for the log writer thread") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=24d8b70f039151f65590 Tested-by: syzbot+24d8b70f039151f65590@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Cc: <stable@vger.kernel.org> [6.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:46 -08:00
Carlos Llamas	7d9f7d390f	scripts/decode_stacktrace.sh: fix build ID and PC source parsing Support for parsing PC source info in stacktraces (e.g. '(P)') was added in commit `2bff77c665` ("scripts/decode_stacktrace.sh: fix decoding of lines with an additional info"). However, this logic was placed after the build ID processing. This incorrect order fails to parse lines containing both elements, e.g.: drm_gem_mmap_obj+0x114/0x200 [drm 03d0564e0529947d67bb2008c3548be77279fd27] (P) This patch fixes the problem by extracting the PC source info first and then processing the module build ID. With this change, the line above is now properly parsed as such: drm_gem_mmap_obj (./include/linux/mmap_lock.h:212 ./include/linux/mm.h:811 drivers/gpu/drm/drm_gem.c:1177) drm (P) While here, also add a brief explanation the build ID section. Link: https://lkml.kernel.org/r/20251030010347.2731925-1-cmllamas@google.com Fixes: `2bff77c665` ("scripts/decode_stacktrace.sh: fix decoding of lines with an additional info") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Breno Leitao <leitao@debian.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Puranjay Mohan <puranjay@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:45 -08:00
Quanmin Yan	9fd7bb5083	mm/damon/sysfs: change next_update_jiffies to a global variable In DAMON's damon_sysfs_repeat_call_fn(), time_before() is used to compare the current jiffies with next_update_jiffies to determine whether to update the sysfs files at this moment. On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make jiffies wrap bugs appear earlier. However, this causes time_before() in damon_sysfs_repeat_call_fn() to unexpectedly return true during the first 5 minutes after boot on 32-bit systems (see [1] for more explanation, which fixes another jiffies-related issue before). As a result, DAMON does not update sysfs files during that period. There is also an issue unrelated to the system's word size[2]: if the user stops DAMON just after next_update_jiffies is updated and restarts it after 'refresh_ms' or a longer delay, next_update_jiffies will retain an older value, causing time_before() to return false and the update to happen earlier than expected. Fix these issues by making next_update_jiffies a global variable and initializing it each time DAMON is started. Link: https://lkml.kernel.org/r/20251030020746.967174-3-yanquanmin1@huawei.com Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1] Link: https://lore.kernel.org/all/20251029013038.66625-1-sj@kernel.org/ [2] Fixes: `d809a7c64b` ("mm/damon/sysfs: implement refresh_ms file internal work") Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: ze zuo <zuoze1@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:45 -08:00
Quanmin Yan	2f6ce7e714	mm/damon/stat: change last_refresh_jiffies to a global variable Patch series "mm/damon: fixes for the jiffies-related issues", v2. On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make jiffies wrap bugs appear earlier. However, this may cause the time_before() series of functions to return unexpected values, resulting in DAMON not functioning as intended. Meanwhile, similar issues exist in some specific user operation scenarios. This patchset addresses these issues. The first patch is about the DAMON_STAT module, and the second patch is about the core layer's sysfs. This patch (of 2): In DAMON_STAT's damon_stat_damon_call_fn(), time_before_eq() is used to avoid unnecessarily frequent stat update. On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make jiffies wrap bugs appear earlier. However, this causes time_before_eq() in DAMON_STAT to unexpectedly return true during the first 5 minutes after boot on 32-bit systems (see [1] for more explanation, which fixes another jiffies-related issue before). As a result, DAMON_STAT does not update any monitoring results during that period, which becomes more confusing when DAMON_STAT_ENABLED_DEFAULT is enabled. There is also an issue unrelated to the system's word size[2]: if the user stops DAMON_STAT just after last_refresh_jiffies is updated and restarts it after 5 seconds or a longer delay, last_refresh_jiffies will retain an older value, causing time_before_eq() to return false and the update to happen earlier than expected. Fix these issues by making last_refresh_jiffies a global variable and initializing it each time DAMON_STAT is started. Link: https://lkml.kernel.org/r/20251030020746.967174-2-yanquanmin1@huawei.com Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1] Link: https://lore.kernel.org/all/20251028143250.50144-1-sj@kernel.org/ [2] Fixes: `fabdd1e911` ("mm/damon/stat: calculate and expose estimated memory bandwidth") Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: ze zuo <zuoze1@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:45 -08:00
Martin Kaiser	91a5409002	maple_tree: fix tracepoint string pointers maple_tree tracepoints contain pointers to function names. Such a pointer is saved when a tracepoint logs an event. There's no guarantee that it's still valid when the event is parsed later and the pointer is dereferenced. The kernel warns about these unsafe pointers. event 'ma_read' has unsafe pointer field 'fn' WARNING: kernel/trace/trace.c:3779 at ignore_event+0x1da/0x1e4 Mark the function names as tracepoint_string() to fix the events. One case that doesn't work without my patch would be trace-cmd record to save the binary ringbuffer and trace-cmd report to parse it in userspace. The address of __func__ can't be dereferenced from userspace but tracepoint_string will add an entry to /sys/kernel/tracing/printk_formats Link: https://lkml.kernel.org/r/20251030155537.87972-1-martin@kaiser.cx Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Martin Kaiser <martin@kaiser.cx> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:45 -08:00
Hao Ge	1abbdf3d57	codetag: debug: handle existing CODETAG_EMPTY in mark_objexts_empty for slabobj_ext When alloc_slab_obj_exts() fails and then later succeeds in allocating a slab extension vector, it calls handle_failed_objexts_alloc() to mark all objects in the vector as empty. As a result all objects in this slab (slabA) will have their extensions set to CODETAG_EMPTY. Later on if this slabA is used to allocate a slabobj_ext vector for another slab (slabB), we end up with the slabB->obj_exts pointing to a slabobj_ext vector that itself has a non-NULL slabobj_ext equal to CODETAG_EMPTY. When slabB gets freed, free_slab_obj_exts() is called to free slabB->obj_exts vector. free_slab_obj_exts() calls mark_objexts_empty(slabB->obj_exts) which will generate a warning because it expects slabobj_ext vectors to have a NULL obj_ext, not CODETAG_EMPTY. Modify mark_objexts_empty() to skip the warning and setting the obj_ext value if it's already set to CODETAG_EMPTY. To quickly detect this WARN, I modified the code from WARN_ON(slab_exts[offs].ref.ct) to BUG_ON(slab_exts[offs].ref.ct == 1); We then obtained this message: [21630.898561] ------------[ cut here ]------------ [21630.898596] kernel BUG at mm/slub.c:2050! [21630.898611] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [21630.900372] Modules linked in: squashfs isofs vfio_iommu_type1 vhost_vsock vfio vhost_net vmw_vsock_virtio_transport_common vhost tap vhost_iotlb iommufd vsock binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu sr_mod cdrom drm_client_lib virtio_dma_buf drm_shmem_helper drm_kms_helper drm ghash_ce backlight virtio_net virtio_blk virtio_scsi net_failover virtio_console failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject] [21630.909177] CPU: 3 UID: 0 PID: 3787 Comm: kylin-process-m Kdump: loaded Tainted: G W 6.18.0-rc1+ #74 PREEMPT(voluntary) [21630.910495] Tainted: [W]=WARN [21630.910867] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [21630.911625] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [21630.912392] pc : __free_slab+0x228/0x250 [21630.912868] lr : __free_slab+0x18c/0x250[21630.913334] sp : ffff8000a02f73e0 [21630.913830] x29: ffff8000a02f73e0 x28: fffffdffc43fc800 x27: ffff0000c0011c40 [21630.914677] x26: ffff0000c000cac0 x25: ffff00010fe5e5f0 x24: ffff000102199b40 [21630.915469] x23: 0000000000000003 x22: 0000000000000003 x21: ffff0000c0011c40 [21630.916259] x20: fffffdffc4086600 x19: fffffdffc43fc800 x18: 0000000000000000 [21630.917048] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [21630.917837] x14: 0000000000000000 x13: 0000000000000000 x12: ffff70001405ee66 [21630.918640] x11: 1ffff0001405ee65 x10: ffff70001405ee65 x9 : ffff800080a295dc [21630.919442] x8 : ffff8000a02f7330 x7 : 0000000000000000 x6 : 0000000000003000 [21630.920232] x5 : 0000000024924925 x4 : 0000000000000001 x3 : 0000000000000007 [21630.921021] x2 : 0000000000001b40 x1 : 000000000000001f x0 : 0000000000000001 [21630.921810] Call trace: [21630.922130] __free_slab+0x228/0x250 (P) [21630.922669] free_slab+0x38/0x118 [21630.923079] free_to_partial_list+0x1d4/0x340 [21630.923591] __slab_free+0x24c/0x348 [21630.924024] ___cache_free+0xf0/0x110 [21630.924468] qlist_free_all+0x78/0x130 [21630.924922] kasan_quarantine_reduce+0x114/0x148 [21630.925525] __kasan_slab_alloc+0x7c/0xb0 [21630.926006] kmem_cache_alloc_noprof+0x164/0x5c8 [21630.926699] __alloc_object+0x44/0x1f8 [21630.927153] __create_object+0x34/0xc8 [21630.927604] kmemleak_alloc+0xb8/0xd8 [21630.928052] kmem_cache_alloc_noprof+0x368/0x5c8 [21630.928606] getname_flags.part.0+0xa4/0x610 [21630.929112] getname_flags+0x80/0xd8 [21630.929557] vfs_fstatat+0xc8/0xe0 [21630.929975] __do_sys_newfstatat+0xa0/0x100 [21630.930469] __arm64_sys_newfstatat+0x90/0xd8 [21630.931046] invoke_syscall+0xd4/0x258 [21630.931685] el0_svc_common.constprop.0+0xb4/0x240 [21630.932467] do_el0_svc+0x48/0x68 [21630.932972] el0_svc+0x40/0xe0 [21630.933472] el0t_64_sync_handler+0xa0/0xe8 [21630.934151] el0t_64_sync+0x1ac/0x1b0 [21630.934923] Code: aa1803e0 97ffef2b a9446bf9 17ffff9c (d4210000) [21630.936461] SMP: stopping secondary CPUs [21630.939550] Starting crashdump kernel... [21630.940108] Bye! Link: https://lkml.kernel.org/r/20251029014317.1533488-1-hao.ge@linux.dev Fixes: `09c46563ff` ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations") Signed-off-by: Hao Ge <gehao@kylinos.cn> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: David Rientjes <rientjes@google.com> Cc: gehao <gehao@kylinos.cn> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:44 -08:00
Dev Jain	04d1c9d60c	mm/mremap: honour writable bit in mremap pte batching Currently mremap folio pte batch ignores the writable bit during figuring out a set of similar ptes mapping the same folio. Suppose that the first pte of the batch is writable while the others are not - set_ptes will end up setting the writable bit on the other ptes, which is a violation of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable bit while determining the pte batch. Link: https://lkml.kernel.org/r/20251028063952.90313-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Fixes: `f822a9a81a` ("mm: optimize mremap() by PTE batching") Reported-by: David Hildenbrand <david@redhat.com> Debugged-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Barry Song <baohua@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:44 -08:00
Peter Oberparleiter	ec4d11fc4b	gcov: add support for GCC 15 Using gcov on kernels compiled with GCC 15 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 15. Tested with GCC 14.3.0 and GCC 15.2.0. Link: https://lkml.kernel.org/r/20251028115125.1319410-1-oberpar@linux.ibm.com Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://github.com/linux-test-project/lcov/issues/445 Tested-by: Matthieu Baerts <matttbe@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:44 -08:00
Isaac J. Manjarres	0d6c356dd6	mm/mm_init: fix hash table order logging in alloc_large_system_hash() When emitting the order of the allocation for a hash table, alloc_large_system_hash() unconditionally subtracts PAGE_SHIFT from log base 2 of the allocation size. This is not correct if the allocation size is smaller than a page, and yields a negative value for the order as seen below: TCP established hash table entries: 32 (order: -4, 256 bytes, linear) TCP bind hash table entries: 32 (order: -2, 1024 bytes, linear) Use get_order() to compute the order when emitting the hash table information to correctly handle cases where the allocation size is smaller than a page: TCP established hash table entries: 32 (order: 0, 256 bytes, linear) TCP bind hash table entries: 32 (order: 0, 1024 bytes, linear) Link: https://lkml.kernel.org/r/20251028191020.413002-1-isaacmanjarres@google.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:44 -08:00
Kiryl Shutsemau	fa04f5b60f	mm/truncate: unmap large folio on split failure Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are supposed to generate SIGBUS. This behavior might not be respected on truncation. During truncation, the kernel splits a large folio in order to reclaim memory. As a side effect, it unmaps the folio and destroys PMD mappings of the folio. The folio will be refaulted as PTEs and SIGBUS semantics are preserved. However, if the split fails, PMD mappings are preserved and the user will not receive SIGBUS on any accesses within the PMD. Unmap the folio on split failure. It will lead to refault as PTEs and preserve SIGBUS semantics. Make an exception for shmem/tmpfs that for long time intentionally mapped with PMDs across i_size. Link: https://lkml.kernel.org/r/20251027115636.82382-3-kirill@shutemov.name Fixes: `b9a8a4195c` ("truncate,shmem: Handle truncates that split large folios") Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Christian Brauner <brauner@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:43 -08:00
Kiryl Shutsemau	74207de2ba	mm/memory: do not populate page table entries beyond i_size Patch series "Fix SIGBUS semantics with large folios", v3. Accessing memory within a VMA, but beyond i_size rounded up to the next page size, is supposed to generate SIGBUS. Darrick reported[1] an xfstests regression in v6.18-rc1. generic/749 failed due to missing SIGBUS. This was caused by my recent changes that try to fault in the whole folio where possible: `19773df031` ("mm/fault: try to map the entire file folio in finish_fault()") `357b92761d` ("mm/filemap: map entire large folio faultaround") These changes did not consider i_size when setting up PTEs, leading to xfstest breakage. However, the problem has been present in the kernel for a long time - since huge tmpfs was introduced in 2016. The kernel happily maps PMD-sized folios as PMD without checking i_size. And huge=always tmpfs allocates PMD-size folios on any writes. I considered this corner case when I implemented a large tmpfs, and my conclusion was that no one in their right mind should rely on receiving a SIGBUS signal when accessing beyond i_size. I cannot imagine how it could be useful for the workload. But apparently filesystem folks care a lot about preserving strict SIGBUS semantics. Generic/749 was introduced last year with reference to POSIX, but no real workloads were mentioned. It also acknowledged the tmpfs deviation from the test case. POSIX indeed says[3]: References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal. The patchset fixes the regression introduced by recent changes as well as more subtle SIGBUS breakage due to split failure on truncation. This patch (of 2): Accesses within VMA, but beyond i_size rounded up to PAGE_SIZE are supposed to generate SIGBUS. Recent changes attempted to fault in full folio where possible. They did not respect i_size, which led to populating PTEs beyond i_size and breaking SIGBUS semantics. Darrick reported generic/749 breakage because of this. However, the problem existed before the recent changes. With huge=always tmpfs, any write to a file leads to PMD-size allocation. Following the fault-in of the folio will install PMD mapping regardless of i_size. Fix filemap_map_pages() and finish_fault() to not install: - PTEs beyond i_size; - PMD mappings across i_size; Make an exception for shmem/tmpfs that for long time intentionally mapped with PMDs across i_size. Link: https://lkml.kernel.org/r/20251027115636.82382-1-kirill@shutemov.name Link: https://lkml.kernel.org/r/20251027115636.82382-2-kirill@shutemov.name Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Fixes: `6795801366` ("xfs: Support large folios") Reported-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:43 -08:00
Wei Yang	895b4c0c79	fs/proc: fix uaf in proc_readdir_de() Pde is erased from subdir rbtree through rb_erase(), but not set the node to EMPTY, which may result in uaf access. We should use RB_CLEAR_NODE() set the erased node to EMPTY, then pde_subdir_next() will return NULL to avoid uaf access. We found an uaf issue while using stress-ng testing, need to run testcase getdent and tun in the same time. The steps of the issue is as follows: 1) use getdent to traverse dir /proc/pid/net/dev_snmp6/, and current pde is tun3; 2) in the [time windows] unregister netdevice tun3 and tun2, and erase them from rbtree. erase tun3 first, and then erase tun2. the pde(tun2) will be released to slab; 3) continue to getdent process, then pde_subdir_next() will return pde(tun2) which is released, it will case uaf access. CPU 0 \| CPU 1 ------------------------------------------------------------------------- traverse dir /proc/pid/net/dev_snmp6/ \| unregister_netdevice(tun->dev) //tun3 tun2 sys_getdents64() \| iterate_dir() \| proc_readdir() \| proc_readdir_de() \| snmp6_unregister_dev() pde_get(de); \| proc_remove() read_unlock(&proc_subdir_lock); \| remove_proc_subtree() \| write_lock(&proc_subdir_lock); [time window] \| rb_erase(&root->subdir_node, &parent->subdir); \| write_unlock(&proc_subdir_lock); read_lock(&proc_subdir_lock); \| next = pde_subdir_next(de); \| pde_put(de); \| de = next; //UAF \| rbtree of dev_snmp6 \| pde(tun3) / \ NULL pde(tun2) Link: https://lkml.kernel.org/r/20251025024233.158363-1-albin_yang@163.com Signed-off-by: Wei Yang <albinwyang@tencent.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: wangzijie <wangzijie1@honor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:43 -08:00
Zi Yan	fa5a061700	mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order folio split clears PG_has_hwpoisoned, but the flag should be preserved in after-split folios containing pages with PG_hwpoisoned flag if the folio is split to >0 order folios. Scan all pages in a to-be-split folio to determine which after-split folios need the flag. An alternatives is to change PG_has_hwpoisoned to PG_maybe_hwpoisoned to avoid the scan and set it on all after-split folios, but resulting false positive has undesirable negative impact. To remove false positive, caller of folio_test_has_hwpoisoned() and folio_contain_hwpoisoned_page() needs to do the scan. That might be causing a hassle for current and future callers and more costly than doing the scan in the split code. More details are discussed in [1]. This issue can be exposed via: 1. splitting a has_hwpoisoned folio to >0 order from debugfs interface; 2. truncating part of a has_hwpoisoned folio in truncate_inode_partial_folio(). And later accesses to a hwpoisoned page could be possible due to the missing has_hwpoisoned folio flag. This will lead to MCE errors. Link: https://lore.kernel.org/all/CAHbLzkoOZm0PXxE9qwtF4gKR=cpRXrSrJ9V9Pm2DJexs985q4g@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20251023030521.473097-1-ziy@nvidia.com Fixes: `c010d47f10` ("mm: thp: split huge page to any lower order pages") Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Pankaj Raghav <kernel@pankajraghav.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:43 -08:00
Pedro Demarchi Gomes	f5548c318d	ksm: use range-walk function to jump over holes in scan_get_next_rmap_item Currently, scan_get_next_rmap_item() walks every page address in a VMA to locate mergeable pages. This becomes highly inefficient when scanning large virtual memory areas that contain mostly unmapped regions, causing ksmd to use large amount of cpu without deduplicating much pages. This patch replaces the per-address lookup with a range walk using walk_page_range(). The range walker allows KSM to skip over entire unmapped holes in a VMA, avoiding unnecessary lookups. This problem was previously discussed in [1]. Consider the following test program which creates a 32 TiB mapping in the virtual address space but only populates a single page: #include <unistd.h> #include <stdio.h> #include <sys/mman.h> /* 32 TiB / const size_t size = 32ul 1024 * 1024 * 1024 * 1024; int main() { char area = mmap(NULL, size, PROT_READ \| PROT_WRITE, MAP_NORESERVE \| MAP_PRIVATE \| MAP_ANON, -1, 0); if (area == MAP_FAILED) { perror("mmap() failed\n"); return -1; } / Populate a single page such that we get an anon_vma. / area = 0; /* Enable KSM. */ madvise(area, size, MADV_MERGEABLE); pause(); return 0; } $ ./ksm-sparse & $ echo 1 > /sys/kernel/mm/ksm/run Without this patch ksmd uses 100% of the cpu for a long time (more then 1 hour in my test machine) scanning all the 32 TiB virtual address space that contain only one mapped page. This makes ksmd essentially deadlocked not able to deduplicate anything of value. With this patch ksmd walks only the one mapped page and skips the rest of the 32 TiB virtual address space, making the scan fast using little cpu. Link: https://lkml.kernel.org/r/20251023035841.41406-1-pedrodemargomes@gmail.com Link: https://lkml.kernel.org/r/20251022153059.22763-1-pedrodemargomes@gmail.com Link: https://lore.kernel.org/linux-mm/423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com/ [1] Fixes: `31dbd01f31` ("ksm: Kernel SamePage Merging") Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: craftfever <craftfever@airmail.cc> Closes: https://lkml.kernel.org/r/020cf8de6e773bb78ba7614ef250129f11a63781@murena.io Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:42 -08:00
Aleksei Nikiforov	7e76b75e5a	mm/kmsan: fix kmsan kmalloc hook when no stack depots are allocated yet If no stack depot is allocated yet, due to masking out __GFP_RECLAIM flags kmsan called from kmalloc cannot allocate stack depot. kmsan fails to record origin and report issues. This may result in KMSAN failing to report issues. Reusing flags from kmalloc without modifying them should be safe for kmsan. For example, such chain of calls is possible: test_uninit_kmalloc -> kmalloc -> __kmalloc_cache_noprof -> slab_alloc_node -> slab_post_alloc_hook -> kmsan_slab_alloc -> kmsan_internal_poison_memory. Only when it is called in a context without flags present should __GFP_RECLAIM flags be masked. With this change all kmsan tests start working reliably. Eric reported: : Yes, KMSAN seems to be at least partially broken currently. Besides the : fact that the kmsan KUnit test is currently failing (which I reported at : https://lore.kernel.org/r/20250911175145.GA1376@sol), I've confirmed that : the poly1305 KUnit test causes a KMSAN warning with Aleksei's patch : applied but does not cause a warning without it. The warning did get : reached via syzbot somehow : (https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f623c7da0.1761026343.git.xiaopei01@kylinos.cn/), : so KMSAN must still work in some cases. But it didn't work for me. Link: https://lkml.kernel.org/r/20250930115600.709776-2-aleksei.nikiforov@linux.ibm.com Link: https://lkml.kernel.org/r/20251022030213.GA35717@sol Fixes: `97769a53f1` ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation") Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Eric Biggers <ebiggers@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:42 -08:00
Kairui Song	fc745ff317	mm/shmem: fix THP allocation and fallback loop The order check and fallback loop is updating the index value on every loop. This will cause the index to be wrongly aligned by a larger value while the loop shrinks the order. This may result in inserting and returning a folio of the wrong index and cause data corruption with some userspace workloads [1]. [kasong@tencent.com: introduce a temporary variable to improve code] Link: https://lkml.kernel.org/r/20251023065913.36925-1-ryncsn@gmail.com Link: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20251022105719.18321-1-ryncsn@gmail.com Link: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ [1] Fixes: `e7a2ab7b3b` ("mm: shmem: add mTHP support for anonymous shmem") Closes: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:42 -08:00
Pasha Tatashin	fa759cd75b	kho: allocate metadata directly from the buddy allocator KHO allocates metadata for its preserved memory map using the slab allocator via kzalloc(). This metadata is temporary and is used by the next kernel during early boot to find preserved memory. A problem arises when KFENCE is enabled. kzalloc() calls can be randomly intercepted by kfence_alloc(), which services the allocation from a dedicated KFENCE memory pool. This pool is allocated early in boot via memblock. When booting via KHO, the memblock allocator is restricted to a "scratch area", forcing the KFENCE pool to be allocated within it. This creates a conflict, as the scratch area is expected to be ephemeral and overwriteable by a subsequent kexec. If KHO metadata is placed in this KFENCE pool, it leads to memory corruption when the next kernel is loaded. To fix this, modify KHO to allocate its metadata directly from the buddy allocator instead of slab. Link: https://lkml.kernel.org/r/20251021000852.2924827-4-pasha.tatashin@soleen.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: David Matlack <dmatlack@google.com> Cc: Alexander Graf <graf@amazon.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:42 -08:00
Pasha Tatashin	a2fff99f92	kho: increase metadata bitmap size to PAGE_SIZE KHO memory preservation metadata is preserved in 512 byte chunks which requires their allocation from slab allocator. Slabs are not safe to be used with KHO because of kfence, and because partial slabs may lead leaks to the next kernel. Change the size to be PAGE_SIZE. The kfence specifically may cause memory corruption, where it randomly provides slab objects that can be within the scratch area. The reason for that is that kfence allocates its objects prior to KHO scratch is marked as CMA region. While this change could potentially increase metadata overhead on systems with sparsely preserved memory, this is being mitigated by ongoing work to reduce sparseness during preservation via 1G guest pages. Furthermore, this change aligns with future work on a stateless KHO, which will also use page-sized bitmaps for its radix tree metadata. Link: https://lkml.kernel.org/r/20251021000852.2924827-3-pasha.tatashin@soleen.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:41 -08:00
Pasha Tatashin	e38f65d317	kho: warn and fail on metadata or preserved memory in scratch area Patch series "KHO: kfence + KHO memory corruption fix", v3. This series fixes a memory corruption bug in KHO that occurs when KFENCE is enabled. The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation. Google is using KHO and have had obscure crashes due to this memory corruption, with stacks all over the place. I would prefer this fix to be properly backported to stable so we can also automatically consume it once we switch to the upstream KHO. Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG) that adds checks to detect and fail any operation that attempts to place KHO metadata or preserved memory within the scratch area. This serves as a validation and diagnostic tool to confirm the problem without affecting production builds. Patch 2/3 Increases bitmap to PAGE_SIZE, so buddy allocator can be used. Patch 3/3 Provides the fix by modifying KHO to allocate its metadata directly from the buddy allocator instead of slab. This bypasses the KFENCE interception entirely. This patch (of 3): It is invalid for KHO metadata or preserved memory regions to be located within the KHO scratch area, as this area is overwritten when the next kernel is loaded, and used early in boot by the next kernel. This can lead to memory corruption. Add checks to kho_preserve_* and KHO's internal metadata allocators (xa_load_or_alloc, new_chunk) to verify that the physical address of the memory does not overlap with any defined scratch region. If an overlap is detected, the operation will fail and a WARN_ON is triggered. To avoid performance overhead in production kernels, these checks are enabled only when CONFIG_KEXEC_HANDOVER_DEBUG is selected. [rppt@kernel.org: fix KEXEC_HANDOVER_DEBUG Kconfig dependency] Link: https://lkml.kernel.org/r/aQHUyyFtiNZhx8jo@kernel.org [pasha.tatashin@soleen.com: build fix] Link: https://lkml.kernel.org/r/CA+CK2bBnorfsTymKtv4rKvqGBHs=y=MjEMMRg_tE-RME6n-zUw@mail.gmail.com Link: https://lkml.kernel.org/r/20251021000852.2924827-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20251021000852.2924827-2-pasha.tatashin@soleen.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:41 -08:00
Zi Yan	77008e1b2e	mm/huge_memory: do not change split_huge_page() target order silently Page cache folios from a file system that support large block size (LBS) can have minimal folio order greater than 0, thus a high order folio might not be able to be split down to order-0. Commit `e220917fa5` ("mm: split a folio in minimum folio order chunks") bumps the target order of split_huge_page() to the minimum allowed order when splitting a LBS folio. This causes confusion for some split_huge_page*() callers like memory failure handling code, since they expect after-split folios all have order-0 when split succeeds but in reality get min_order_for_split() order folios and give warnings. Fix it by failing a split if the folio cannot be split to the target order. Rename try_folio_split() to try_folio_split_to_order() to reflect the added new_order parameter. Remove its unused list parameter. [The test poisons LBS folios, which cannot be split to order-0 folios, and also tries to poison all memory. The non split LBS folios take more memory than the test anticipated, leading to OOM. The patch fixed the kernel warning and the test needs some change to avoid OOM.] Link: https://lkml.kernel.org/r/20251017013630.139907-1-ziy@nvidia.com Fixes: `e220917fa5` ("mm: split a folio in minimum folio order chunks") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: syzbot+e6367ea2fdab6ed46056@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68d2c943.a70a0220.1b52b.02b3.GAE@google.com/ Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-11-09 21:19:41 -08:00
Linus Torvalds	e9a6fb0bcd	Linux 6.18-rc5	2025-11-09 15:10:19 -08:00
Linus Torvalds	f850568efe	Merge tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Two reverts merged into one commit to handle a regression caused by a wrong cleanup because the underlying implications were unclear" * tag 'i2c-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: muxes: pca954x: Fix broken reset-gpio usage	2025-11-09 09:29:44 -08:00
Linus Torvalds	3461e958c1	Merge tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Strip trailing padding bytes from modules.builtin.modinfo to fix error during modules_install with certain versions of kmod - Drop unused static inline function warning in .c files with clang from W=1 to W=2 - Ensure kernel-doc.py invocations use the PYTHON3 make variable to ensure user's choice of Python interpreter is always respected * tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Let kernel-doc.py use PYTHON3 override compiler_types: Move unused static inline functions warning to W=2 kbuild: Strip trailing padding bytes from modules.builtin.modinfo	2025-11-09 09:22:08 -08:00
Jean Delvare	002621a4df	kbuild: Let kernel-doc.py use PYTHON3 override It is possible to force a specific version of python to be used when building the kernel by passing PYTHON3= on the make command line. However kernel-doc.py is currently called with python3 hard-coded and thus ignores this setting. Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of python is used. Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2025-11-08 19:42:22 -07:00
Linus Torvalds	439fc29dfd	Merge tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel Pull drm fix from Dave Airlie: "Brown paper bag, the dma mask fix which I applied and actually looked through for bad things, actually broke newer GPUs, there might be some latent part in the boot path that is assuming 32-bit still, but we will figure that out elsewhere. nouveau: - revert DMA mask change" * tag 'drm-fixes-2025-11-09' of https://gitlab.freedesktop.org/drm/kernel: Revert "drm/nouveau: set DMA mask before creating the flush page"	2025-11-08 15:37:03 -08:00
Linus Torvalds	41d318c47f	Merge tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fixes from Alexandre Belloni: "The two reverts are for patches that I shouldn't have applied. The rx8025 patch fixes an issue present since 2022: - cpcap, tps6586x: revert incorrect irq enable/disable balance fix - rx8025: fix incorrect register reference" * tag 'rtc-6.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: rx8025: fix incorrect register reference Revert "rtc: cpcap: Fix initial enable_irq/disable_irq balance" Revert "rtc: tps6586x: Fix initial enable_irq/disable_irq balance"	2025-11-08 15:34:23 -08:00
Yuta Hayama	162f24cbb0	rtc: rx8025: fix incorrect register reference This code is intended to operate on the CTRL1 register, but ctrl[1] is actually CTRL2. Correctly, ctrl[0] is CTRL1. Signed-off-by: Yuta Hayama <hayama@lineo.co.jp> Fixes: `71af915650` ("rtc: rx8025: fix 12/24 hour mode detection on RX-8035") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/eae5f479-5d28-4a37-859d-d54794e7628c@lineo.co.jp Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>	2025-11-08 20:56:12 +01:00
Linus Torvalds	7bb4d65125	Merge tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix change notify packet validation check - Refcount fix (e.g. rename error paths) - Fix potential UAF due to missing locks on directory lease refcount * tag 'v6.18rc4-SMB-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: validate change notify buffer before copy smb: client: fix refcount leak in smb2_set_path_attr smb: client: fix potential UAF in smb2_close_cached_fid()	2025-11-08 10:17:30 -08:00
Linus Torvalds	0d7bee10be	Merge tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix AMD PCI root device caching regression that triggers on certain firmware variants - Fix the zen5_rdseed_microcode[] array to be NULL-terminated - Add more AMD models to microcode signature checking * tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add more known models to entry sign checking x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode x86/amd_node: Fix AMD root device caching	2025-11-08 09:01:11 -08:00
Linus Torvalds	b5c0946029	Merge tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a group-throttling bug in the fair scheduler" * tag 'sched-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining	2025-11-08 08:59:05 -08:00
Linus Torvalds	133262cae9	Merge tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a system hang caused by cpu-clock events deadlock" * tag 'perf-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix system hang caused by cpu-clock usage	2025-11-08 08:54:13 -08:00
Linus Torvalds	e6f55fe790	Merge tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix (well, cut in half) a futex performance regression on PowerPC" * tag 'locking-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Optimize per-cpu reference counting	2025-11-08 08:51:22 -08:00
Linus Torvalds	3636cfa745	Merge tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fix from Jens Axboe: "Single fix in there, fixing an overflow in calculating the needed segments for converting into a bvec array" * tag 'io_uring-6.18-20251107' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: fix regbuf vector size truncation	2025-11-08 08:47:31 -08:00
Linus Torvalds	e284d5118a	Merge tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Carlos Maiolino: "This contain fixes for the RT and zoned allocator, and a few fixes for atomic writes" * tag 'xfs-fixes-6.18-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: free xfs_busy_extents structure when no RT extents are queued xfs: fix zone selection in xfs_select_open_zone_mru xfs: fix a rtgroup leak when xfs_init_zone fails xfs: fix various problems in xfs_atomic_write_cow_iomap_begin xfs: fix delalloc write failures in software-provided atomic writes	2025-11-08 08:43:01 -08:00
Dave Airlie	4113361590	Revert "drm/nouveau: set DMA mask before creating the flush page" This reverts commit `ebe7556050`. Tested the latest kernel on my GB203 and this seems to break it somehow. Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: GSP-FMC boot failed (mbox: 0x0000000b) Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: init failed, -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: init failed with -5 Nov 09 04:16:14 bighp kernel: nouveau: drm:00000000:00000080: init failed with -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: drm: Device allocation failed: -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: probe with driver nouveau failed with error -5 Not sure why, I went over the patch and thought it should have worked, but there must be some 32-bit problem maybe in the FMC boot path. Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-11-08 19:41:09 +10:00
Pavel Begunkov	146eb58629	io_uring: fix regbuf vector size truncation There is a report of io_estimate_bvec_size() truncating the calculated number of segments that leads to corruption issues. Check it doesn't overflow "int"s used later. Rough but simple, can be improved on top. Cc: stable@vger.kernel.org Fixes: `9ef4cbbcb4` ("io_uring: add infra for importing vectored reg buffers") Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-458654612@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Günther Noack <gnoack@google.com> Tested-by: Günther Noack <gnoack@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-07 17:17:13 -07:00
Linus Torvalds	e811c33b1f	Merge tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Back from travel, thanks to Simona for handling things. regular fixes, seems about the right size, but spread out a bit. amdgpu has the usual range of fixes, xe has a few fixes, and nouveau has a couple of fixes, one for blackwell modifiers on 8/16 bit surfaces. Otherwise a few small fixes for mediatek, sched, imagination and pixpaper. sched: - Fix deadlock amdgpu: - Reset fixes - Misc fixes - Panel scaling fixes - HDMI fix - S0ix fixes - Hibernation fix - Secure display fix - Suspend fix - MST fix amdkfd: - Process cleanup fix xe: - Fix missing synchronization on unbind - Fix device shutdown when doing FLR - Fix user fence signaling order i915: - Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD - Fix conversion between clock ticks and nanoseconds mediatek: - Disable AFBC support on Mediatek DRM driver - Add pm_runtime support for GCE power control imagination: - kconfig: Fix dependencies nouveau: - Set DMA mask earlier - Advertize correct modifiers for GB20x pixpaper: - kconfig: Fix dependencies" * tag 'drm-fixes-2025-11-08' of https://gitlab.freedesktop.org/drm/kernel: (26 commits) drm/xe: Enforce correct user fence signaling order using drm/xe: Do clean shutdown also when using flr drm/xe: Move declarations under conditional branch drm/xe/guc: Synchronize Dead CT worker with unbind drm/amd/display: Enable mst when it's detected but yet to be initialized drm/amdgpu: Fix wait after reset sequence in S3 drm/amd: Fix suspend failure with secure display TA drm/amdgpu: fix gpu page fault after hibernation on PF passthrough drm/tiny: pixpaper: add explicit dependency on MMU drm/nouveau: Advertise correct modifiers on GB20x drm: define NVIDIA DRM format modifiers for GB20x drm/nouveau: set DMA mask before creating the flush page drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb drm/amd/display: Fix NULL deref in debugfs odm_combine_segments drm/amdkfd: Don't clear PT after process killed drm/amdgpu/smu: Handle S0ix for vangogh drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() drm/amd/display: Fix black screen with HDMI outputs drm/amd/display: Don't stretch non-native images by default in eDP drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init() ...	2025-11-07 14:51:11 -08:00
Dave Airlie	d439acbbfb	Merge tag 'drm-xe-fixes-2025-11-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix missing synchronization on unbind (Balasubramani Vivekanandan) - Fix device shutdown when doing FLR (Jouni Högander) - Fix user fence signaling order (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/mvfyflloncy76a7nmkatpj6f2afddavwsibz3y4u4wo6gznro5@rdulkuh5wvje	2025-11-08 07:39:54 +10:00
Linus Torvalds	38a2c275c3	Merge tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: - fix crash triggered by unaligned access in parisc unwinder * tag 'parisc-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Avoid crash due to unaligned access in unwinder	2025-11-07 13:19:18 -08:00
Linus Torvalds	a2e33fb926	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: - Syzkaller found a case where maths overflows can cause divide by 0 - Typo in a compiler bug warning fix in the selftests broke the selftests - type1 compatability had a mismatch when unmapping an already unmapped range, it should succeed * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Make vfio_compat's unmap succeed if the range is already empty iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents() iommufd: Don't overflow during division for dirty tracking	2025-11-07 13:13:09 -08:00
Peter Zijlstra	9818af18db	compiler_types: Move unused static inline functions warning to W=2 Per Nathan, clang catches unused "static inline" functions in C files since commit `6863f5643d` ("kbuild: allow Clang to find unused static inline functions for W=1 build"). Linus said: > So I entirely ignore W=1 issues, because I think so many of the extra > warnings are bogus. > > But if this one in particular is causing more problems than most - > some teams do seem to use W=1 as part of their test builds - it's fine > to send me a patch that just moves bad warnings to W=2. > > And if anybody uses W=2 for their test builds, that's THEIR problem.. Here is the change to bump the warning from W=1 to W=2. Fixes: `6863f5643d` ("kbuild: allow Clang to find unused static inline functions for W=1 build") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251106105000.2103276-1-andriy.shevchenko@linux.intel.com [nathan: Adjust comment as well] Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2025-11-07 11:19:53 -07:00
Joshua Rogers	4012abe8a7	smb: client: validate change notify buffer before copy SMB2_change_notify called smb2_validate_iov() but ignored the return code, then kmemdup()ed using server provided OutputBufferOffset/Length. Check the return of smb2_validate_iov() and bail out on error. Discovered with help from the ZeroPath security tooling. Signed-off-by: Joshua Rogers <linux@joshua.hu> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: stable@vger.kernel.org Fixes: `e3e9463414` ("smb3: improve SMB3 change notification support") Signed-off-by: Steve French <stfrench@microsoft.com>	2025-11-07 10:15:43 -06:00
Linus Torvalds	da32d155f4	Merge tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - use the firmware node of the GPIO chip, not its label for software node lookup - fix invalid pointer access in GPIO debugfs - drop unused functions from gpio-tb10x - fix a regression in gpio-aggregator: restore the set_config() callback in the driver - correct schema $id path in ti,twl4030 DT bindings * tag 'gpio-fixes-for-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: tb10x: Drop unused tb10x_set_bits() function gpio: aggregator: restore the set_config operation gpiolib: fix invalid pointer access in debugfs gpio: swnode: don't use the swnode's name as the key for GPIO lookup dt-bindings: gpio: ti,twl4030: Correct the schema $id path	2025-11-07 08:10:55 -08:00
Linus Torvalds	5b95a50001	Merge tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Check for reader catching up in ring_buffer_map_get_reader() If the reader catches up to the writer in the memory mapped ring buffer then calling rb_get_reader_page() will return NULL as there's no pages left. But this isn't checked for before calling rb_get_reader_page() and the return of NULL causes a warning. If it is detected that the reader caught up to the writer, then simply exit the routine - Fix memory leak in histogram create_field_var() The couple of the error paths in create_field_var() did not properly clean up what was allocated. Make sure everything is freed properly on error - Fix help message of tools latency_collector The help message incorrectly stated that "-t" was the same as "--threads" whereas "--threads" is actually represented by "-e" * tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/tools: Fix incorrcet short option in usage text for --threads tracing: Fix memory leaks in create_field_var() ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up	2025-11-07 08:07:11 -08:00
Linus Torvalds	a80abfbb10	Merge tag 'slab-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Fix for potential infinite loop in kmalloc_nolock() when debugging is enabled for the cache (Vlastimil Babka) * tag 'slab-for-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: prevent infinite loop in kmalloc_nolock() with debugging	2025-11-07 08:01:58 -08:00
Linus Torvalds	9dc520632a	Merge tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Remove the sync refill API that was added in this release, in anticipation of doing it in a better way for the next release - Fix type extension for calculating size off nr_pages, like we do in other spots * tag 'io_uring-6.18-20251106' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: fix types for region size calulation io_uring/zcrx: remove sync refill uapi	2025-11-07 07:52:45 -08:00
Linus Torvalds	11a6afabb4	Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "All fixes in the UFS driver. The big contributor to the diffstats is the Intel controller S0ix/S3 fix which has to special case the suspend/resume patch for intel controllers in ufshcd-pci.c" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix invalid probe error return value scsi: ufs: ufs-pci: Set UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE for Intel ADL scsi: ufs: core: Add a quirk to suppress link_startup_again scsi: ufs: ufs-pci: Fix S0ix/S3 for Intel controllers scsi: ufs: core: Revert "Make HID attributes visible" scsi: ufs: core: Reduce link startup failure logging scsi: ufs: core: Fix a race condition related to the "hid" attribute group scsi: ufs: ufs-qcom: Fix UFS OCP issue during UFS power down (PC=3)	2025-11-07 07:47:08 -08:00

1 2 3 4 5 ...

1397906 Commits