Go to file
Vlastimil Babka fd6db58867 slab: fix barn NULL pointer dereference on memoryless nodes
Phil reported a boot failure once sheaves become used in commits
59faa4da7c ("maple_tree: use percpu sheaves for maple_node_cache") and
3accabda4d ("mm, vma: use percpu sheaves for vm_area_struct cache"):

 BUG: kernel NULL pointer dereference, address: 0000000000000040
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 21 UID: 0 PID: 818 Comm: kworker/u398:0 Not tainted 6.17.0-rc3.slab+ #5 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.26.0 07/30/2025
 RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89
 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff
 RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300
 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388
 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0
 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0
 FS:  0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0
 Call Trace:
  <TASK>
  ? srso_return_thunk+0x5/0x5f
  ? vm_area_alloc+0x1e/0x60
  kmem_cache_alloc_noprof+0x4ec/0x5b0
  vm_area_alloc+0x1e/0x60
  create_init_stack_vma+0x26/0x210
  alloc_bprm+0x139/0x200
  kernel_execve+0x4a/0x140
  call_usermodehelper_exec_async+0xd0/0x190
  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
  ret_from_fork+0xf0/0x110
  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 Modules linked in:
 CR2: 0000000000000040
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89
 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff
 RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300
 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388
 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0
 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0
 FS:  0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: 0x36a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception ]---

And noted "this is an AMD EPYC 7401 with 8 NUMA nodes configured such
that memory is only on 2 of them."

 # numactl --hardware
 available: 8 nodes (0-7)
 node 0 cpus: 0 8 16 24 32 40 48 56 64 72 80 88
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 1 cpus: 2 10 18 26 34 42 50 58 66 74 82 90
 node 1 size: 31584 MB
 node 1 free: 30397 MB
 node 2 cpus: 4 12 20 28 36 44 52 60 68 76 84 92
 node 2 size: 0 MB
 node 2 free: 0 MB
 node 3 cpus: 6 14 22 30 38 46 54 62 70 78 86 94
 node 3 size: 0 MB
 node 3 free: 0 MB
 node 4 cpus: 1 9 17 25 33 41 49 57 65 73 81 89
 node 4 size: 0 MB
 node 4 free: 0 MB
 node 5 cpus: 3 11 19 27 35 43 51 59 67 75 83 91
 node 5 size: 32214 MB
 node 5 free: 31625 MB
 node 6 cpus: 5 13 21 29 37 45 53 61 69 77 85 93
 node 6 size: 0 MB
 node 6 free: 0 MB
 node 7 cpus: 7 15 23 31 39 47 55 63 71 79 87 95
 node 7 size: 0 MB
 node 7 free: 0 MB

Linus decoded the stacktrace to get_barn() and get_node() and determined
that kmem_cache->node[numa_mem_id()] is NULL.

The problem is due to a wrong assumption that memoryless nodes only
exist on systems with CONFIG_HAVE_MEMORYLESS_NODES, where numa_mem_id()
points to the nearest node that has memory. SLUB has been allocating its
kmem_cache_node structures only on nodes with memory and so it does with
struct node_barn.

For kmem_cache_node, get_partial_node() checks if get_node() result is
not NULL, which I assumed was for protection from a bogus node id passed
to kmalloc_node() but apparently it's also for systems where
numa_mem_id() (used when no specific node is given) might return a
memoryless node.

Fix the sheaves code the same way by checking the result of get_node()
and bailing out if it's NULL. Note that cpus on such memoryless nodes
will have degraded sheaves performance, which can be improved later,
preferably by making numa_mem_id() work properly on such systems.

Fixes: 2d517aa09b ("slab: add opt-in caching layer of percpu sheaves")
Reported-and-tested-by: Phil Auld <pauld@redhat.com>
Closes: https://lore.kernel.org/all/20251010151116.GA436967@pauld.westford.csb/
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-%3Dwg1xK%2BBr%3DFJ5QipVhzCvq7uQVPt5Prze6HDhQQ%3DQD_BcQ@mail.gmail.com/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-11 10:47:57 +02:00
2022-09-28 09:02:20 +02:00
2025-02-19 14:53:27 -07:00
2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Description
No description provided
Readme 4.7 GiB
Languages
C 97.7%
Assembly 1.3%
Shell 0.3%
Makefile 0.3%
Python 0.2%
Other 0.1%