Patch series "Maple tree mas_{next,prev}_range() and cleanup", v4.
This patchset contains a number of clean ups to the code to make it more
usable (next/prev range), the addition of debug output formatting, the
addition of printing the maple state information in the WARN_ON/BUG_ON
code.
There is also work done here to keep nodes active during iterations to
reduce the necessity of re-walking the tree.
Finally, there is a new interface added to move to the next or previous
range in the tree, even if it is empty.
The organisation of the patches is as follows:
0001-0004 - Small clean ups
0005-0018 - Additional debug options and WARN_ON/BUG_ON changes
0019 - Test module __init and __exit addition
0020-0021 - More functional clean ups
0022-0026 - Changes to keep nodes active
0027-0034 - Add new mas_{prev,next}_range()
0035 - Use new mas_{prev,next}_range() in mmap_region()
This patch (of 35):
Static analyser of the maple tree code noticed that the split variable is
being used to dereference into an array prior to checking the variable
itself. Fix this issue by changing the order of the statement to check
the variable first.
Link: https://lkml.kernel.org/r/20230518145544.1722059-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230518145544.1722059-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Peng Zhang<zhangpeng.00@bytedance.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vernon Yang <vernon2gm@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Check the write offset end bounds before using it as the offset into the
pivot array. This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.
akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels in case of this.
Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the case of reverse allocation, mas->index and mas->last do not point
to the correct allocation range, which will cause users to get incorrect
allocation results, so fix it. If the user does not use it in a specific
way, this bug will not be triggered.
This is a bug, but only VMA uses it now, the way VMA is used now will
not trigger it. There is a possibility that a user will trigger it in
the future.
Also re-check whether the size is still satisfied after the lower bound
was increased, which is a corner case and is incorrect in previous
versions.
Link: https://lkml.kernel.org/r/20230419093625.99201-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node. It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak. At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used. Fix it by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix mas_skip_node() for mas_empty_area()", v2.
mas_empty_area() was incorrectly returning an error when there was room.
The issue was tracked down to mas_skip_node() using the incorrect
end-of-slot count. Instead of using the nodes hard limit, the limit of
data should be used.
mas_skip_node() was also setting the min and max to that of the child
node, which was unnecessary. Within these limits being set, there was
also a bug that corrupted the maple state's max if the offset was set to
the maximum node pivot. The bug was without consequence unless there was
a sufficient gap in the next child node which would cause an error to be
returned.
This patch set fixes these errors by removing the limit setting from
mas_skip_node() and uses the mas_data_end() for slot limits, and adds
tests for all failures discovered.
This patch (of 2):
mas_skip_node() is used to move the maple state to the node with a higher
limit. It does this by walking up the tree and increasing the slot count.
Since slot count may not be able to be increased, it may need to walk up
multiple times to find room to walk right to a higher limit node. The
limit of slots that was being used was the node limit and not the last
location of data in the node. This would cause the maple state to be
shifted outside actual data and enter an error state, thus returning
-EBUSY.
The result of the incorrect error state means that mas_awalk() would
return an error instead of finding the allocation space.
The fix is to use mas_data_end() in mas_skip_node() to detect the nodes
data end point and continue walking the tree up until it is safe to move
to a node with a higher limit.
The walk up the tree also sets the maple state limits so remove the buggy
code from mas_skip_node(). Setting the limits had the unfortunate side
effect of triggering another bug if the parent node was full and the there
was no suitable gap in the second last child, but room in the next child.
mas_skip_node() may also be passed a maple state in an error state from
mas_anode_descend() when no allocations are available. Return on such an
error state immediately.
Link: https://lkml.kernel.org/r/20230307180247.2220303-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230307180247.2220303-2-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Snild Dolkow <snild@sony.com>
Link: https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/
Tested-by: Snild Dolkow <snild@sony.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
gcc-10 changed the way inlining works to be less aggressive, but older
versions run into an oversized stack frame warning whenever
CONFIG_KASAN_STACK is enabled, as that forces variables from inlined
callees to be non-overlapping:
lib/maple_tree.c: In function 'mas_wr_bnode':
lib/maple_tree.c:4320:1: error: the frame size of 1424 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Change the annotations on mas_store_b_node() and mas_commit_b_node()
to explicitly forbid inlining in this configuration, which is
the same behavior that newer versions already have.
Link: https://lkml.kernel.org/r/20230214103030.1051950-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vernon Yang <vernon2gm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>