Commit Graph

8650 Commits

Author SHA1 Message Date
Andy Shevchenko
8d2b2281ae mac_pton: Clean up the header inclusions
Since hex_to_bin() is provided by hex.h there is no need to require
kernel.h. Replace the latter by the former and add missing export.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230604132858.6650-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:18:32 +02:00
Andy Shevchenko
b2f10148ec kobject: Use return value of strreplace()
Since strreplace() returns the pointer to the string itself,
we may use it directly in the code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605170553.7835-4-andriy.shevchenko@linux.intel.com
2023-06-05 15:31:12 -07:00
Andy Shevchenko
d01a77afd6 lib/string_helpers: Change returned value of the strreplace()
It's more useful to return the pointer to the string itself
with strreplace(), so it may be used like

	attr->name = strreplace(name, '/', '_');

While at it, amend the kernel documentation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605170553.7835-3-andriy.shevchenko@linux.intel.com
2023-06-05 15:31:12 -07:00
Andrzej Hajda
acd8f0e5d7 lib/ref_tracker: remove warnings in case of allocation failure
Library can handle allocation failures. To avoid allocation warnings
__GFP_NOWARN has been added everywhere. Moreover GFP_ATOMIC has been
replaced with GFP_NOWAIT in case of stack allocation on tracker free
call.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
227c6c8323 lib/ref_tracker: add printing to memory buffer
Similar to stack_(depot|trace)_snprint the patch
adds helper to printing stats to memory buffer.
It will be helpful in case of debugfs.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
b6d7c0eb2d lib/ref_tracker: improve printing stats
In case the library is tracking busy subsystem, simply
printing stack for every active reference will spam log
with long, hard to read, redundant stack traces. To improve
readabilty following changes have been made:
- reports are printed per stack_handle - log is more compact,
- added display name for ref_tracker_dir - it will differentiate
  multiple subsystems,
- stack trace is printed indented, in the same printk call,
- info about dropped references is printed as well.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
7a113ff635 lib/ref_tracker: add unlocked leak print helper
To have reliable detection of leaks, caller must be able to check under
the same lock both: tracked counter and the leaks. dir.lock is natural
candidate for such lock and unlocked print helper can be called with this
lock taken.
As a bonus we can reuse this helper in ref_tracker_dir_exit.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Peter Zijlstra
224d80c584 types: Introduce [us]128
Introduce [us]128 (when available). Unlike [us]64, ensure they are
always naturally aligned.

This also enables 128bit wide atomics (which require natural
alignment) such as cmpxchg128().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.385005581@infradead.org
2023-06-05 09:36:35 +02:00
David Gow
260755184c kunit: Move kunit_abort() call out of kunit_do_failed_assertion()
KUnit aborts the current thread when an assertion fails. Currently, this
is done conditionally as part of the kunit_do_failed_assertion()
function, but this hides the kunit_abort() call from the compiler
(particularly if it's in another module). This, in turn, can lead to
both suboptimal code generation (the compiler can't know if
kunit_do_failed_assertion() will return), and to static analysis tools
like smatch giving false positives.

Moving the kunit_abort() call into the macro should give the compiler
and tools a better chance at understanding what's going on. Doing so
requires exporting kunit_abort(), though it's recommended to continue to
use assertions in lieu of aborting directly.

In addition, kunit_abort() and kunit_do_failed_assertion() are renamed
to make it clear they they're intended for internal KUnit use, to:
__kunit_do_failed_assertion() and __kunit_abort()

Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-06-01 13:04:46 -06:00
Alexander Potapenko
f9cfb1910e string: use __builtin_memcpy() in strlcpy/strlcat
lib/string.c is built with -ffreestanding, which prevents the compiler
from replacing certain functions with calls to their library versions.

On the other hand, this also prevents Clang and GCC from instrumenting
calls to memcpy() when building with KASAN, KCSAN or KMSAN:
 - KASAN normally replaces memcpy() with __asan_memcpy() with the
   additional cc-param,asan-kernel-mem-intrinsic-prefix=1;
 - KCSAN and KMSAN replace memcpy() with __tsan_memcpy() and
   __msan_memcpy() by default.

To let the tools catch memory accesses from strlcpy/strlcat, replace
the calls to memcpy() with __builtin_memcpy(), which KASAN, KCSAN and
KMSAN are able to replace even in -ffreestanding mode.

This preserves the behavior in normal builds (__builtin_memcpy() ends up
being replaced with memcpy()), and does not introduce new instrumentation
in unwanted places, as strlcpy/strlcat are already instrumented.

Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/all/20230224085942.1791837-1-elver@google.com/
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230530083911.1104336-1-glider@google.com
2023-06-01 11:24:50 -07:00
Mirsad Goran Todorovac
48e1560230 test_firmware: fix the memory leak of the allocated firmware buffer
The following kernel memory leak was noticed after running
tools/testing/selftests/firmware/fw_run_tests.sh:

[root@pc-mtodorov firmware]# cat /sys/kernel/debug/kmemleak
.
.
.
unreferenced object 0xffff955389bc3400 (size 1024):
  comm "test_firmware-0", pid 5451, jiffies 4294944822 (age 65.652s)
  hex dump (first 32 bytes):
    47 48 34 35 36 37 0a 00 00 00 00 00 00 00 00 00  GH4567..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff962f5dec>] slab_post_alloc_hook+0x8c/0x3c0
    [<ffffffff962fcca4>] __kmem_cache_alloc_node+0x184/0x240
    [<ffffffff962704de>] kmalloc_trace+0x2e/0xc0
    [<ffffffff9665b42d>] test_fw_run_batch_request+0x9d/0x180
    [<ffffffff95fd813b>] kthread+0x10b/0x140
    [<ffffffff95e033e9>] ret_from_fork+0x29/0x50
unreferenced object 0xffff9553c334b400 (size 1024):
  comm "test_firmware-1", pid 5452, jiffies 4294944822 (age 65.652s)
  hex dump (first 32 bytes):
    47 48 34 35 36 37 0a 00 00 00 00 00 00 00 00 00  GH4567..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff962f5dec>] slab_post_alloc_hook+0x8c/0x3c0
    [<ffffffff962fcca4>] __kmem_cache_alloc_node+0x184/0x240
    [<ffffffff962704de>] kmalloc_trace+0x2e/0xc0
    [<ffffffff9665b42d>] test_fw_run_batch_request+0x9d/0x180
    [<ffffffff95fd813b>] kthread+0x10b/0x140
    [<ffffffff95e033e9>] ret_from_fork+0x29/0x50
unreferenced object 0xffff9553c334f000 (size 1024):
  comm "test_firmware-2", pid 5453, jiffies 4294944822 (age 65.652s)
  hex dump (first 32 bytes):
    47 48 34 35 36 37 0a 00 00 00 00 00 00 00 00 00  GH4567..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff962f5dec>] slab_post_alloc_hook+0x8c/0x3c0
    [<ffffffff962fcca4>] __kmem_cache_alloc_node+0x184/0x240
    [<ffffffff962704de>] kmalloc_trace+0x2e/0xc0
    [<ffffffff9665b42d>] test_fw_run_batch_request+0x9d/0x180
    [<ffffffff95fd813b>] kthread+0x10b/0x140
    [<ffffffff95e033e9>] ret_from_fork+0x29/0x50
unreferenced object 0xffff9553c3348400 (size 1024):
  comm "test_firmware-3", pid 5454, jiffies 4294944822 (age 65.652s)
  hex dump (first 32 bytes):
    47 48 34 35 36 37 0a 00 00 00 00 00 00 00 00 00  GH4567..........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff962f5dec>] slab_post_alloc_hook+0x8c/0x3c0
    [<ffffffff962fcca4>] __kmem_cache_alloc_node+0x184/0x240
    [<ffffffff962704de>] kmalloc_trace+0x2e/0xc0
    [<ffffffff9665b42d>] test_fw_run_batch_request+0x9d/0x180
    [<ffffffff95fd813b>] kthread+0x10b/0x140
    [<ffffffff95e033e9>] ret_from_fork+0x29/0x50
[root@pc-mtodorov firmware]#

Note that the size 1024 corresponds to the size of the test firmware
buffer. The actual number of the buffers leaked is around 70-110,
depending on the test run.

The cause of the leak is the following:

request_partial_firmware_into_buf() and request_firmware_into_buf()
provided firmware buffer isn't released on release_firmware(), we
have allocated it and we are responsible for deallocating it manually.
This is introduced in a number of context where previously only
release_firmware() was called, which was insufficient.

Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Fixes: 7feebfa487 ("test_firmware: add support for request_firmware_into_buf")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Tianfei zhang <tianfei.zhang@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-3-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-31 20:31:07 +01:00
Mirsad Goran Todorovac
be37bed754 test_firmware: fix a memory leak with reqs buffer
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().

This bug wasn't trigger by the tests, but observed by Dan's visual
inspection of the code.

The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.

Fixes: 7feebfa487 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Tianfei Zhang <tianfei.zhang@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-31 20:31:07 +01:00
Mirsad Goran Todorovac
4acfe3dfde test_firmware: prevent race conditions by a correct implementation of locking
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:

static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
        u8 val;
        int ret;

        ret = kstrtou8(buf, 10, &val);
        if (ret)
                return ret;

        mutex_lock(&test_fw_mutex);
        *(u8 *)cfg = val;
        mutex_unlock(&test_fw_mutex);

        /* Always return full write size even if we didn't consume all */
        return size;
}

static ssize_t config_num_requests_store(struct device *dev,
                                         struct device_attribute *attr,
                                         const char *buf, size_t count)
{
        int rc;

        mutex_lock(&test_fw_mutex);
        if (test_fw_config->reqs) {
                pr_err("Must call release_all_firmware prior to changing config\n");
                rc = -EINVAL;
                mutex_unlock(&test_fw_mutex);
                goto out;
        }
        mutex_unlock(&test_fw_mutex);

        rc = test_dev_config_update_u8(buf, count,
                                       &test_fw_config->num_requests);

out:
        return rc;
}

static ssize_t config_read_fw_idx_store(struct device *dev,
                                        struct device_attribute *attr,
                                        const char *buf, size_t count)
{
        return test_dev_config_update_u8(buf, count,
                                         &test_fw_config->read_fw_idx);
}

The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.

To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.

Having two locks wouldn't assure a race-proof mutual exclusion.

This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:

static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
        int ret;

        mutex_lock(&test_fw_mutex);
        ret = __test_dev_config_update_u8(buf, size, cfg);
        mutex_unlock(&test_fw_mutex);

        return ret;
}

doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.

The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.

__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereof causing a race
condition.

The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.

Fixes: 7feebfa487 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tianfei Zhang <tianfei.zhang@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-31 20:31:07 +01:00
Su Hui
0d2da4b595 bpf/tests: Use struct_size()
Use struct_size() instead of hand writing it. This is less verbose and
more informative.

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230531043251.989312-1-suhui@nfschina.com
2023-05-31 12:58:38 +02:00
Arnd Bergmann
26f15e5de1 ubsan: add prototypes for internal functions
Most of the functions in ubsan that are only called from generated
code don't have a prototype, which W=1 builds warn about:

lib/ubsan.c:226:6: error: no previous prototype for '__ubsan_handle_divrem_overflow' [-Werror=missing-prototypes]
lib/ubsan.c:307:6: error: no previous prototype for '__ubsan_handle_type_mismatch' [-Werror=missing-prototypes]
lib/ubsan.c:321:6: error: no previous prototype for '__ubsan_handle_type_mismatch_v1' [-Werror=missing-prototypes]
lib/ubsan.c:335:6: error: no previous prototype for '__ubsan_handle_out_of_bounds' [-Werror=missing-prototypes]
lib/ubsan.c:352:6: error: no previous prototype for '__ubsan_handle_shift_out_of_bounds' [-Werror=missing-prototypes]
lib/ubsan.c:394:6: error: no previous prototype for '__ubsan_handle_builtin_unreachable' [-Werror=missing-prototypes]
lib/ubsan.c:404:6: error: no previous prototype for '__ubsan_handle_load_invalid_value' [-Werror=missing-prototypes]

Add prototypes for all of these to lib/ubsan.h, and remove the
one that was already present in ubsan.c.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Fangrui Song <maskray@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230517125102.930491-1-arnd@kernel.org
2023-05-30 16:42:01 -07:00
Linus Torvalds
d8f14b84fe Merge tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fixes from Thomas Gleixner:
 "Two fixes for debugobjects:

   - Prevent the allocation path from waking up kswapd.

     That's a long standing issue due to the GFP_ATOMIC allocation flag.
     As debug objects can be invoked from pretty much any context waking
     kswapd can end up in arbitrary lock chains versus the waitqueue
     lock

   - Correct the explicit lockdep wait-type violation in
     debug_object_fill_pool()"

* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Don't wake up kswapd from fill_pool()
  debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
2023-05-28 07:15:33 -04:00
Kees Cook
d67790ddf0 overflow: Add struct_size_t() helper
While struct_size() is normally used in situations where the structure
type already has a pointer instance, there are places where no variable
is available. In the past, this has been worked around by using a typed
NULL first argument, but this is a bit ugly. Add a helper to do this,
and replace the handful of instances of the code pattern with it.

Instances were found with this Coccinelle script:

@struct_size_t@
identifier STRUCT, MEMBER;
expression COUNT;
@@

-       struct_size((struct STRUCT *)\(0\|NULL\),
+       struct_size_t(struct STRUCT,
                MEMBER, COUNT)

Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: HighPoint Linux Team <linux@highpoint-tech.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: Don Brace <don.brace@microchip.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: megaraidlinux.pdl@broadcom.com
Cc: storagedev@microchip.com
Cc: linux-xfs@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230522211810.never.421-kees@kernel.org
2023-05-26 13:52:19 -07:00
Michal Wajdeczko
b1eaa8b2a5 kunit: Update kunit_print_ok_not_ok function
There is no need use opaque test_or_suite pointer and is_test flag
as we don't use anything from the suite struct. Always expect test
pointer and use NULL as indication that provided results are from
the suite so we can treat them differently.

Since results could be from nested tests, like parameterized tests,
add explicit level parameter to properly indent output messages and
thus allow to reuse this function from other places.

While around, remove small code duplication near skip directive.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-26 08:44:09 -06:00
Michal Wajdeczko
b08f75b9bb kunit: Fix reporting of the skipped parameterized tests
Logs from the parameterized tests that were skipped don't include
SKIP directive thus they are displayed as PASSED. Fix that.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-26 08:44:03 -06:00
Michal Wajdeczko
d273b72846 kunit/test: Add example test showing parameterized testing
Use of parameterized testing is documented [1] but such use case
is not present in demo kunit test. Add small subtest for that.

[1] https://kernel.org/doc/html/latest/dev-tools/kunit/usage.html#parameterized-testing

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: David Gow <davidgow@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-26 08:43:57 -06:00
Noah Goldstein
688eb8191b x86/csum: Improve performance of csum_partial
1) Add special case for len == 40 as that is the hottest value. The
   nets a ~8-9% latency improvement and a ~30% throughput improvement
   in the len == 40 case.

2) Use multiple accumulators in the 64-byte loop. This dramatically
   improves ILP and results in up to a 40% latency/throughput
   improvement (better for more iterations).

Results from benchmarking on Icelake. Times measured with rdtsc()
 len   lat_new   lat_old      r    tput_new  tput_old      r
   8      3.58      3.47  1.032        3.58      3.51  1.021
  16      4.14      4.02  1.028        3.96      3.78  1.046
  24      4.99      5.03  0.992        4.23      4.03  1.050
  32      5.09      5.08  1.001        4.68      4.47  1.048
  40      5.57      6.08  0.916        3.05      4.43  0.690
  48      6.65      6.63  1.003        4.97      4.69  1.059
  56      7.74      7.72  1.003        5.22      4.95  1.055
  64      6.65      7.22  0.921        6.38      6.42  0.994
  96      9.43      9.96  0.946        7.46      7.54  0.990
 128      9.39     12.15  0.773        8.90      8.79  1.012
 200     12.65     18.08  0.699       11.63     11.60  1.002
 272     15.82     23.37  0.677       14.43     14.35  1.005
 440     24.12     36.43  0.662       21.57     22.69  0.951
 952     46.20     74.01  0.624       42.98     53.12  0.809
1024     47.12     78.24  0.602       46.36     58.83  0.788
1552     72.01    117.30  0.614       71.92     96.78  0.743
2048     93.07    153.25  0.607       93.28    137.20  0.680
2600    114.73    194.30  0.590      114.28    179.32  0.637
3608    156.34    268.41  0.582      154.97    254.02  0.610
4096    175.01    304.03  0.576      175.89    292.08  0.602

There is no such thing as a free lunch, however, and the special case
for len == 40 does add overhead to the len != 40 cases. This seems to
amount to be ~5% throughput and slightly less in terms of latency.

Testing:
Part of this change is a new kunit test. The tests check all
alignment X length pairs in [0, 64) X [0, 512).
There are three cases.
    1) Precomputed random inputs/seed. The expected results where
       generated use the generic implementation (which is assumed to be
       non-buggy).
    2) An input of all 1s. The goal of this test is to catch any case
       a carry is missing.
    3) An input that never carries. The goal of this test si to catch
       any case of incorrectly carrying.

More exhaustive tests that test all alignment X length pairs in
[0, 8192) X [0, 8192] on random data are also available here:
https://github.com/goldsteinn/csum-reproduction

The reposity also has the code for reproducing the above benchmark
numbers.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com
2023-05-25 10:55:18 -07:00
David Gow
57e3cded99 kunit: kmalloc_array: Use kunit_add_action()
The kunit_add_action() function is much simpler and cleaner to use that
the full KUnit resource API for simple things like the
kunit_kmalloc_array() functionality.

Replacing it allows us to get rid of a number of helper functions, and
leaves us with no uses of kunit_alloc_resource(), which has some
usability problems and is going to have its behaviour modified in an
upcoming patch.

Note that we need to use kunit_defer_trigger_all() to implement
kunit_kfree().

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-25 08:53:07 -06:00
David Gow
00e63f8afc kunit: executor_test: Use kunit_add_action()
Now we have the kunit_add_action() function, we can use it to implement
kfree_at_end() and free_subsuite_at_end() without the need for extra
helper functions.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-25 08:53:01 -06:00
David Gow
b9dce8a1ed kunit: Add kunit_add_action() to defer a call until test exit
Many uses of the KUnit resource system are intended to simply defer
calling a function until the test exits (be it due to success or
failure). The existing kunit_alloc_resource() function is often used for
this, but was awkward to use (requiring passing NULL init functions, etc),
and returned a resource without incrementing its reference count, which
-- while okay for this use-case -- could cause problems in others.

Instead, introduce a simple kunit_add_action() API: a simple function
(returning nothing, accepting a single void* argument) can be scheduled
to be called when the test exits. Deferred actions are called in the
opposite order to that which they were registered.

This mimics the devres API, devm_add_action(), and also provides
kunit_remove_action(), to cancel a deferred action, and
kunit_release_action() to trigger one early.

This is implemented as a resource under the hood, so the ordering
between resource cleanup and deferred functions is maintained.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-25 08:52:55 -06:00
David Howells
3fc40265ae iov_iter: Kill ITER_PIPE
The ITER_PIPE-type iterator was only used by generic_file_splice_read() and
that has been replaced and removed.  This leaves ITER_PIPE unused - so
remove it too.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24 08:42:17 -06:00
Tetsuo Handa
eb799279fb debugobjects: Don't wake up kswapd from fill_pool()
syzbot is reporting a lockdep warning in fill_pool() because the allocation
from debugobjects is using GFP_ATOMIC, which is (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
and therefore tries to wake up kswapd, which acquires kswapd_wait::lock.

Since fill_pool() might be called with arbitrary locks held, fill_pool()
should not assume that acquiring kswapd_wait::lock is safe.

Use __GFP_HIGH instead and remove __GFP_NORETRY as it is pointless for
!__GFP_DIRECT_RECLAIM allocation.

Fixes: 3ac7fe5a4a ("infrastructure to debug (dynamic) objects")
Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/6577e1fa-b6ee-f2be-2414-a2b51b1c5e30@I-love.SAKURA.ne.jp
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
2023-05-22 14:52:58 +02:00
Herbert Xu
6c19f3bfff crypto: lib/sha256 - Use generic code from sha256_base
Instead of duplicating the sha256 block processing code, reuse
the common code from crypto/sha256_base.h.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-19 16:45:43 +08:00
Herbert Xu
70d391a863 crypto: lib/sha256 - Remove redundant and unused sha224_update
The function sha224_update is exactly the same as sha256_update.
Moreover it's not even used in the kernel so it can be removed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-19 16:45:43 +08:00
Linus Torvalds
f4a8871f9f Merge tag 'mm-hotfixes-stable-2023-05-18-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "Eight hotfixes. Four are cc:stable, the other four are for post-6.4
  issues, or aren't considered suitable for backporting"

* tag 'mm-hotfixes-stable-2023-05-18-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: Cleanup Arm Display IP maintainers
  MAINTAINERS: repair pattern in DIALOG SEMICONDUCTOR DRIVERS
  nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
  mm: fix zswap writeback race condition
  mm: kfence: fix false positives on big endian
  zsmalloc: move LRU update from zs_map_object() to zs_malloc()
  mm: shrinkers: fix race condition on debugfs cleanup
  maple_tree: make maple state reusable after mas_empty_area()
2023-05-18 17:06:04 -07:00
Tejun Heo
6363845005 workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the thershold, it likely needs to be switched to use an
unbound workqueue.

This patch adds a debug mechanism which tracks the work functions which
trigger the automatic CPU_INTENSIVE mechanism and report them using
pr_warn() with exponential backoff.

v3: Documentation update.

v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
    introducing circular locking dependency through printk but not effective
    as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
    just print directly using printk_deferred().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-17 17:02:08 -10:00
Peng Zhang
0257d9908d maple_tree: make maple state reusable after mas_empty_area()
Make mas->min and mas->max point to a node range instead of a leaf entry
range.  This allows mas to still be usable after mas_empty_area() returns.
Users would get unexpected results from other operations on the maple
state after calling the affected function.

For example, x86 MAP_32BIT mmap() acts as if there is no suitable gap when
there should be one.

Link: https://lkml.kernel.org/r/20230505145829.74574-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reported-by: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Reported-by: Tad <support@spotco.us>
Reported-by: Michael Keyes <mgkeyes@vigovproductions.net>
  Link: https://lore.kernel.org/linux-mm/32f156ba80010fd97dbaf0a0cdfc84366608624d.camel@intel.com/
  Link: https://lore.kernel.org/linux-mm/e6108286ac025c268964a7ead3aab9899f9bc6e9.camel@spotco.us/
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-17 15:24:32 -07:00
Nick Desaulniers
08e4044243 ubsan: remove cc-option test for UBSAN_TRAP
-fsanitize-undefined-trap-on-error has been supported since GCC 5.1 and
Clang 3.2.  The minimum supported version of these according to
Documentation/process/changes.rst is 5.1 and 11.0.0 respectively. Drop
this cc-option check.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230407215406.768464-1-ndesaulniers@google.com
2023-05-17 12:01:54 -07:00
Kees Cook
3bf301e1ab string: Add Kunit tests for strcat() family
Add tests to make sure the strcat() family of functions behave
correctly.

Signed-off-by: Kees Cook <keescook@chromium.org>
2023-05-16 14:08:02 -07:00
Kees Cook
a9dc8d0442 fortify: Allow KUnit test to build without FORTIFY
In order for CI systems to notice all the skipped tests related to
CONFIG_FORTIFY_SOURCE, allow the FORTIFY_SOURCE KUnit tests to build
with or without CONFIG_FORTIFY_SOURCE.

Signed-off-by: Kees Cook <keescook@chromium.org>
2023-05-16 14:07:49 -07:00
Kees Cook
2d47c6956a ubsan: Tighten UBSAN_BOUNDS on GCC
The use of -fsanitize=bounds on GCC will ignore some trailing arrays,
leaving a gap in coverage. Switch to using -fsanitize=bounds-strict to
match Clang's stricter behavior.

Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Tom Rix <trix@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230405022356.gonna.338-kees@kernel.org
2023-05-16 13:57:14 -07:00
David Gow
a5ce66ad29 kunit: example: Provide example exit functions
Add an example .exit and .suite_exit function to the KUnit example
suite. Given exit functions are a bit more subtle than init functions
(due to running in a different kthread, and running even after tests or
test init functions fail), providing an easy place to experiment with
them is useful.

Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-11 18:17:41 -06:00
David Gow
55e8c1b49a kunit: Always run cleanup from a test kthread
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.

Instead, in the event test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.

If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed. For example:
	 # example_simple_test: test aborted during cleanup. continuing without cleaning up

This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.

Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-05-11 18:17:24 -06:00
Linus Torvalds
6e27831b91 Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mtk_eth_soc: fix NULL pointer dereference

  Previous releases - regressions:

   - core:
      - skb_partial_csum_set() fix against transport header magic value
      - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
      - annotate sk->sk_err write from do_recvmmsg()
      - add vlan_get_protocol_and_depth() helper

   - netlink: annotate accesses to nlk->cb_running

   - netfilter: always release netdev hooks from notifier

  Previous releases - always broken:

   - core: deal with most data-races in sk_wait_event()

   - netfilter: fix possible bug_on with enable_hooks=1

   - eth: bonding: fix send_peer_notif overflow

   - eth: xpcs: fix incorrect number of interfaces

   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb

   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"

* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  af_unix: Fix data races around sk->sk_shutdown.
  af_unix: Fix a data race of sk->sk_receive_queue->qlen.
  net: datagram: fix data-races in datagram_poll()
  net: mscc: ocelot: fix stat counter register values
  ipvlan:Fix out-of-bounds caused by unclear skb->cb
  docs: networking: fix x25-iface.rst heading & index order
  gve: Remove the code of clearing PBA bit
  tcp: add annotations around sk->sk_shutdown accesses
  net: add vlan_get_protocol_and_depth() helper
  net: pcs: xpcs: fix incorrect number of interfaces
  net: deal with most data-races in sk_wait_event()
  net: annotate sk->sk_err write from do_recvmmsg()
  netlink: annotate accesses to nlk->cb_running
  kselftest: bonding: add num_grat_arp test
  selftests: forwarding: lib: add netns support for tc rule handle stats get
  Documentation: bonding: fix the doc of peer_notif_delay
  bonding: fix send_peer_notif overflow
  net: ethernet: mtk_eth_soc: fix NULL pointer dereference
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  ...
2023-05-11 08:42:47 -05:00
Roy Novich
162bd18eb5 linux/dim: Do nothing if no time delta between samples
Add return value for dim_calc_stats. This is an indication for the
caller if curr_stats was assigned by the function. Avoid using
curr_stats uninitialized over {rdma/net}_dim, when no time delta between
samples. Coverity reported this potential use of an uninitialized
variable.

Fixes: 4c4dbb4a73 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://lore.kernel.org/r/20230507135743.138993-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-09 11:06:45 +02:00
Linus Torvalds
17784de648 Merge tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fix from Thomas Gleixner:
 "A single fix for debugobjects:

  The recent fix to ensure atomicity of lookup and allocation
  inadvertently broke the pool refill mechanism, so that debugobject
  OOMs now in certain situations. The reason is that the functions which
  got updated no longer invoke debug_objecs_init(), which is now the
  only place to care about refilling the tracking object pool.

  Restore the original behaviour by adding explicit refill opportunities
  to those places"

* tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobject: Ensure pool refill (again)
2023-05-07 11:04:26 -07:00
Linus Torvalds
15fb96a35d Merge tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:

 - Some DAMON cleanups from Kefeng Wang

 - Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
   ioctl's behavior more similar to KSM's behavior.

[ Andrew called these "final", but I suspect we'll have a series fixing
  up the fact that the last commit in the dmapools series in the
  previous pull seems to have unintentionally just reverted all the
  other commits in the same series..   - Linus ]

* tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: hwpoison: coredump: support recovery from dump_user_range()
  mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
  mm/ksm: move disabling KSM from s390/gmap code to KSM code
  selftests/ksm: ksm_functional_tests: add prctl unmerge test
  mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0
  mm/damon/paddr: fix missing folio_sz update in damon_pa_young()
  mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate()
  mm/damon/paddr: minor refactor of damon_pa_pageout()
2023-05-04 13:09:43 -07:00
Kefeng Wang
245f092268 mm: hwpoison: coredump: support recovery from dump_user_range()
dump_user_range() is used to copy the user page to a coredump file, but if
a hardware memory error occurred during copy, which called from
__kernel_write_iter() in dump_user_range(), it crashes,

  CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425

  pc : __memcpy+0x110/0x260
  lr : _copy_from_iter+0x3bc/0x4c8
  ...
  Call trace:
   __memcpy+0x110/0x260
   copy_page_from_iter+0xcc/0x130
   pipe_write+0x164/0x6d8
   __kernel_write_iter+0x9c/0x210
   dump_user_range+0xc8/0x1d8
   elf_core_dump+0x308/0x368
   do_coredump+0x2e8/0xa40
   get_signal+0x59c/0x788
   do_signal+0x118/0x1f8
   do_notify_resume+0xf0/0x280
   el0_da+0x130/0x138
   el0t_64_sync_handler+0x68/0xc0
   el0t_64_sync+0x188/0x190

Generally, the '->write_iter' of file ops will use copy_page_from_iter()
and copy_page_from_iter_atomic(), change memcpy() to copy_mc_to_kernel()
in both of them to handle #MC during source read, which stop coredump
processing and kill the task instead of kernel panic, but the source
address may not always a user address, so introduce a new copy_mc flag in
struct iov_iter{} to indicate that the iter could do a safe memory copy,
also introduce the helpers to set/cleck the flag, for now, it's only used
in coredump's dump_user_range(), but it could expand to any other
scenarios to fix the similar issue.

Link: https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tong Tiangen <tongtiangen@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02 17:21:50 -07:00
Peter Zijlstra
0cce06ba85 debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
There is an explicit wait-type violation in debug_object_fill_pool()
for PREEMPT_RT=n kernels which allows them to more easily fill the
object pool and reduce the chance of allocation failures.

Lockdep's wait-type checks are designed to check the PREEMPT_RT
locking rules even for PREEMPT_RT=n kernels and object to this, so
create a lockdep annotation to allow this to stand.

Specifically, create a 'lock' type that overrides the inner wait-type
while it is held -- allowing one to temporarily raise it, such that
the violation is hidden.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Qi Zheng <zhengqi.arch@bytedance.com>
Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
2023-05-02 14:48:14 +02:00
Thomas Gleixner
0af462f19e debugobject: Ensure pool refill (again)
The recent fix to ensure atomicity of lookup and allocation inadvertently
broke the pool refill mechanism.

Prior to that change debug_objects_activate() and debug_objecs_assert_init()
invoked debug_objecs_init() to set up the tracking object for statically
initialized objects. That's not longer the case and debug_objecs_init() is
now the only place which does pool refills.

Depending on the number of statically initialized objects this can be
enough to actually deplete the pool, which was observed by Ido via a
debugobjects OOM warning.

Restore the old behaviour by adding explicit refill opportunities to
debug_objects_activate() and debug_objecs_assert_init().

Fixes: 63a759694e ("debugobject: Prevent init race with static objects")
Reported-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/871qk05a9d.ffs@tglx
2023-05-02 10:07:04 +02:00
Linus Torvalds
10de638d8e Merge tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:

 - Add support for stackleak feature. Also allow specifying
   architecture-specific stackleak poison function to enable faster
   implementation. On s390, the mvc-based implementation helps decrease
   typical overhead from a factor of 3 to just 25%

 - Convert all assembler files to use SYM* style macros, deprecating the
   ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS

 - Improve KASLR to also randomize module and special amode31 code base
   load addresses

 - Rework decompressor memory tracking to support memory holes and
   improve error handling

 - Add support for protected virtualization AP binding

 - Add support for set_direct_map() calls

 - Implement set_memory_rox() and noexec module_alloc()

 - Remove obsolete overriding of mem*() functions for KASAN

 - Rework kexec/kdump to avoid using nodat_stack to call purgatory

 - Convert the rest of the s390 code to use flexible-array member
   instead of a zero-length array

 - Clean up uaccess inline asm

 - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE

 - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
   DEBUG_FORCE_FUNCTION_ALIGN_64B

 - Resolve last_break in userspace fault reports

 - Simplify one-level sysctl registration

 - Clean up branch prediction handling

 - Rework CPU counter facility to retrieve available counter sets just
   once

 - Other various small fixes and improvements all over the code

* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
  s390/stackleak: provide fast __stackleak_poison() implementation
  stackleak: allow to specify arch specific stackleak poison function
  s390: select ARCH_USE_SYM_ANNOTATIONS
  s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
  s390: wire up memfd_secret system call
  s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
  s390/mm: use BIT macro to generate SET_MEMORY bit masks
  s390/relocate_kernel: adjust indentation
  s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
  s390/entry: use SYM* macros instead of ENTRY(), etc.
  s390/purgatory: use SYM* macros instead of ENTRY(), etc.
  s390/kprobes: use SYM* macros instead of ENTRY(), etc.
  s390/reipl: use SYM* macros instead of ENTRY(), etc.
  s390/head64: use SYM* macros instead of ENTRY(), etc.
  s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
  s390/mcount: use SYM* macros instead of ENTRY(), etc.
  s390/crc32le: use SYM* macros instead of ENTRY(), etc.
  s390/crc32be: use SYM* macros instead of ENTRY(), etc.
  s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
  s390/amode31: use SYM* macros instead of ENTRY(), etc.
  ...
2023-04-30 11:43:31 -07:00
Linus Torvalds
d579c468d7 Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - User events are finally ready!

   After lots of collaboration between various parties, we finally
   locked down on a stable interface for user events that can also work
   with user space only tracing.

   This is implemented by telling the kernel (or user space library, but
   that part is user space only and not part of this patch set), where
   the variable is that the application uses to know if something is
   listening to the trace.

   There's also an interface to tell the kernel about these events,
   which will show up in the /sys/kernel/tracing/events/user_events/
   directory, where it can be enabled.

   When it's enabled, the kernel will update the variable, to tell the
   application to start writing to the kernel.

   See https://lwn.net/Articles/927595/

 - Cleaned up the direct trampolines code to simplify arm64 addition of
   direct trampolines.

   Direct trampolines use the ftrace interface but instead of jumping to
   the ftrace trampoline, applications (mostly BPF) can register their
   own trampoline for performance reasons.

 - Some updates to the fprobe infrastructure. fprobes are more efficient
   than kprobes, as it does not need to save all the registers that
   kprobes on ftrace do. More work needs to be done before the fprobes
   will be exposed as dynamic events.

 - More updates to references to the obsolete path of
   /sys/kernel/debug/tracing for the new /sys/kernel/tracing path.

 - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer
   line by line instead of all at once.

   There are users in production kernels that have a large data dump
   that originally used printk() directly, but the data dump was larger
   than what printk() allowed as a single print.

   Using seq_buf() to do the printing fixes that.

 - Add /sys/kernel/tracing/touched_functions that shows all functions
   that was every traced by ftrace or a direct trampoline. This is used
   for debugging issues where a traced function could have caused a
   crash by a bpf program or live patching.

 - Add a "fields" option that is similar to "raw" but outputs the fields
   of the events. It's easier to read by humans.

 - Some minor fixes and clean ups.

* tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits)
  ring-buffer: Sync IRQ works before buffer destruction
  tracing: Add missing spaces in trace_print_hex_seq()
  ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
  recordmcount: Fix memory leaks in the uwrite function
  tracing/user_events: Limit max fault-in attempts
  tracing/user_events: Prevent same address and bit per process
  tracing/user_events: Ensure bit is cleared on unregister
  tracing/user_events: Ensure write index cannot be negative
  seq_buf: Add seq_buf_do_printk() helper
  tracing: Fix print_fields() for __dyn_loc/__rel_loc
  tracing/user_events: Set event filter_type from type
  ring-buffer: Clearly check null ptr returned by rb_set_head_page()
  tracing: Unbreak user events
  tracing/user_events: Use print_format_fields() for trace output
  tracing/user_events: Align structs with tabs for readability
  tracing/user_events: Limit global user_event count
  tracing/user_events: Charge event allocs to cgroups
  tracing/user_events: Update documentation for ABI
  tracing/user_events: Use write ABI in example
  tracing/user_events: Add ABI self-test
  ...
2023-04-28 15:57:53 -07:00
Linus Torvalds
f20730efbd Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds
33afd4b763 Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Mainly singleton patches all over the place.

  Series of note are:

   - updates to scripts/gdb from Glenn Washburn

   - kexec cleanups from Bjorn Helgaas"

* tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits)
  mailmap: add entries for Paul Mackerras
  libgcc: add forward declarations for generic library routines
  mailmap: add entry for Oleksandr
  ocfs2: reduce ioctl stack usage
  fs/proc: add Kthread flag to /proc/$pid/status
  ia64: fix an addr to taddr in huge_pte_offset()
  checkpatch: introduce proper bindings license check
  epoll: rename global epmutex
  scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()
  scripts/gdb: create linux/vfs.py for VFS related GDB helpers
  uapi/linux/const.h: prefer ISO-friendly __typeof__
  delayacct: track delays from IRQ/SOFTIRQ
  scripts/gdb: timerlist: convert int chunks to str
  scripts/gdb: print interrupts
  scripts/gdb: raise error with reduced debugging information
  scripts/gdb: add a Radix Tree Parser
  lib/rbtree: use '+' instead of '|' for setting color.
  proc/stat: remove arch_idle_time()
  checkpatch: check for misuse of the link tags
  checkpatch: allow Closes tags with links
  ...
2023-04-27 19:57:00 -07:00
Linus Torvalds
7fa8a8ee94 Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Linus Torvalds
8ccd54fe45 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
 "virtio,vhost,vdpa: features, fixes, and cleanups:

   - reduction in interrupt rate in virtio

   - perf improvement for VDUSE

   - scalability for vhost-scsi

   - non power of 2 ring support for packed rings

   - better management for mlx5 vdpa

   - suspend for snet

   - VIRTIO_F_NOTIFICATION_DATA

   - shared backend with vdpa-sim-blk

   - user VA support in vdpa-sim

   - better struct packing for virtio

  and fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits)
  vhost_vdpa: fix unmap process in no-batch mode
  MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS
  tools/virtio: fix build caused by virtio_ring changes
  virtio_ring: add a struct device forward declaration
  vdpa_sim_blk: support shared backend
  vdpa_sim: move buffer allocation in the devices
  vdpa/snet: use likely/unlikely macros in hot functions
  vdpa/snet: implement kick_vq_with_data callback
  virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support
  virtio: add VIRTIO_F_NOTIFICATION_DATA feature support
  vdpa/snet: support the suspend vDPA callback
  vdpa/snet: support getting and setting VQ state
  MAINTAINERS: add vringh.h to Virtio Core and Net Drivers
  vringh: address kdoc warnings
  vdpa: address kdoc warnings
  virtio_ring: don't update event idx on get_buf
  vdpa_sim: add support for user VA
  vdpa_sim: replace the spinlock with a mutex to protect the state
  vdpa_sim: use kthread worker
  vdpa_sim: make devices agnostic for work management
  ...
2023-04-27 17:05:34 -07:00