Commit Graph

1308187 Commits

Author SHA1 Message Date
NeilBrown
b0d87dbd8b NFSD: Refactor nfsd_setuser_and_check_port()
There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure.  They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.

Prepare these to always succeed when rqstp is NULL.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
0a183f24a7 NFSD: Handle @rqstp == NULL in check_nfsd_access()
LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
1545e488b1 nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
Eliminates duplicate functions in various files to allow for
additional callers.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
1fcb16674e nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Mike Snitzer
4806ded4c1 nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c

Will also be used by fs/nfsd/localio.c

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:29 -04:00
Dan Aloni
dfb07e990a nfs: add 'noalignwrite' option for lock-less 'lost writes' prevention
There are some applications that write to predefined non-overlapping
file offsets from multiple clients and therefore don't need to rely on
file locking. However, if these applications want non-aligned offsets
and sizes they need to either use locks or risk data corruption, as the
NFS client defaults to extending writes to whole pages.

This commit adds a new mount option `noalignwrite`, which allows to turn
that off and avoid the need of locking, as long as these applications
don't overlap on offsets.

Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Li Lingfeng
6d26c5e4d8 nfs: fix the comment of nfs_get_root
The comment for nfs_get_root() needs to be updated as it would also be
used by NFS4 as follows:
@x[
    nfs_get_root+1
    nfs_get_tree_common+1819
    nfs_get_tree+2594
    vfs_get_tree+73
    fc_mount+23
    do_nfs4_mount+498
    nfs4_try_get_tree+134
    nfs_get_tree+2562
    vfs_get_tree+73
    path_mount+2776
    do_mount+226
    __se_sys_mount+343
    __x64_sys_mount+106
    do_syscall_64+69
    entry_SYSCALL_64_after_hwframe+97
, mount.nfs4]: 1

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Roi Azarzar
615e693b14 NFSv4.2: Fix detection of "Proxying of Times" server support
According to draft-ietf-nfsv4-delstid-07:
   If a server informs the client via the fattr4_open_arguments
   attribute that it supports
   OPEN_ARGS_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS and it returns a valid
   delegation stateid for an OPEN operation which sets the
   OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS flag, then it MUST query the
   client via a CB_GETATTR for the fattr4_time_deleg_access (see
   Section 5.2) attribute and fattr4_time_deleg_modify attribute (see
   Section 5.2).

Thus, we should look that the server supports proxying of times via
OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS.

We want to be extra pedantic and continue to check that FATTR4_TIME_DELEG_ACCESS
and FATTR4_TIME_DELEG_MODIFY are set. The server needs to expose both for the
client to correctly detect "Proxying of Times" support.

Signed-off-by: Roi Azarzar <roi.azarzar@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: dcb3c20f74 ("NFSv4: Add a capability for delegated attributes")
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Trond Myklebust
af94dca79b NFSv4: Fail mounts if the lease setup times out
If the server is down when the client is trying to mount, so that the
calls to exchange_id or create_session fail, then we should allow the
mount system call to fail rather than hang and block other mount/umount
calls.

Reported-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Zhaoyang Huang
03e02b9417 fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private
This patch is inspired by a code review of fs codes which aims at
folio's extra refcnt that could introduce unwanted behavious when
judging refcnt, such as[1].That is, the folio passed to
mapping_evict_folio carries the refcnts from find_lock_entries,
page_cache, corresponding to PTEs and folio's private if has. However,
current code doesn't take the refcnt for folio's private which could
have mapping_evict_folio miss the one to only PTE and lead to
call filemap_release_folio wrongly.

[1]
long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
{
...
//current code will misjudge here if there is one pte on the folio which
is be deemed as the one as folio's private
        if (folio_ref_count(folio) >
                        folio_nr_pages(folio) + folio_has_private(folio) + 1)
                return 0;
        if (!filemap_release_folio(folio, 0))
                return 0;

        return remove_mapping(mapping, folio);
}

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Gaosheng Cui
40c80881eb nfs: Remove obsoleted declaration for nfs_read_prepare
The nfs_read_prepare() have been removed since
commit a4cdda5911 ("NFS: Create a common pgio_rpc_prepare function"),
and now it is useless, so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Hongbo Li
64a3ab9967 net/sunrpc: make use of the helper macro LIST_HEAD()
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD(). Here we can simplify
the code.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Siddh Raman Pant
2e001972e8 SUNRPC: clnt.c: Remove misleading comment
destroy_wait doesn't store all RPC clients. There was a list named
"all_clients" above it, which got moved to struct sunrpc_net in 2012,
but the comment was never removed.

Fixes: 70abc49b4f ("SUNRPC: make SUNPRC clients list per network namespace context")
Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Stephen Brennan
0b108e8379 SUNRPC: convert RPC_TASK_* constants to enum
The RPC_TASK_* constants are defined as macros, which means that most
kernel builds will not contain their definitions in the debuginfo.
However, it's quite useful for debuggers to be able to view the task
state constant and interpret it correctly. Conversion to an enum will
ensure the constants are present in debuginfo and can be interpreted by
debuggers without needing to hard-code them and track their changes.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Kunwu Chan
9090a7f786 SUNRPC: Fix -Wformat-truncation warning
Increase size of the servername array to avoid truncated output warning.

net/sunrpc/clnt.c:582:75: error:‘%s’ directive output may be truncated
writing up to 107 bytes into a region of size 48
[-Werror=format-truncation=]
  582 |                   snprintf(servername, sizeof(servername), "%s",
      |                                                             ^~

net/sunrpc/clnt.c:582:33: note:‘snprintf’ output
between 1 and 108 bytes into a destination of size 48
  582 |                     snprintf(servername, sizeof(servername), "%s",
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  583 |                                          sun->sun_path);

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:13 -04:00
Thorsten Blum
e343678ee9 nfs: Remove unnecessary NULL check before kfree()
Since kfree() already checks if its argument is NULL, an additional
check before calling kfree() is unnecessary and can be removed.

Remove it and thus also the following Coccinelle/coccicheck warning
reported by ifnullfree.cocci:

  WARNING: NULL check before some freeing functions is not needed

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:12 -04:00
Thorsten Blum
bb8e4ce500 nfs: Annotate struct nfs_cache_array with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
array to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Increment size before adding a new struct to the array.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:12 -04:00
NeilBrown
d98f722725 nfs: simplify and guarantee owner uniqueness.
I have evidence of an Linux NFS client getting NFS4ERR_BAD_SEQID to a
v4.0 LOCK request to a Linux server (which had fixed the problem with
RELEASE_LOCKOWNER bug fixed).
The LOCK request presented a "new" lock owner so there are two seq ids
in the request: that for the open file, and that for the new lock.
Given the context I am confident that the new lock owner was reported to
have the wrong seqid.  As lock owner identifiers are reused, the server
must still have a lock owner active which the client thinks is no longer
active.

I wasn't able to determine a root-cause but the simplest fix seems to be
to ensure lock owners are always unique much as open owners are (thanks
to a time stamp).  The easiest way to ensure uniqueness is with a 64bit
counter for each server.  That will never cycle (if updated once a
nanosecond the last 584 years.  A single NFS server would not handle
open/lock requests nearly that fast, and a Linux node is unlikely to
have an uptime approaching that).

This patch removes the 2 ida and instead uses a per-server
atomic64_t to provide uniqueness.

Note that the lock owner already encodes the id as 64 bits even though
it is a 32bit value.  So changing to a 64bit value does not change the
encoding of the lock owner.  The open owner encoding is now 4 bytes
larger.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:12 -04:00
Li Lingfeng
8f6a7c9467 nfs: fix memory leak in error path of nfs4_do_reclaim
Commit c77e22834a ("NFSv4: Fix a potential sleep while atomic in
nfs4_do_reclaim()") separate out the freeing of the state owners from
nfs4_purge_state_owners() and finish it outside the rcu lock.
However, the error path is omitted. As a result, the state owners in
"freeme" will not be released.
Fix it by adding freeing in the error path.

Fixes: c77e22834a ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:12 -04:00
Linus Torvalds
18ba603446 Merge tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "Notable features of this release include:

   - Pre-requisites for automatically determining the RPC server thread
     count

   - Clean-up and preparation for supporting LOCALIO, which will be
     merged via the NFS client tree

   - Enhancements and fixes to NFSv4.2 COPY offload

   - A new Python-based tool for generating kernel SunRPC XDR encoding
     and decoding functions, added as an aid for prototyping features in
     protocols based on the Linux kernel's SunRPC implementation

  As always I am grateful to the NFSD contributors, reviewers, testers,
  and bug reporters who participated during this cycle"

* tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (57 commits)
  xdrgen: Prevent reordering of encoder and decoder functions
  xdrgen: typedefs should use the built-in string and opaque functions
  xdrgen: Fix return code checking in built-in XDR decoders
  tools: Add xdrgen
  nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
  nfsd: fix initial getattr on write delegation
  nfsd: untangle code in nfsd4_deleg_getattr_conflict()
  nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall()
  nfsd: return -EINVAL when namelen is 0
  NFSD: Wrap async copy operations with trace points
  NFSD: Clean up extra whitespace in trace_nfsd_copy_done
  NFSD: Record the callback stateid in copy tracepoints
  NFSD: Display copy stateids with conventional print formatting
  NFSD: Limit the number of concurrent async COPY operations
  NFSD: Async COPY result needs to return a write verifier
  nfsd: avoid races with wake_up_var()
  nfsd: use clear_and_wake_up_bit()
  sunrpc: xprtrdma: Use ERR_CAST() to return
  NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()
  nfsd: call cache_put if xdr_reserve_space returns NULL
  ...
2024-09-23 12:01:45 -07:00
Anna Schumaker
8c04a6d6e0 Merge tag 'nfsd-6.12' into linux-next-with-localio
NFSD 6.12 Release Notes

Notable features of this release include:

- Pre-requisites for automatically determining the RPC server thread
  count
- Clean-up and preparation for supporting LOCALIO, which will be
  merged via the NFS client tree
- Enhancements and fixes to NFSv4.2 COPY offload
- A new Python-based tool for generating kernel SunRPC XDR encoding
  and decoding functions, added as an aid for prototyping features
  in protocols based on the Linux kernel's SunRPC implementation.

As always I am grateful to the NFSD contributors, reviewers,
testers, and bug reporters who participated during this cycle.
2024-09-23 15:00:07 -04:00
Linus Torvalds
721068dec4 Merge tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 update from Andreas Gruenbacher:

 - Convert the writepage address space operation to writepages (Matthew
   Wilcox)

 - A syzkaller fix (by Julian Sun) and a minor cleanup (Andreas
   Gruenbacher)

* tag 'gfs2-v6.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Remove gfs2_aspace_writepage()
  gfs2: Remove gfs2_jdata_writepage()
  gfs2: Remove __gfs2_writepage()
  gfs2: Add gfs2_aspace_writepages()
  gfs2: fix double destroy_workqueue error
  gfs2: Minor gfs2_glock_cb cleanup
2024-09-23 11:55:17 -07:00
Linus Torvalds
a1fb2fcbb6 Merge tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:

 - fix dangling pointer to rb-tree of defragmented inodes after cleanup

 - a followup fix to handle concurrent lseek on the same fd that could
   leak memory under some conditions

 - fix wrong root id reported in tree checker when verifying dref

* tag 'for-6.12-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix use-after-free on rbtree that tracks inodes for auto defrag
  btrfs: tree-checker: fix the wrong output of data backref objectid
  btrfs: fix race setting file private on concurrent lseek using same fd
2024-09-23 11:49:02 -07:00
Masahiro Yamada
fa911d1f37 kbuild: doc: replace "gcc" in external module description
Avoid "gcc" since it is not the only compiler supported by Kbuild.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
2eb5d7f242 kbuild: doc: describe the -C option precisely for external module builds
Building external modules is typically done using this command:

  $ make -C <KERNEL_DIR> M=<EXTMOD_DIR>

Here, <KERNEL_DIR> refers to the output directory where the kernel was
built, not the kernel source directory.

When the kernel is built in the source tree, there is no ambiguity, as
the output directory and the source directory are the same.

If the kernel was built in a separate build directory, <KERNEL_DIR>
should be the kernel output directory. Otherwise, Kbuild cannot locate
necessary build artifacts. This has been the method for building
external modules against a pre-built kernel in a separate directory
for over 20 years. [1]

If you pass the kernel source directory to the -C option, you must also
specify the kernel build directory using the O= option. This approach
works as well, though it results in a slightly longer command:

  $ make -C <KERNEL_SOURCE_DIR> O=<KERNEL_BUILD_DIR> M=<EXTMOD_DIR>

Some people mistakenly believe that O= should specify a build directory
for external modules when used together with M=. This commit adds more
clarification to Documentation/kbuild/kbuild.rst.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=e321b2ec2eb2993b3d0116e5163c78ad923e3c54

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
e873fb9482 kbuild: doc: remove the description about shipped files
The use of shipped files is discouraged in the upstream kernel these
days. [1]

Downstream Makefiles have the freedom to use shipped files or other
options to handle binaries, but this should not be advertised in the
upstream document.

[1]: https://lore.kernel.org/all/CAHk-=wgSEi_ZrHdqr=20xv+d6dr5G895CbOAi8ok+7-CQUN=fQ@mail.gmail.com/

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
803d505952 kbuild: doc: drop section numbering, use references in modules.rst
Do similar to commit 1a4c1c9df7 ("docs/kbuild/makefiles: drop section
numbering, use references").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
7813cd68ea kbuild: doc: throw out the local table of contents in modules.rst
Do similar to commit 5e8f0ba38a ("docs/kbuild/makefiles: throw out the
local table of contents").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
a866eda43f kbuild: doc: remove outdated description of the limitation on -I usage
Kbuild used to manipulate header search paths, enforcing the odd
limitation of "no space after -I".

Commit cdd750bfb1 ("kbuild: remove 'addtree' and 'flags' magic for
header search paths") stopped doing that. This limitation no longer
exists. Instead, you need to accurately specify the header search path.
(In this case, $(src)/include)

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:21 +09:00
Masahiro Yamada
1a59bd3ca5 kbuild: doc: remove description about grepping CONFIG options
This description was added 20 years ago [1]. It does not convey any
useful information except for a feeling of nostalgia.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=65e433436b5794ae056d22ddba60fe9194bba007

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:20 +09:00
Masahiro Yamada
062a1481cf kbuild: doc: update the description about Kbuild/Makefile split
The phrase "In newer versions of the kernel" was added 14 years ago, by
commit efdf02cf06 ("Documentation/kbuild: major edit of modules.txt
sections 1-4"). This feature is no longer new, so remove it and update
the paragraph.

Example 3 was written 20 years ago [1]. There is no need to note about
backward compatibility with such an old build system. Remove Example 3
entirely.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=65e433436b5794ae056d22ddba60fe9194bba007

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2024-09-24 03:07:20 +09:00
Masahiro Yamada
fc1c79be45 kbuild: remove unnecessary export of RUST_LIB_SRC
If RUST_LIB_SRC is defined in the top-level Makefile (via an environment
variable or command line), it is already exported.

The only situation where it is defined but not exported is when the
top-level Makefile is wrapped by another Makefile (e.g., GNUmakefile).
I cannot think of any other use cases.

I know some people use this tip to define custom variables. However,
even in that case, you can export it directly in the wrapper Makefile.

Example GNUmakefile:

    export RUST_LIB_SRC = /path/to/your/sysroot/lib/rustlib/src/rust/library
    include Makefile

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
2024-09-24 03:06:52 +09:00
Linus Torvalds
d0359e4ca0 Merge tag 'fs_for_v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and isofs updates from Jan Kara:
 "A few small cleanups in quota and isofs"

* tag 'fs_for_v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  isofs: Annotate struct SL_component with __counted_by()
  quota: remove unnecessary error code translation in dquot_quota_enable
  quota: remove redundant return at end of void function
  quota: remove unneeded return value of register_quota_format
  quota: avoid missing put_quota_format when DQUOT_SUSPENDED is passed
2024-09-23 10:49:28 -07:00
Linus Torvalds
b3f391fddf Merge tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs
Pull bcachefs updates from Kent Overstreet:

 - rcu_pending, btree key cache rework: this solves lock contenting in
   the key cache, eliminating the biggest source of the srcu lock hold
   time warnings, and drastically improving performance on some metadata
   heavy workloads - on multithreaded creates we're now 3-4x faster than
   xfs.

 - We're now using an rhashtable instead of the system inode hash table;
   this is another significant performance improvement on multithreaded
   metadata workloads, eliminating more lock contention.

 - for_each_btree_key_in_subvolume_upto(): new helper for iterating over
   keys within a specific subvolume, eliminating a lot of open coded
   "subvolume_get_snapshot()" and also fixing another source of srcu
   lock time warnings, by running each loop iteration in its own
   transaction (as the existing for_each_btree_key() does).

 - More work on btree_trans locking asserts; we now assert that we don't
   hold btree node locks when trans->locked is false, which is important
   because we don't use lockdep for tracking individual btree node
   locks.

 - Some cleanups and improvements in the bset.c btree node lookup code,
   from Alan.

 - Rework of btree node pinning, which we use in backpointers fsck. The
   old hacky implementation, where the shrinker just skipped over nodes
   in the pinned range, was causing OOMs; instead we now use another
   shrinker with a much higher seeks number for pinned nodes.

 - Rebalance now uses BCH_WRITE_ONLY_SPECIFIED_DEVS; this fixes an issue
   where rebalance would sometimes fall back to allocating from the full
   filesystem, which is not what we want when it's trying to move data
   to a specific target.

 - Use __GFP_ACCOUNT, GFP_RECLAIMABLE for btree node, key cache
   allocations.

 - Idmap mounts are now supported (Hongbo Li)

 - Rename whiteouts are now supported (Hongbo Li)

 - Erasure coding can now handle devices being marked as failed, or
   forcibly removed. We still need the evacuate path for erasure coding,
   but it's getting very close to ready for people to start using.

* tag 'bcachefs-2024-09-21' of git://evilpiepirate.org/bcachefs: (99 commits)
  bcachefs: return err ptr instead of null in read sb clean
  bcachefs: Remove duplicated include in backpointers.c
  bcachefs: Don't drop devices with stripe pointers
  bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices
  bcachefs: bch_fs.rw_devs_change_count
  bcachefs: bch2_dev_remove_stripes()
  bcachefs: bch2_trigger_ptr() calculates sectors even when no device
  bcachefs: improve error messages in bch2_ec_read_extent()
  bcachefs: improve error message on too few devices for ec
  bcachefs: improve bch2_new_stripe_to_text()
  bcachefs: ec_stripe_head.nr_created
  bcachefs: bch_stripe.disk_label
  bcachefs: stripe_to_mem()
  bcachefs: EIO errcode cleanup
  bcachefs: Rework btree node pinning
  bcachefs: split up btree cache counters for live, freeable
  bcachefs: btree cache counters should be size_t
  bcachefs: Don't count "skipped access bit" as touched in btree cache scan
  bcachefs: Failed devices no longer require mounting in degraded mode
  bcachefs: bch2_dev_rcu_noerror()
  ...
2024-09-23 10:05:41 -07:00
Andrea Righi
431844b65f sched_ext: Provide a sysfs enable_seq counter
As discussed during the distro-centric session within the sched_ext
Microconference at LPC 2024, introduce a sequence counter that is
incremented every time a BPF scheduler is loaded.

This feature can help distributions in diagnosing potential performance
regressions by identifying systems where users are running (or have ran)
custom BPF schedulers.

Example:

 arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq
 0
 arighi@virtme-ng~> sudo scx_simple
 local=1 global=0
 ^CEXIT: unregistered from user space
 arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq
 1

In this way user-space tools (such as Ubuntu's apport and similar) are
able to gather and include this information in bug reports.

Cc: Giovanni Gherdovich <giovanni.gherdovich@suse.com>
Cc: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Cc: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Cc: Phil Auld <pauld@redhat.com>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-23 06:53:02 -10:00
Tejun Heo
62d3726d4c sched_ext: Fix build when !CONFIG_STACKTRACE
a2f4b16e73 ("sched_ext: Build fix on !CONFIG_STACKTRACE[_SUPPORT]") tried
fixing build when !CONFIG_STACKTRACE but didn't so fully. Also put
stack_trace_print() and stack_trace_save() inside CONFIG_STACKTRACE to fix
build when !CONFIG_STACKTRACE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409220642.fDW2OmWc-lkp@intel.com/
2024-09-23 06:45:22 -10:00
Linus Torvalds
f8ffbc365f Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
 "Just the 'struct fd' layout change, with conversion to accessor
  helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add struct fd constructors, get rid of __to_fd()
  struct fd: representation change
  introduce fd_file(), convert all accessors to it.
2024-09-23 09:35:36 -07:00
Linus Torvalds
f8eb5bd9a8 mm: fix build on 32-bit targets without MAX_PHYSMEM_BITS
The merge resolution to deal with the conflict between commits
ea72ce5da2 ("x86/kaslr: Expose and use the end of the physical memory
address space") and 99185c10d5 ("resource, kunit: add test case for
region_intersects()") ended up being broken in configurations didn't
define a MAX_PHYSMEM_BITS and that had a 32-bit 'phys_addr_t'.

The fallback to using all bits set (ie "(-1ULL)") ended up causing a
build error:

    kernel/resource.c: In function ‘gfr_start’:
    include/linux/minmax.h:93:30: error: conversion from ‘long long unsigned int’ to ‘resource_size_t’ {aka ‘unsigned int’} changes value from ‘18446744073709551615’ to ‘4294967295’ [-Werror=overflow]

this was reported by Geert for m68k, but he points out that it happens
on other 32-bit architectures too, eg mips, xtensa, parisc, and powerpc.

Limiting 'PHYSMEM_END' to a 'phys_addr_t' (which is the same as
'resource_size_t') fixes the build, but Geert points out that it will
then cause a silent overflow in mm/sparse.c:

	unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;

so we actually do want PHYSMEM_END to be defined a 64-bit type - just
not all ones, and not larger than 'phys_addr_t'.

The proper fix is probably to not have some kind of default fallback at
all, but just make sure every architecture has a valid MAX_PHYSMEM_BITS.
But in the meantime, this just applies the rule that PHYSMEM_END is the
largest value that fits in a 'phys_addr_t', but does not have the high
bit set in 64 bits.

Ugly, ugly.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-23 08:58:31 -07:00
Jaegeuk Kim
ae87b9c2dc f2fs: allow F2FS_IPU_NOCACHE for pinned file
This patch allows f2fs to submit bios of in-place writes on pinned file.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-23 15:42:13 +00:00
Pat Somaru
edf1c586e9 sched, sched_ext: Disable SM_IDLE/rq empty path when scx_enabled()
Disable the rq empty path when scx is enabled. SCX must consult the BPF
scheduler (via the dispatch path in balance) to determine if rq is empty.

This fixes stalls when scx is enabled.

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Fixes: 3dcac251b0 ("sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()")
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-23 05:40:53 -10:00
Yu Liao
7ebd84d627 sched: Put task_group::idle under CONFIG_GROUP_SCHED_WEIGHT
When build with CONFIG_GROUP_SCHED_WEIGHT && !CONFIG_FAIR_GROUP_SCHED,
the idle member is not defined:

kernel/sched/ext.c:3701:16: error: 'struct task_group' has no member named 'idle'
  3701 |         if (!tg->idle)
       |                ^~

Fix this by putting 'idle' under new CONFIG_GROUP_SCHED_WEIGHT.

tj: Move idle field upward to avoid breaking up CONFIG_FAIR_GROUP_SCHED block.

Fixes: e179e80c5d ("sched: Introduce CONFIG_GROUP_SCHED_WEIGHT")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409220859.UiCAoFOW-lkp@intel.com/
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-23 05:24:12 -10:00
Chen Ni
91dba615c3 mfd: atc260x: Convert a bunch of commas to semicolons
Replace a comma between expression statements by a semicolon.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20240902085019.4111445-1-nichen@iscas.ac.cn
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:55 +01:00
Mukesh Ojha
abd4107a1d dt-bindings: mfd: qcom,tcsr: Add compatible for sa8775p
Document the compatible for sa8775p SoC.

Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240830133908.2246139-1-quic_mojha@quicinc.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:55 +01:00
Ilpo Järvinen
db6a186505 mfd: intel-lpss: Add Intel Panther Lake LPSS PCI IDs
Add Intel Panther Lake-H/P PCI IDs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240829095719.1557-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Ilpo Järvinen
6112597f5b mfd: intel-lpss: Add Intel Arrow Lake-H LPSS PCI IDs
Add Intel Arrow Lake-H PCI IDs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240829095719.1557-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Detlev Casanova
33d05f2abf dt-bindings: mfd: syscon: Add rk3576 QoS register compatible
Document rk3576 compatible for QoS registers.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/01020191998a2fd4-4d7b091c-9c4c-4067-b8d9-fe7482074d6d-000000@eu-west-1.amazonses.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Haibo Chen
9ca84b355d dt-bindings: mfd: adp5585: Add parsing of hogs
Allow parsing GPIO controller children nodes with GPIO hogs.

Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240828030405.2851611-1-haibo.chen@nxp.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Rob Herring (Arm)
04bb1800e6 mfd: tc3589x: Drop vendorless compatible string from match table
There's no need to list "tc3589x" in the DT match table. The I2C core
will strip any vendor prefix and match against the i2c_device_id table
which has an "tc3589x" entry.

Probably "tc3589x" and TC3589X_UNKNOWN could be removed altogether.
Use of that compatible was only on some STE platforms and was dropped
in 2013. There were ABI breaks in 2014 claiming no DTs in the wild. See
commit 1637d480f8 ("pinctrl: nomadik: force-convert to generic config
bindings").

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240826191300.1410222-1-robh@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Jinjie Ruan
015d188002 mfd: qcom-spmi-pmic: Use for_each_child_of_node_scoped()
Avoids the need for manual cleanup of_node_put() in early exits
from the loop.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20240826092734.2899562-3-ruanjinjie@huawei.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00
Jinjie Ruan
0db28e963a mfd: max77620: Use for_each_child_of_node_scoped()
Avoids the need for manual cleanup of_node_put() in early exits
from the loop.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20240826092734.2899562-2-ruanjinjie@huawei.com
Signed-off-by: Lee Jones <lee@kernel.org>
2024-09-23 16:20:54 +01:00