linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-06 01:49:46 +00:00

Author	SHA1	Message	Date
NeilBrown	1cff14b7fc	nfsd: ensure SEQUENCE replay sends a valid reply. nfsd4_enc_sequence_replay() uses nfsd4_encode_operation() to encode a new SEQUENCE reply when replaying a request from the slot cache; only the ops after the SEQUENCE are replayed from the cache in ->sl_data. However, it does this in nfsd4_replay_cache_entry(), which is called before nfsd4_sequence() has filled in the reply fields. This means that in the replayed SEQUENCE reply: maxslots will be whatever the client sent; target_maxslots will be -1 (assuming it was initialized to zero, since nfsd4_encode_sequence() subtracts 1); and status_flags will be zero. The incorrect maxslots value, in particular, can cause the client to think the slot table has been reduced in size, so it can discard its knowledge of the current sequence numbers of the later slots even though the server has not discarded those slots. When the client later wants to use one of those slots, it can get NFS4ERR_SEQ_MISORDERED from the server. This patch moves the setup of the reply into a new helper function and calls it before nfsd4_replay_cache_entry() is called. Only one of the updated fields, maxslots, was used after this point, so the nfsd4_sequence struct has been extended to have separate maxslots fields for the request and the response. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/linux-nfs/20251010194449.10281-1-okorniev@redhat.com/ Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
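A minimal C sketch, not NFSD code, of why a zero-initialized target_maxslots goes out as -1 and how separate request/response fields keep the client's value intact. The struct seq_model, its field maxslots_response, and the helpers encode_target_highest_slotid() and setup_seq_reply() are hypothetical; treating the wire field as "slot count minus 1" is an assumption based on the "subtracts 1" in the message above.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the SEQUENCE reply fields; the real
 * struct nfsd4_sequence differs. */
struct seq_model {
	uint32_t maxslots;          /* value the client sent in the request */
	uint32_t maxslots_response; /* value the server puts in the reply */
	uint32_t target_maxslots;   /* 0 until the reply fields are filled in */
};

/* Assumed wire encoding: a slot ID, i.e. the slot count minus 1.
 * While target_maxslots is still 0, unsigned arithmetic wraps to
 * UINT32_MAX, which is the "-1" described in the commit message. */
static uint32_t encode_target_highest_slotid(const struct seq_model *s)
{
	return s->target_maxslots - 1;
}

/* Hypothetical helper: fill in the reply fields before any replay-cache
 * lookup, leaving the client's request value untouched. */
static void setup_seq_reply(struct seq_model *s, uint32_t server_slots)
{
	s->maxslots_response = server_slots;
	s->target_maxslots = server_slots;
}
```

The sketch only models the ordering and arithmetic; the slot cache itself is not represented.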
Chuck Lever	c96573c0d7	NFSD: Never cache a COMPOUND when the SEQUENCE operation fails RFC 8881 normatively mandates that a COMPOUND whose initial SEQUENCE operation fails must not modify the slot's replay cache. nfsd4_cache_this() doesn't prevent such caching. When SEQUENCE fails, cstate.data_offset is never set, so caching the reply allows read_bytes_from_xdr_buf() to access uninitialized memory. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: `468de9e54a` ("nfsd41: expand solo sequence check") Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
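A hypothetical C predicate illustrating the rule: the caching decision checks the status of the leading SEQUENCE before anything else. struct compound_model and should_cache_reply() are invented for this sketch and do not reflect the actual signature of nfsd4_cache_this().

```c
#include <stdbool.h>

/* Hypothetical compound state; not NFSD's struct nfsd4_compound_state. */
struct compound_model {
	int seq_status;  /* status of the leading SEQUENCE op, 0 == success */
	bool cachethis;  /* client requested that the reply be cached */
};

/* RFC 8881: if SEQUENCE fails, the slot's replay cache must not change,
 * so that check comes first. */
static bool should_cache_reply(const struct compound_model *c)
{
	if (c->seq_status != 0)
		return false;
	return c->cachethis;
}
```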
Chuck Lever	ff8141e49c	NFSD: Skip close replay processing if XDR encoding fails The replay logic added by commit `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") cannot be done if encoding failed due to a short send buffer; there's no guarantee that the operation encoder has actually encoded the data that is being copied to the replay cache. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
Olga Kornievskaia	4aa17144d5	NFSD: free copynotify stateid in nfs4_free_ol_stateid() Typically, a copynotify stateid is freed either when the parent's stateid is being closed/freed, or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, consider the case where the server receives an OPEN (which creates a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. While doing CREATE_SESSION, the new client instance forces expiry of the previous state of this client. This leads to the open state being freed through release_openowner() -> nfs4_free_ol_stateid(), which finds that it still has a copynotify stateid associated with it. Currently, a warning is printed: WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch instead frees the associated copynotify stateid here. If the parent stateid is freed without freeing the copynotify stateids associated with it, list corruption results when the laundromat later frees the copynotify state. 
[ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink
[ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary)
[ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024
[ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd]
[ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200
[ 1626.861182] sp : ffff8000881d7a40
[ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200
[ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20
[ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8
[ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000
[ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065
[ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3
[ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000
[ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001
[ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000
[ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d
[ 1626.868167] Call trace:
[ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P)
[ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd]
[ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd]
[ 1626.869813] laundromat_main+0x24/0x60 [nfsd]
[ 1626.870231] process_one_work+0x584/0x1050
[ 1626.870595] worker_thread+0x4c4/0xc60
[ 1626.870893] kthread+0x2f8/0x398
[ 1626.871146] ret_from_fork+0x10/0x20
[ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000)
[ 1626.871892] SMP: stopping secondary CPUs
Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: `624322f1ad` ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
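The ordering problem can be sketched in C with a plain singly linked list: the parent owns its copy_notify children, and freeing the parent frees them first, so no later walker (in NFSD's case, the laundromat) touches list entries hanging off freed memory. struct open_stid, struct cpntf, add_cpntf(), and free_open_stid() are hypothetical stand-ins, not the kernel's list_head-based code.

```c
#include <stdlib.h>

/* Hypothetical copy_notify state, linked into its parent's list. */
struct cpntf {
	struct cpntf *next;
};

/* Hypothetical parent (open) stateid that owns its copy_notify states. */
struct open_stid {
	struct cpntf *cpntf_list;
};

static int live_cpntf; /* number of copy_notify states currently allocated */

static void add_cpntf(struct open_stid *p)
{
	struct cpntf *c = malloc(sizeof(*c));

	if (!c)
		return;
	c->next = p->cpntf_list;
	p->cpntf_list = c;
	live_cpntf++;
}

/* Free every child before the parent goes away, leaving an empty list. */
static void free_open_stid(struct open_stid *p)
{
	struct cpntf *c = p->cpntf_list;

	while (c) {
		struct cpntf *next = c->next;

		free(c);
		live_cpntf--;
		c = next;
	}
	p->cpntf_list = NULL;
}
```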
Olga Kornievskaia	4d3dbc2386	nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes RFC 7862 Section 4.1.2 says that if the server supports CLONE, it MUST support the clone_blksize attribute. Fixes: `d6ca7d2643` ("NFSD: Implement FATTR4_CLONE_BLKSIZE attribute") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-04 11:02:31 -05:00
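To see which supported-attributes word this fix touches, here is a small C sketch of the NFSv4 bitmap layout (attribute n lives in 32-bit word n / 32 at bit n % 32). ATTR_CLONE_BLKSIZE and the helpers are hypothetical names, and the attribute number 77 is an assumption taken from RFC 7862 that should be checked against the spec. Under that assumption, clone_blksize lands in word 2, bit 13, consistent with the FATTR4_WORD2_ prefix in the commit subject.

```c
#include <stdint.h>

/* Assumption: RFC 7862 assigns clone_blksize attribute number 77. */
#define ATTR_CLONE_BLKSIZE 77u

/* Index of the 32-bit bitmap word that holds attribute attr. */
static unsigned attr_word(unsigned attr)
{
	return attr / 32;
}

/* Bit mask for attribute attr within its bitmap word. */
static uint32_t attr_mask(unsigned attr)
{
	return (uint32_t)1 << (attr % 32);
}
```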
NeilBrown	8a7348a9ed	nfsd: fix refcount leak in nfsd_set_fh_dentry() nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find the various exported filesystems using LOOKUP requests from a known root filehandle. NFSv3 uses the MOUNT protocol to find those exported filesystems and so is not given access to the pseudo root filesystem. If a v3 (or v2) client uses a filehandle from that filesystem, nfsd_set_fh_dentry() will report an error, but still stores the export in "struct svc_fh" even though it also drops the reference (exp_put()). This means that when fh_put() is called an extra reference will be dropped which can lead to use-after-free and possible denial of service. Normal NFS usage will not provide a pseudo-root filehandle to a v3 client. This bug can only be triggered by the client synthesising an incorrect filehandle. To fix this we move the assignments to the svc_fh later, after all possible error cases have been detected. Reported-and-tested-by: tianshuo han <hantianshuo233@gmail.com> Fixes: `ef7f6c4904` ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()") Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-04 11:02:31 -05:00
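The ordering fix can be illustrated with a hypothetical refcount model in C: the export reference is stored in the filehandle only after every error check has passed, so fh_put() never drops a reference that the error path already dropped. struct export_model, struct fh_model, set_fh_export(), and the version test standing in for the V4ROOT check are all invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified refcounted export and filehandle. */
struct export_model { int refcnt; };
struct fh_model { struct export_model *exp; };

static void exp_get(struct export_model *e) { e->refcnt++; }
static void exp_put(struct export_model *e) { e->refcnt--; }

/* Fixed ordering: detect every error before storing into the fh. */
static int set_fh_export(struct fh_model *fh, struct export_model *e,
			 bool is_v4root, int vers)
{
	exp_get(e);
	if (is_v4root && vers < 4) {
		exp_put(e); /* drop our reference ... */
		return -1;  /* ... and leave fh->exp untouched */
	}
	fh->exp = e;        /* only reached on success */
	return 0;
}

/* Release whatever reference the fh holds, exactly once. */
static void fh_put(struct fh_model *fh)
{
	if (fh->exp) {
		exp_put(fh->exp);
		fh->exp = NULL;
	}
}
```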
Chuck Lever	3e7f011c25	Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND" I've found that pynfs COMP6 now leaves the connection or lease in a strange state, which causes CLOSE9 to hang indefinitely. I've dug into it a little, but I haven't been able to root-cause it yet. However, I bisected to commit `48aab1606f` ("NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"). Tianshuo Han also reports a potential vulnerability when decoding an NFSv4 COMPOUND. An attacker can place an arbitrarily large op count in the COMPOUND header, which results in: [ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 when NFSD attempts to allocate the COMPOUND op array. Let's restore the operation-per-COMPOUND limit, but raise it to 200 for now. Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Tested-by: Tianshuo Han <hantianshuo233@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:50 -04:00
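A minimal C sketch of the defensive pattern behind the restored cap: bound the client-supplied op count before it is used to size an allocation. MAX_OPS_PER_COMPOUND reuses the 200 from the message; struct op_model and alloc_ops() are hypothetical, and NFSD's real handling of an oversized count (for example, which error status it returns) may differ from simply returning NULL.

```c
#include <stdint.h>
#include <stdlib.h>

/* Limit quoted in the commit message. */
#define MAX_OPS_PER_COMPOUND 200u

/* Hypothetical per-operation slot in the decoded COMPOUND. */
struct op_model {
	uint32_t opnum;
	uint32_t status;
};

/* Validate the op count before sizing any allocation, so a huge count
 * never reaches the allocator. Returns NULL when the count exceeds the
 * cap or when calloc() fails. */
static struct op_model *alloc_ops(uint32_t opcnt)
{
	if (opcnt > MAX_OPS_PER_COMPOUND)
		return NULL;
	/* Request at least one element so NULL always means failure. */
	return calloc(opcnt ? opcnt : 1, sizeof(struct op_model));
}
```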
Nathan Chancellor	29cdfb4950	nfsd: Avoid strlen conflict in nfsd4_encode_components_esc() There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting with the function strlen():
In file included from include/linux/cpumask.h:11,
                 from arch/x86/include/asm/paravirt.h:21,
                 from arch/x86/include/asm/irqflags.h:102,
                 from include/linux/irqflags.h:18,
                 from include/linux/spinlock.h:59,
                 from include/linux/mmzone.h:8,
                 from include/linux/gfp.h:7,
                 from include/linux/slab.h:16,
                 from fs/nfsd/nfs4xdr.c:37:
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc':
include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer
  321 | __trace_puts(_THIS_IP_, str, strlen(str)); \
      |                              ^~~~~~
include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts'
  265 | trace_puts(fmt); \
      | ^~~~~~~~~~
include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk'
   34 | # define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__)
      |                                    ^~~~~~~~~~~~
include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk'
   42 | __sunrpc_printk(fmt, ##__VA_ARGS__); \
      | ^~~~~~~~~~~~~~~
include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk'
   25 | dfprintk(FACILITY, fmt, ##__VA_ARGS__)
      | ^~~~~~~~
fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk'
 2646 | dprintk("nfsd4_encode_components(%s)\n", components);
      | ^~~~~~~
fs/nfsd/nfs4xdr.c:2643:13: note: declared here
 2643 | int strlen, count=0;
      |     ^~~~~~
This dprintk() instance is not particularly useful, so just remove it altogether to get rid of the immediate strlen() conflict. At the same time, eliminate the local strlen variable to avoid potential conflicts with strlen() in the future. Fixes: `ec7d8e68ef` ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Chuck Lever	abb1f08a21	NFSD: Fix crash in nfsd4_read_release() When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: `15a8b55dbb` ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Chuck Lever	4f76435fd5	NFSD: Define actions for the new time_deleg FATTR4 attributes NFSv4 clients won't send legitimate GETATTR requests for these new attributes because they are intended to be used only with CB_GETATTR and SETATTR. But NFSD has to do something besides crashing if it ever sees a GETATTR request that queries these attributes. RFC 8881 Section 18.7.3 states: "The server MUST return a value for each attribute that the client requests if the attribute is supported by the server for the target file system. If the server does not support a particular attribute on the target file system, then it MUST NOT return the attribute value and MUST NOT set the attribute bit in the result bitmap. The server MUST return an error if it supports an attribute on the target but cannot obtain its value. In that case, no attribute values will be returned." Further, RFC 9754 Section 5 states: "These new attributes are invalid to be used with GETATTR, VERIFY, and NVERIFY, and they can only be used with CB_GETATTR and SETATTR by a client holding an appropriate delegation." Thus there does not appear to be a specific server response mandated by specification. Taking the guidance that querying these attributes via GETATTR is "invalid", NFSD will return nfserr_inval, failing the request entirely. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t Fixes: `51c0d4f7e3` ("nfsd: add support for FATTR4_OPEN_ARGUMENTS") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
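A hypothetical C sketch of the chosen behavior: if a GETATTR bitmap requests any delegation-only time attribute, the whole request fails with NFS4ERR_INVAL instead of returning partial results. The bit positions in DELEG_TIME_ATTRS_WORD2 are illustrative placeholders, not the real attribute numbers, and check_getattr_bitmap() is not an NFSD function.

```c
#include <stdint.h>

#define NFS4_OK        0
#define NFS4ERR_INVAL  22

/* Illustrative placeholder bits for the delegation-only time attributes
 * in bitmap word 2; not the real attribute numbers. */
#define DELEG_TIME_ATTRS_WORD2 (((uint32_t)1 << 20) | ((uint32_t)1 << 21))

/* Fail the whole GETATTR if any delegation-only attribute is requested. */
static int check_getattr_bitmap(const uint32_t bmval[3])
{
	if (bmval[2] & DELEG_TIME_ATTRS_WORD2)
		return NFS4ERR_INVAL;
	return NFS4_OK;
}
```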
Chuck Lever	4b47a8601b	NFSD: Define a proc_layoutcommit for the FlexFiles layout type Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT operation on a FlexFiles layout. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t Cc: Thomas Haynes <loghyr@hammerspace.com> Cc: stable@vger.kernel.org Fixes: `9b9960a0ca` ("nfsd: Add a super simple flex file server") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-10 12:53:50 -04:00
NeilBrown	73cc6ec1a8	nfsd: discard nfserr_dropit nfserr_dropit hasn't been used for over a decade, since rq_dropme and the RQ_DROPME flag were introduced. Time to get rid of it completely. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Eric Biggers	d8e97cc476	SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it. This unblocks the eventual removal of the selection of CRYPTO from NFSD_V4, which will no longer be needed by nfsd itself due to switching to the crypto library functions. But NFSD_V4 selects RPCSEC_GSS_KRB5, which still needs CRYPTO. It makes more sense for RPCSEC_GSS_KRB5 to select CRYPTO itself, like most other kconfig options that need CRYPTO do. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Mike Snitzer	6304affe45	NFSD: Add io_cache_{read,write} controls to debugfs Add 'io_cache_read' to NFSD's debugfs interface so that any data read by NFSD will either be: - cached in the page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_read may be set by writing to: /sys/kernel/debug/nfsd/io_cache_read Add 'io_cache_write' to NFSD's debugfs interface so that any data written by NFSD will either be: - cached in the page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_write may be set by writing to: /sys/kernel/debug/nfsd/io_cache_write The default value for both settings is NFSD_IO_BUFFERED, which is NFSD's existing behavior for both read and write. Changes to these settings take immediate effect for all exports and NFS versions. Currently only xfs and ext4 implement RWF_DONTCACHE. For file systems that do not implement RWF_DONTCACHE, NFSD uses only buffered I/O when the io_cache setting is NFSD_IO_DONTCACHE. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
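A small C sketch of the setting-plus-fallback rule described above. The enum values match the message; effective_io_mode() and its fs_supports_dontcache parameter are hypothetical and only model the documented fallback, not how NFSD actually probes a file system.

```c
#include <stdbool.h>

/* Values as given in the commit message. */
enum nfsd_io_mode {
	NFSD_IO_BUFFERED = 0,
	NFSD_IO_DONTCACHE = 1,
};

/* Hypothetical helper: pick the effective mode for one I/O. A file
 * system without RWF_DONTCACHE support falls back to buffered I/O. */
static enum nfsd_io_mode effective_io_mode(enum nfsd_io_mode setting,
					   bool fs_supports_dontcache)
{
	if (setting == NFSD_IO_DONTCACHE && !fs_supports_dontcache)
		return NFSD_IO_BUFFERED;
	return setting;
}
```

Per the message, writing 1 to /sys/kernel/debug/nfsd/io_cache_read (or io_cache_write) selects NFSD_IO_DONTCACHE for that direction.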
Chuck Lever	d6e80d48f9	NFSD: Do the grace period check in ->proc_layoutget RFC 8881 Section 18.43.3 states: "If the metadata server is in a grace period, and does not persist layouts and device ID to device address mappings, then it MUST return NFS4ERR_GRACE (see Section 8.4.2.1)." Jeff observed that this suggests the grace period check is better done by the individual layout type implementations, because checking for the server grace period is unnecessary for some layout types. Suggested-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Dan Carpenter	eafdd7e949	nfsd: delete unnecessary NULL check in __fh_verify() In commit 4a0de50a44bb ("nfsd: decouple the xprtsec policy check from check_nfsd_access()") we added a NULL check on "rqstp" earlier in the function, so this later check is no longer required. Delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	e0963ce53b	NFSD: Allow layoutcommit during grace period If the loca_reclaim field is set to TRUE, this indicates that the client is attempting to commit changes to a layout after the restart of the metadata server during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated byte-ranges of a file that were sent to the storage devices before the restart of the metadata server. See RFC 8881, section 18.42.3. Without this, the client is not able to increase the file size and commit preallocated extents when the block/scsi layout server is restarted during a write and is in a grace period. And when the grace period ends, the client also cannot perform layoutcommit because the old layout state becomes invalid, resulting in file corruption. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	db155b7c7c	NFSD: Disallow layoutget during grace period When the server is recovering from a reboot and is in a grace period, any operation that may result in deletion or reallocation of block extents should not be allowed. See RFC 8881, section 18.43.3. If multiple clients write data to the same file, rebooting the server during writing may result in file corruption. In the worst case, the exported XFS may also become corrupted. This behavior was observed while testing a pNFS block volume setup. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-25 10:01:24 -04:00
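The grace-period rules from the three layout commits above (LAYOUTGET refused during grace unless layouts persist, LAYOUTCOMMIT allowed during grace when loca_reclaim is set) can be summarized as two hypothetical C decision helpers. The function names and the persists_layouts flag are invented; the behavior of a non-reclaim LAYOUTCOMMIT during grace is an assumption, marked as such in the code.

```c
#include <stdbool.h>

#define NFS4_OK        0
#define NFS4ERR_GRACE  10013

/* RFC 8881 18.43.3: in grace, a server that does not persist layouts and
 * device ID mappings MUST return NFS4ERR_GRACE. persists_layouts stands
 * in for a per-layout-type property, which is why the check fits in
 * ->proc_layoutget rather than in generic code. */
static int layoutget_status(bool in_grace, bool persists_layouts)
{
	if (in_grace && !persists_layouts)
		return NFS4ERR_GRACE;
	return NFS4_OK;
}

/* RFC 8881 18.42.3: loca_reclaim lets a client commit pre-restart writes
 * during grace. Assumption: a non-reclaim LAYOUTCOMMIT during grace is
 * refused with NFS4ERR_GRACE. */
static int layoutcommit_status(bool in_grace, bool loca_reclaim)
{
	if (in_grace && !loca_reclaim)
		return NFS4ERR_GRACE;
	return NFS4_OK;
}
```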
Xichao Zhao	6c15463c45	sunrpc: fix "occurence"->"occurrence" Trivial fix to spelling mistake in comment text. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Eric Biggers	13289ed501	nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in Now that nfsd is accessing SHA-256 via the library API instead of via crypto_shash, there is a direct symbol dependency on the SHA-256 code and there is no benefit to be gained from forcing it to be built-in. Therefore, select CRYPTO_LIB_SHA256 from NFSD (conditional on NFSD_V4) instead of from NFSD_V4, so that it can be 'm' if NFSD is 'm'. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Olga Kornievskaia	a082e4b4d0	nfsd: nfserr_jukebox in nlm_fopen should lead to a retry When a v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed when there is a conflicting delegation, drop this NLM request so that the client retries. Once the delegation is recalled, a retry leads to nfsd returning either an nlm_lck_blocked error (if a local lock has been claimed) or a successful NLM lock. Fixes: `d343fce148` ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
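A hypothetical C mapping illustrating the change: a "try again later" open result becomes "drop the request" (so the client retransmits) instead of an immediate failure reply. The enums and map_open_result() are invented for this sketch and are not lockd or NFSD identifiers.

```c
/* Hypothetical outcomes of opening the file for an NLM request. */
enum open_result {
	OPEN_OK,
	OPEN_JUKEBOX,     /* e.g. nfsd_open got EAGAIN during a deleg recall */
	OPEN_OTHER_ERROR,
};

/* Hypothetical NLM-side dispositions. */
enum nlm_disposition {
	NLM_PROCEED,
	NLM_DROP_REQUEST, /* send no reply; the client retransmits later */
	NLM_REPLY_FAILED,
};

static enum nlm_disposition map_open_result(enum open_result r)
{
	switch (r) {
	case OPEN_OK:
		return NLM_PROCEED;
	case OPEN_JUKEBOX:
		return NLM_DROP_REQUEST;
	default:
		return NLM_REPLY_FAILED;
	}
}
```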
Chuck Lever	8ddd06be9a	NFSD: Reduce DRC bucket size The common case is that a DRC lookup will not find the XID in the bucket. Reduce the amount of pointer chasing during the lookup by keeping fewer entries in each hash bucket. Changing the bucket size constant forces the size of the DRC hash table to increase, and the height of each bucket r-b tree to be reduced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
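To make the trade-off concrete, here is a C sketch assuming the hash table is sized as the next power of two at or above (maximum entries / target bucket size). Both the sizing rule and the constants used in the checks are illustrative assumptions, not NFSD's actual values. Under that rule, shrinking the target bucket size by 8x grows the bucket count by 8x, so each bucket holds fewer entries to search.

```c
#include <stdint.h>

/* Smallest power of two >= x (x >= 1), capped at 2^31 to avoid overflow. */
static uint32_t roundup_pow2(uint32_t x)
{
	uint32_t p = 1;

	if (x > ((uint32_t)1 << 31))
		return (uint32_t)1 << 31;
	while (p < x)
		p <<= 1;
	return p;
}

/* Hypothetical sizing rule: enough buckets that each holds about
 * target_bucket_size entries when the cache is full. */
static uint32_t drc_hash_buckets(uint32_t max_entries,
				 uint32_t target_bucket_size)
{
	uint32_t n = max_entries / target_bucket_size;

	return roundup_pow2(n ? n : 1);
}
```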
Chuck Lever	fb340bfd48	NFSD: Delay adding new entries to LRU Neil Brown observes: "I would not include RC_INPROG entries in the lru at all - they are always ignored, and will be added when they are switched to RCU_DONE." I also removed a stale comment. Suggested-by: NeilBrown <neil@brown.name> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	d73d06dac6	SUNRPC: Move the svc_rpcb_cleanup() call sites Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	dd9adfa0da	NFS: Remove rpcbind cleanup for NFSv4.0 callback The NFS client's NFSv4.0 callback listeners are created with SVC_SOCK_ANONYMOUS, therefore svc_setup_socket() does not register them with the client's rpcbind service. And, note that nfs_callback_down_net() does not call svc_rpcb_cleanup() at all when shutting down the callback server. Even if svc_setup_socket() were to attempt to register or unregister these sockets, the callback service has vs_hidden set, which shunts the rpcbind upcalls. The svc_rpcb_cleanup() error flow was introduced by commit `c946556b87` ("NFS: move per-net callback thread initialization to nfs_callback_up_net()"). It doesn't appear in the code that was relocated by that commit. Therefore, there is no need to call svc_rpcb_cleanup() when listener creation fails during callback server start-up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Olga Kornievskaia	898374fdd7	nfsd: unregister with rpcbind when deleting a transport When a listener is added, part of transport creation also registers the program/port with rpcbind. However, when the listener is removed, the transport goes away but rpcbind still has the entry for that port/type. When deleting the transport, unregister with rpcbind when appropriate. v2: created a new xpt_flag, XPT_RPCB_UNREG, to mark TCP and UDP transports; at xprt destroy time, send an rpcbind unregister if the flag is set. Suggested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: `d093c90892` ("nfsd: fix management of listener transports") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Xichao Zhao	f64397e04b	NFSD: Drop redundant conversion to bool The result of integer comparison already evaluates to bool. No need for explicit conversion. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	7569065fb1	sunrpc: eliminate return pointer in svc_tcp_sendmsg() Return a positive value if something was sent, or a negative error code. Eliminate the "err" variable in the only caller as well. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	a9a15ba23e	sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length This pr_notice() is confusing since it only prints xdr->len, which doesn't include the 4-byte record marker. That can make it sometimes look like the socket sent more than was requested if it's short by just a few bytes. Add sizeof(marker) to the size and fix the format accordingly. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
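For context on the 4-byte marker: RPC over TCP uses record marking (RFC 5531, section 11), in which each fragment is preceded by a 32-bit word whose top bit flags the last fragment and whose low 31 bits carry the fragment length. A minimal C sketch under that spec; make_record_marker() and bytes_on_wire() are hypothetical helpers, and the single-fragment assumption is labeled in the code.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 5531 record marking: top bit set on the last fragment. */
#define RPC_LAST_FRAG 0x80000000u

/* Host-order marker value; it would be converted with htonl() before
 * being written to the socket. */
static uint32_t make_record_marker(uint32_t frag_len, int last)
{
	return (last ? RPC_LAST_FRAG : 0) | (frag_len & 0x7fffffffu);
}

/* Bytes handed to the socket for a reply sent as a single fragment
 * (an assumption of this sketch): marker plus XDR payload. */
static size_t bytes_on_wire(size_t xdr_len)
{
	return sizeof(uint32_t) + xdr_len;
}
```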
Scott Mayhew	e4f574ca9c	nfsd: decouple the xprtsec policy check from check_nfsd_access() A while back I reported that an NFSv3 client could successfully mount using '-o xprtsec=none' an export that had been exported with 'xprtsec=tls:mtls'. By "successfully" I mean that the mount command would succeed and the mount would show up in /proc/mounts. Attempting to do anything further with the mount would be met with NFS3ERR_ACCES. This was fixed (albeit accidentally) by commit `bb4f07f240` ("nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was subsequently re-broken by commit `0813c5f012` ("nfsd: fix access checking for NLM under XPRTSEC policies"). Transport Layer Security isn't an RPC security flavor or pseudo-flavor, so we shouldn't be conflating them when determining whether the access checks can be bypassed. Split check_nfsd_access() into two helpers, and have __fh_verify() call the helpers directly, since __fh_verify() has logic that allows one or both of the checks to be skipped. All other sites will continue to call check_nfsd_access(). Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/ Fixes: `9280c57743` ("NFSD: Handle new xprtsec= export option") Cc: stable@vger.kernel.org Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Thorsten Blum	ab1c282c01	NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() Commit `5304877936` ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit `72f78ae00a` ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: `5304877936` ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
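The off-by-one can be demonstrated in userspace with a stand-in for the kernel's strscpy(). my_strscpy() is a hypothetical reimplementation of the documented contract (copy at most size - 1 characters, always NUL-terminate, report truncation); it returns -1 where the kernel returns -E2BIG. Passing sizeof(buf) - 1 silently loses one character that would otherwise have fit.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for strscpy(): copies at most size - 1 characters,
 * always NUL-terminates when size > 0, and returns the copied length,
 * or -1 on truncation. */
static long my_strscpy(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -1;
	while (len < size && src[len] != '\0')
		len++;
	if (len == size) {
		/* Source does not fit: keep size - 1 bytes plus the NUL. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}
	memcpy(dst, src, len + 1); /* includes the terminating NUL */
	return (long)len;
}
```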
Eric Biggers	9ebcd022a3	nfsd: Eliminate an allocation in nfs4_make_rec_clidname() Since MD5 digests are fixed-size, make nfs4_make_rec_clidname() store the digest in a stack buffer instead of a dynamically allocated buffer. Use MD5_DIGEST_SIZE instead of a hard-coded value, both in nfs4_make_rec_clidname() and in the definition of HEXDIR_LEN. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Eric Biggers	17695d72d0	nfsd: Replace open-coded conversion of bytes to hex Since the Linux kernel's sprintf() has conversion to hex built-in via "%*phN", delete md5_to_hex() and just use that. Also add an explicit array bound to the dname parameter of nfs4_make_rec_clidname() to make its size clear. No functional change. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
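In userspace there is no "%*phN", so this C sketch uses a per-byte "%02x" loop to produce the same lowercase, separator-free hex string. It assumes HEXDIR_LEN is two hex characters per digest byte plus a trailing NUL (33 for MD5); the exact kernel definition should be checked. digest_to_hex() is a hypothetical name.

```c
#include <stdio.h>

#define MD5_DIGEST_SIZE 16
/* Assumption: two hex characters per digest byte plus a trailing NUL. */
#define HEXDIR_LEN (2 * MD5_DIGEST_SIZE + 1)

/* Userspace equivalent of the kernel's "%*phN" for a 16-byte digest:
 * lowercase hex, no separators, NUL-terminated. */
static void digest_to_hex(char dname[HEXDIR_LEN],
			  const unsigned char digest[MD5_DIGEST_SIZE])
{
	for (int i = 0; i < MD5_DIGEST_SIZE; i++)
		snprintf(dname + 2 * i, 3, "%02x", digest[i]);
}
```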
Colin Ian King	6ecdfd7aa8	lockd: Remove space before newline There is an extraneous space before a newline in a dprintk message. Remove the space. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	e5e9b24ab8	nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation Instead of allowing the ctime to roll backward with a WRITE_ATTRS delegation, set FMODE_NOCMTIME on the file and have it skip mtime and ctime updates. It is possible that the client will never send a SETATTR to set the times before returning the delegation. Add two new bools to struct nfs4_delegation: dl_written: tracks whether the file has been written since the delegation was granted. This is set in the WRITE and LAYOUTCOMMIT handlers. dl_setattr: tracks whether the client has sent at least one valid mtime that can also update the ctime in a SETATTR. When unlocking the lease for the delegation, clear FMODE_NOCMTIME. If the file has been written, but no setattr for the delegated mtime and ctime has been done, update the timestamps to current_time(). Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
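The unlock-time decision above reduces to a small predicate. In this hypothetical C model, struct deleg_model mirrors the two new booleans, nocmtime stands in for FMODE_NOCMTIME, and on_deleg_unlock() returns whether the timestamps should be set to the current time.

```c
#include <stdbool.h>

/* Hypothetical model of the two new nfs4_delegation booleans. */
struct deleg_model {
	bool dl_written;  /* file written since the delegation was granted */
	bool dl_setattr;  /* client sent a valid delegated mtime via SETATTR */
	bool nocmtime;    /* stands in for FMODE_NOCMTIME on the file */
};

/* Clear the freeze, then report whether mtime/ctime should be stamped
 * with the current time: written, but no delegated times were set. */
static bool on_deleg_unlock(struct deleg_model *d)
{
	d->nocmtime = false;
	return d->dl_written && !d->dl_setattr;
}
```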
Jeff Layton	b40b1ba37a	nfsd: fix timestamp updates in CB_GETATTR When updating the local timestamps from CB_GETATTR, the updated values are not being properly vetted. Compare the update times vs. the saved times in the delegation rather than the current times in the inode. Also, ensure that the ctime is properly vetted vs. its original value. Fixes: `6ae30d6eb2` ("nfsd: add support for delegated timestamps") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	3952f1cbcb	nfsd: fix SETATTR updates for delegated timestamps SETATTRs containing delegated timestamp updates are currently not being vetted properly. Since we no longer need to compare the timestamps vs. the current timestamps, move the vetting of delegated timestamps wholly into nfsd. Rename the set_cb_time() helper to nfsd4_vet_deleg_time(), and make it non-static. Add a new vet_deleg_attrs() helper that is called from nfsd4_setattr and uses nfsd4_vet_deleg_time() to properly validate all the timestamps. If the validation indicates that the update should be skipped, unset the appropriate flags in ia_valid. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
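A hypothetical C sketch of the vetting pattern from the two commits above: compare a client-supplied delegated timestamp against the value saved when the delegation was granted (not the inode's current time), and clear the corresponding ia_valid bit if the update should be skipped. The rule "accept only if strictly later than the saved value" is an assumption, and DEMO_MTIME is an illustrative bit, not the kernel's ATTR_* value.

```c
/* Minimal timespec-like pair for this sketch. */
struct ts {
	long long sec;
	long nsec;
};

/* Returns <0, 0 or >0 as a is earlier than, equal to, or later than b. */
static int ts_compare(const struct ts *a, const struct ts *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* Illustrative flag bit, not the kernel's ATTR_* value. */
#define DEMO_MTIME 0x1u

/* Assumed rule: keep the update only if it is later than the timestamp
 * saved at delegation grant time; otherwise clear the flag. */
static unsigned vet_deleg_mtime(unsigned ia_valid, const struct ts *update,
				const struct ts *orig)
{
	if ((ia_valid & DEMO_MTIME) && ts_compare(update, orig) <= 0)
		ia_valid &= ~DEMO_MTIME;
	return ia_valid;
}
```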
Jeff Layton	7663e963a5	nfsd: track original timestamps in nfs4_delegation As Trond points out [1], the "original time" mentioned in RFC 9754 refers to the timestamps on the files at the time that the delegation was granted, and not the current timestamp of the file on the server. Store the current timestamps for the file in the nfs4_delegation when granting one. Add STATX_ATIME and STATX_MTIME to the request mask in nfs4_delegation_stat(). When granting OPEN_DELEGATE_READ_ATTRS_DELEG, do a nfs4_delegation_stat() and save the correct atime. If the stat() fails for any reason, fall back to granting a normal read deleg. [1]: https://lore.kernel.org/linux-nfs/47a4e40310e797f21b5137e847b06bb203d99e66.camel@kernel.org/ Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	c066ff58e5	nfsd: use ATTR_CTIME_SET for delegated ctime updates Ensure that notify_change() doesn't clobber a delegated ctime update with current_time() by setting ATTR_CTIME_SET for those updates. Don't bother setting the timestamps in cb_getattr_update_times() in the non-delegated case. notify_change() will do that itself. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	afc5b36e29	vfs: add ATTR_CTIME_SET flag When ATTR_ATIME_SET and ATTR_MTIME_SET are set in the ia_valid mask, the notify_change() logic takes that to mean that the request should set those values explicitly, and not override them with "now". With the advent of delegated timestamps, similar functionality is needed for the ctime. Add an ATTR_CTIME_SET flag, and use that to indicate that the ctime should be accepted as-is. Also, clean up the if statements to eliminate the extra negatives. In setattr_copy() and setattr_copy_mgtime() use inode_set_ctime_deleg() when ATTR_CTIME_SET is set, instead of basing the decision on ATTR_DELEG. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	5affb498e7	nfsd: ignore ATTR_DELEG when checking ia_valid before notify_change() If the only flag left is ATTR_DELEG, then there are no changes to be made. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	2990b5a479	nfsd: fix assignment of ia_ctime.tv_nsec on delegated mtime update The ia_ctime.tv_nsec field should be set to modify.nseconds. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	d68886bae7	NFSD: Fix last write offset handling in layoutcommit The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that the client does not try to update the file size. Thus, the server should not try to calculate a new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	f963cf2b91	NFSD: Implement large extent array support in pNFS When a pNFS client in block or SCSI layout mode sends LAYOUTCOMMIT to the MDS, a variable-length array of modified extents is supplied within the request. This patch allows the server to accept such extent arrays even when they do not fit within a single memory page. The issue can be reproduced when writing to a 1GB file using FIO with O_DIRECT, 4K block and large I/O depth without preallocation of the file. In this case, the server returns NFSERR_BADXDR to the client. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	6bf1be3399	NFSD: Minor cleanup in layoutcommit decoding Use the appropriate xdr function to decode the lc_newoffset field, which is a boolean value. See RFC 8881, section 18.42.1. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	274365a51d	NFSD: Minor cleanup in layoutcommit processing Remove dprintk in nfsd4_layoutcommit. These are not needed in day-to-day usage, and the information is also available in Wireshark when capturing NFS traffic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	832738e4b3	NFSD: Rework encoding and decoding of nfsd4_deviceid Compilers may optimize the layout of C structures, so we should not rely on sizeof struct and memcpy to encode and decode XDR structures. The byte order of the fields should also be taken into account. This patch adds the correct functions to handle the deviceid4 structure and removes the pad field, which is currently not used by NFSD, from the runtime state. The server's byte order is preserved because the deviceid4 blob on the wire is only used as a cookie by the client. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	c97b737ef8	sunrpc: Change ret code of xdr_stream_decode_opaque_fixed Since the opaque is fixed in size, the caller already knows how many bytes were decoded, on success. Thus, xdr_stream_decode_opaque_fixed() doesn't need to return that value. And, xdr_stream_decode_u32 and _u64 both return zero on success. This patch simplifies the caller's error checking to avoid potential integer promotion issues. Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
NeilBrown	2ee3a75e42	nfsd: discard nfsd_file_get_local() This interface was deprecated by commit `e6f7e1487a` ("nfs_localio: simplify interface to nfsd for getting nfsd_file") and is now unused. So let's remove it. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	d9adbb6e10	sunrpc: delay pc_release callback until after the reply is sent The server-side sunrpc code currently calls pc_release before sending the reply. Change svc_process and svc_process_bc to call pc_release after sending the reply instead. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
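The integer-promotion hazard behind commit c97b737ef8 ("sunrpc: Change ret code of xdr_stream_decode_opaque_fixed") can be illustrated with a small userspace model. The functions below are hypothetical stand-ins, not the kernel's `xdr_stream_decode_opaque_fixed()`: the old-style one returns the decoded length or a negative errno as `ssize_t`, and the new-style one is assumed to return 0 or a negative errno, matching the `xdr_stream_decode_u32()` convention that the commit message cites. When a caller compares the old return value against a `size_t` length, the negative errno is converted to a huge unsigned value and the error goes undetected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model of the OLD convention: decoded length on success,
 * negative errno on failure. */
static ssize_t decode_fixed_old(const void *src, size_t avail, void *dst, size_t len)
{
	if (avail < len)
		return -EBADMSG;
	memcpy(dst, src, len);
	return (ssize_t)len;
}

/* Hypothetical model of the NEW convention (assumed): 0 on success,
 * negative errno on failure. */
static int decode_fixed_new(const void *src, size_t avail, void *dst, size_t len)
{
	if (avail < len)
		return -EBADMSG;
	memcpy(dst, src, len);
	return 0;
}

/* Buggy caller: in "ret < len" the signed ret is converted to size_t,
 * so -EBADMSG becomes a huge unsigned value and the test is false. */
static bool old_caller_detects_error(const void *src, size_t avail, void *dst, size_t len)
{
	ssize_t ret = decode_fixed_old(src, avail, dst, len);

	return ret < len;
}

/* Fixed caller: a plain sign check, no mixed signed/unsigned comparison. */
static bool new_caller_detects_error(const void *src, size_t avail, void *dst, size_t len)
{
	return decode_fixed_new(src, avail, dst, len) < 0;
}
```

Building with `-Wsign-compare` (enabled by `-Wextra` in GCC) flags the comparison in the old-style caller; returning 0 on success removes the need for that comparison altogether.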
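The rule described in commit d68886bae7 ("NFSD: Fix last write offset handling in layoutcommit") is that the server computes and range-checks a new file size only when the client actually supplied a last write offset. Below is a minimal sketch of that decision. The struct, the function `lc_new_size()`, and the exact form of the segment range check are assumptions made for illustration; they are not nfsd's code. The candidate size is taken to be the last written byte plus one, and it must land inside the layout segment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the LAYOUTCOMMIT arguments. */
struct lc_args {
	bool     newoffset;   /* models loca_last_write_offset.no_newoffset */
	uint64_t last_wr;     /* meaningful only when newoffset is true */
	uint64_t seg_offset;  /* start of the layout segment */
	uint64_t seg_length;  /* length of the layout segment */
};

/* Returns 0 and stores the resulting size in *new_size, or -EINVAL if the
 * supplied offset does not fit the segment (assumed check). */
static int lc_new_size(const struct lc_args *a, uint64_t cur_size, uint64_t *new_size)
{
	uint64_t candidate;

	*new_size = cur_size;
	if (!a->newoffset)
		return 0;            /* no offset given: leave the size alone */

	if (a->last_wr == UINT64_MAX)
		return -EINVAL;      /* last_wr + 1 would overflow */
	candidate = a->last_wr + 1;

	/* The candidate size must land inside the segment; the check is
	 * written as a subtraction to avoid overflowing offset + length. */
	if (candidate <= a->seg_offset || candidate - a->seg_offset > a->seg_length)
		return -EINVAL;

	if (candidate > cur_size)
		*new_size = candidate;
	return 0;
}
```

In this model, a request without an offset succeeds and leaves the size unchanged without reaching the range check, which matches the behavior the commit message calls for.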
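Commits afc5b36e29 ("vfs: add ATTR_CTIME_SET flag") and 5affb498e7 ("nfsd: ignore ATTR_DELEG when checking ia_valid before notify_change()") both hinge on simple bitmask tests on `ia_valid`. The sketch below models those two tests in isolation. The `M_ATTR_*` values are hypothetical and chosen only for illustration: the real `ATTR_*` constants in `<linux/fs.h>` have different values, and the real `notify_change()`/`setattr_copy()` logic handles many more cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; not the kernel's. */
#define M_ATTR_CTIME      (1u << 0)
#define M_ATTR_CTIME_SET  (1u << 1)
#define M_ATTR_DELEG      (1u << 2)

/* Model of the 5affb498e7 check: if ATTR_DELEG is the only flag left,
 * there is nothing to change. */
static bool has_changes(unsigned int ia_valid)
{
	return (ia_valid & ~M_ATTR_DELEG) != 0;
}

/* Model of the ATTR_CTIME_SET rule: the supplied ctime is accepted as-is
 * when the flag is set; otherwise the ctime becomes "now". */
static int64_t pick_ctime(unsigned int ia_valid, int64_t supplied, int64_t now)
{
	if (ia_valid & M_ATTR_CTIME_SET)
		return supplied;
	return now;
}
```

This mirrors the existing ATTR_ATIME_SET/ATTR_MTIME_SET convention: a "_SET" flag means "use the caller's value instead of the current time".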
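Commit e5e9b24ab8 ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") describes a small state machine: freeze the timestamps while the delegation is outstanding, record whether the file was written and whether the client sent delegated times, then catch up on return. Below is a minimal userspace model of it. Only `dl_written` and `dl_setattr` come from the commit message. The struct, the `nocmtime` field (standing in for FMODE_NOCMTIME), the event functions, and the integer timestamps are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; dl_written/dl_setattr mirror the commit message. */
struct deleg_model {
	bool dl_written;   /* file written since the delegation was granted */
	bool dl_setattr;   /* client sent a valid delegated mtime (and ctime) */
	bool nocmtime;     /* stands in for FMODE_NOCMTIME on the file */
	int64_t mtime, ctime;
};

static void on_grant(struct deleg_model *d)
{
	d->dl_written = false;
	d->dl_setattr = false;
	d->nocmtime = true;                  /* freeze c/mtime updates */
}

static void on_write(struct deleg_model *d, int64_t now)
{
	if (!d->nocmtime)
		d->mtime = d->ctime = now;   /* normal timestamp update */
	d->dl_written = true;
}

static void on_deleg_setattr(struct deleg_model *d, int64_t mtime)
{
	d->mtime = d->ctime = mtime;         /* delegated time accepted as-is */
	d->dl_setattr = true;
}

static void on_return(struct deleg_model *d, int64_t now)
{
	d->nocmtime = false;                 /* clear the freeze */
	if (d->dl_written && !d->dl_setattr)
		d->mtime = d->ctime = now;   /* catch up with current_time() */
}
```

Because the timestamps are frozen instead of updated and later overwritten, the ctime never moves backward. The catch-up on return covers a client that writes but never sends a SETATTR with the delegated times.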

1383590 Commits