Commit 30181faae3 ("nfsd: Check queue type before submitting a SCSI
request") did the work of ensuring that we don't send SCSI requests to a
request queue that won't support them, but that check is in the
GETDEVICEINFO path. Let's not set the SCSI layout in fs_layout_type in the
first place, and then we'll have less clients sending GETDEVICEINFO for
non-SCSI request queues and less unnecessary WARN_ONs.
While we're in here, remove some outdated comments that refer to
"overwriting" layout seletion because commit 8a4c392688 ("nfsd: allow
nfsd to advertise multiple layout types") changed things to no longer
overwrite the layout type.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Although the writeback resends are more robust than the reads, since they
are not immediately rescheduled by the same thread, we are better off
processing them in the same place as the reads.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We do not want to have rpciod threads perform recursive calls into the
RPC layer since that can deadlock. In particular, having to wait for
a layoutget can be nasty... We want rather to defer scheduling those
retries until we're in the rpc_release() callback, since that is
called from the nfsiod workqueue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the layout was invalidated due to a reboot, then don't try to send
a layoutreturn for it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Right now, we can call nfs_commit_inode() while holding the session slot,
which could lead to NFSv4 deadlocks. Ensure we only keep the slot if
the server returned a layout that we have to process.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull jfs fix from Dave Kleikamp:
"This fixes a too-small allocation in the xattr code"
* tag 'jfs-4.18' of git://github.com/kleikamp/linux-shaggy:
jfs: Fix inconsistency between memory allocation and ea_buf->max_size
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull cifs fixes from Steve French:
"Misc SMB3 fixes, including particularly important ones for signing,
some minor documentation and debug improvements and another posix
smb3.11 fix"
* tag '4.18-rc1-more-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix invalid check in __cifs_calc_signature()
cifs: Use correct packet length in SMB2_TRANSFORM header
smb3: fix corrupt path in subdirs on smb311 with posix
smb3: do not display empty interface list
smb3: Fix mode on mkdir on smb311 mounts
cifs: Fix kernel oops when traceSMB is enabled
CIFS: dump every session iface info
CIFS: parse and store info on iface queries
CIFS: add iface info to struct cifs_ses
CIFS: complete PDU definitions for interface queries
CIFS: move default port definitions to cifsglob.h
cifs: Fix encryption/signing
cifs: update __smb_send_rqst() to take an array of requests
cifs: remove smb2_send_recv()
cifs: push rfc1002 generation down the stack
smb3: increase initial number of credits requested to allow write
cifs: minor documentation updates
cifs: add lease tracking to the cached root fid
smb3: note that smb3.11 posix extensions mount option is experimental
The kernel's ext4 mount-time checks were more permissive than
e2fsprogs's libext2fs checks when opening a file system. The
superblock is considered too insane for debugfs or e2fsck to operate
on it, the kernel has no business trying to mount it.
This will make file system fuzzing tools work harder, but the failure
cases that they find will be more useful and be easier to evaluate.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
We're encoding a single op in the reply but leaving the number of ops
zero, so the reply makes no sense.
Somewhat academic as this isn't a case any real client will hit, though
in theory perhaps that could change in a future protocol extension.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Document a couple things that confused me on a recent reading.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's inode->i_lock that's now taken in setlease and break_lease, instead
of the big kernel lock.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The name of this variable doesn't fit the type. And we only ever use
one field of it.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Make the function prototype match the name a little better.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The change attribute is what is used by clients to revalidate their
caches. Our server may use i_version or ctime for that purpose. Those
choices behave slightly differently, and it may be useful to the client
to know which we're using. This attribute tells the client that. The
Linux client doesn't yet use this attribute yet, though.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently we return the worst-case value of 1 second in the time delta
attribute. That's not terribly useful. Instead, return a value
calculated from the time granularity supported by the filesystem and the
system clock.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I don't have a good rationale for the lease period, but 90 seconds seems
long, and as long as we're allowing the server to extend the grace
period up to double the lease period, let's half the default to 45.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the client is only renewing state a little sooner than once a lease
period, then it might not discover the server has restarted till close
to the end of the grace period, and might run out of time to do the
actual reclaim.
Extend the grace period by a second each time we notice there are
clients still trying to reclaim, up to a limit of another whole lease
period.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If there is a directory entry pointing to a system inode (such as a
journal inode), complain and declare the file system to be corrupted.
Also, if the superblock's first inode number field is too small,
refuse to mount the file system.
This addresses CVE-2018-10882.
https://bugzilla.kernel.org/show_bug.cgi?id=200069
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Use a separate journal transaction if it turns out that we need to
convert an inline file to use an data block. Otherwise we could end
up failing due to not having journal credits.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Do not set the b_modified flag in block's journal head should not
until after we're sure that jbd2_journal_dirty_metadat() will not
abort with an error due to there not being enough space reserved in
the jbd2 handle.
Otherwise, future attempts to modify the buffer may lead a large
number of spurious errors and warnings.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Pull documentation fixes from Mauro Carvalho Chehab:
"This solves a series of broken links for files under Documentation,
and improves a script meant to detect such broken links (see
scripts/documentation-file-ref-check).
The changes on this series are:
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- improve the scripts/documentation-file-ref-check script, in order
to help detecting/fixing broken references, preventing
false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check"
* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
fix a series of Documentation/ broken file name references
Documentation: rstFlatTable.py: fix a broken reference
ABI: sysfs-devices-system-cpu: remove a broken reference
devicetree: fix a series of wrong file references
devicetree: fix name of pinctrl-bindings.txt
devicetree: fix some bindings file names
MAINTAINERS: fix location of DT npcm files
MAINTAINERS: fix location of some display DT bindings
kernel-parameters.txt: fix pointers to sound parameters
bindings: nvmem/zii: Fix location of nvmem.txt
docs: Fix more broken references
scripts/documentation-file-ref-check: check tools/*/Documentation
scripts/documentation-file-ref-check: get rid of false-positives
scripts/documentation-file-ref-check: hint: dash or underline
scripts/documentation-file-ref-check: add a fix logic for DT
scripts/documentation-file-ref-check: accept more wildcards at filenames
scripts/documentation-file-ref-check: fix help message
media: max2175: fix location of driver's companion documentation
media: v4l: fix broken video4linux docs locations
media: dvb: point to the location of the old README.dvb-usb file
...
Pull fsnotify updates from Jan Kara:
"fsnotify cleanups unifying handling of different watch types.
This is the shortened fsnotify series from Amir with the last five
patches pulled out. Amir has modified those patches to not change
struct inode but obviously it's too late for those to go into this
merge window"
* tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: add fsnotify_add_inode_mark() wrappers
fanotify: generalize fanotify_should_send_event()
fsnotify: generalize send_to_group()
fsnotify: generalize iteration of marks by object type
fsnotify: introduce marks iteration helpers
fsnotify: remove redundant arguments to handle_event()
fsnotify: use type id to identify connector object type
When expanding the extra isize space, we must never move the
system.data xattr out of the inode body. For performance reasons, it
doesn't make any sense, and the inline data implementation assumes
that system.data xattr is never in the external xattr block.
This addresses CVE-2018-10880
https://bugzilla.kernel.org/show_bug.cgi?id=200005
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Pull AFS updates from Al Viro:
"Assorted AFS stuff - ended up in vfs.git since most of that consists
of David's AFS-related followups to Christoph's procfs series"
* 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
afs: Optimise callback breaking by not repeating volume lookup
afs: Display manually added cells in dynamic root mount
afs: Enable IPv6 DNS lookups
afs: Show all of a server's addresses in /proc/fs/afs/servers
afs: Handle CONFIG_PROC_FS=n
proc: Make inline name size calculation automatic
afs: Implement network namespacing
afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
proc: Add a way to make network proc files writable
afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
afs: Rearrange fs/afs/proc.c to move the show routines up
afs: Rearrange fs/afs/proc.c by moving fops and open functions down
afs: Move /proc management functions to the end of the file
Pull compat updates from Al Viro:
"Some biarch patches - getting rid of assorted (mis)uses of
compat_alloc_user_space().
Not much in that area this cycle..."
* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: simplify compat ioctl handling
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
Pull aio fixes from Al Viro:
"Assorted AIO followups and fixes"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
eventpoll: switch to ->poll_mask
aio: only return events requested in poll_mask() for IOCB_CMD_POLL
eventfd: only return events requested in poll_mask()
aio: mark __aio_sigset::sigmask const
The following check would never evaluate to true:
> if (i == 0 && iov[0].iov_len <= 4)
Because 'i' always starts at 1.
This patch fixes it and also move the header checks outside the for loop
- which makes more sense.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
In smb3_init_transform_rq(), 'orig_len' was only counting the request
length, but forgot to count any data pages in the request.
Writing or creating files with the 'seal' mount option was broken.
In addition, do some code refactoring by exporting smb2_rqst_len() to
calculate the appropriate packet size and avoid duplicating the same
calculation all over the code.
The start of the io vector is either the rfc1002 length (4 bytes) or a
SMB2 header which is always > 4. Use this fact to check and skip the
rfc1002 length if requested.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When converting from an inode from storing the data in-line to a data
block, ext4_destroy_inline_data_nolock() was only clearing the on-disk
copy of the i_blocks[] array. It was not clearing copy of the
i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually
used by ext4_map_blocks().
This didn't matter much if we are using extents, since the extents
header would be invalid and thus the extents could would re-initialize
the extents tree. But if we are using indirect blocks, the previous
contents of the i_blocks array will be treated as block numbers, with
potentially catastrophic results to the file system integrity and/or
user data.
This gets worse if the file system is using a 1k block size and
s_first_data is zero, but even without this, the file system can get
quite badly corrupted.
This addresses CVE-2018-10881.
https://bugzilla.kernel.org/show_bug.cgi?id=200015
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.
However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume. This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.
At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell. This scan is done for every FID.
Optimise callback breaking by the following means:
(1) Sort the FID list by vid so that all FIDs belonging to the same volume
are clumped together.
This is done through the use of an indirection table as we cannot do
an insertion sort on the afs_callback_break array as we decode FIDs
into it as we subsequently also have to decode callback info into it
that corresponds by array index only.
We also don't really want to bubblesort afterwards if we can avoid it.
(2) Sort the server->cb_interests array by vid so that all the matching
volumes are grouped together. This permits the scan to stop after
finding a record that has a higher vid.
(3) When breaking FIDs, we try to keep server->cb_break_lock as long as
possible, caching the start point in the array for that volume group
as long as possible.
It might make sense to add another layer in that list and have a
refcounted volume ID anchor that has the matching interests attached
to it rather than being in the list. This would allow the lock to be
dropped without losing the cursor.
Signed-off-by: David Howells <dhowells@redhat.com>
Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].
To this end:
(1) Only one dynamic root superblock is now created per network namespace
and this is shared between all attempts to mount it. This makes it
easier to find the superblock to modify.
(2) When a dynamic root superblock is created, the list of cells is walked
and directories created for each cell already defined.
(3) When a new cell is added, if a dynamic root superblock exists, a
directory is created for it.
(4) When a cell is destroyed, the directory is removed.
(5) These directories are created by calling lookup_one_len() on the root
dir which automatically creates them if they don't exist.
[*] Inasmuch as network namespaces are currently supported here.
Signed-off-by: David Howells <dhowells@redhat.com>
If server does not support listing interfaces then do not
display empty "Server interfaces" line to avoid confusing users.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
Since the rfc1002 generation was moved down to __smb_send_rqst(),
the transform header is now in rqst->rq_iov[0].
Correctly assign the transform header pointer in crypt_message().
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now that we have the plumbing to pass request without an rfc1002
header all the way down to the point we write to the socket we no
longer need the smb2_send_recv() function.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Move the generation of the 4 byte length field down the stack and
generate it immediately before we start writing the data to the socket.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>