Commit Graph

56061 Commits

Author SHA1 Message Date
David Sterba
6fc4749d25 btrfs: make success path out of btrfs_init_dev_replace_tgtdev more clear
This is a preparatory cleanup that will make clear that the only
successful way out of btrfs_init_dev_replace_tgtdev will also set the
device_out to a valid pointer. With this guarantee, the callers can be
simplified.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:23 +02:00
David Sterba
00251a527a btrfs: squeeze btrfs_dev_replace_continue_on_mount to its caller
The function is called once and is fairly small, we can merge it with
the caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:22 +02:00
Anand Jain
b518519713 btrfs: cleanup btrfs_rm_device() promote fs_devices pointer
This function uses fs_info::fs_devices number of time, however we
declare and use it only at the end, instead do it in the beginning of
the function and use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:22 +02:00
Anand Jain
636d2c9d63 btrfs: cleanup find_device() drop list_head pointer
find_device() declares struct list_head *head pointer and used only once,
instead just use it directly.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:22 +02:00
Anand Jain
897fb5734a btrfs: rename __btrfs_open_devices to open_fs_devices
__btrfs_open_devices() is un-exported drop __ prefix and rename it to
open_fs_devices().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:21 +02:00
Anand Jain
0226e0eb65 btrfs: rename __btrfs_close_devices to close_fs_devices
__btrfs_close_devices() is un-exported, drop the __ prefix and rename it
to close_fs_devices().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:21 +02:00
Anand Jain
f117e290e8 btrfs: cleanup __btrfs_open_devices() drop head pointer
__btrfs_open_devices() declares struct list_head *head, however head is
used only once, instead use btrfs_fs_devices::devices directly.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:21 +02:00
Anand Jain
c4babc5e38 btrfs: rename struct btrfs_fs_devices::list
btrfs_fs_devices::list is the list of BTRFS fsid in the kernel, a generic
name 'list' makes it's search very difficult, rename it to fs_list.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:21 +02:00
Nikolay Borisov
be97f133b3 btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:20 +02:00
Nikolay Borisov
f033798d12 btrfs: Drop fs_info parameter from add_delayed_data_ref
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:20 +02:00
Nikolay Borisov
1acda0c289 btrfs: Drop add_delayed_ref_head fs_info parameter
It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:20 +02:00
Nikolay Borisov
40012f96b6 btrfs: Remove btrfs_wait_and_free_delalloc_work
This function is called from only 1 place and is effectively a wrapper
over wait_completion/kfree. It doesn't really bring any value having
those two calls in a separate function. Just open code it and remove it.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:20 +02:00
Nikolay Borisov
8ae225a8a4 btrfs: Remove tree argument from extent_writepages
It can be directly referenced from the passed address_space so do that.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:20 +02:00
Nikolay Borisov
81f1d39035 btrfs: Use list_empty instead of list_empty_careful
list_empty_careful usually is a signal of something tricky going on. Its
usage in btrfs is actually not needed since both lists it's used on are
local to a function and cannot be modified concurrently. So switch to
plain list_empty. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:19 +02:00
Nikolay Borisov
2a3ff0adc9 btrfs: Remove redundant tree argument from extent_readpages
This function is called only from btrfs_readpage and is already passed
the mapping. Simplify its signature by moving the code obtaining
reference to the extent tree in the function. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:19 +02:00
Nikolay Borisov
29c68b2de9 btrfs: Remove map argument from try_release_extent_state
It's not used in the function so just remove it. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:19 +02:00
Nikolay Borisov
477a30ba5f btrfs: Sink extent_tree arguments in try_release_extent_mapping
This function already gets the page from which the two extent trees
are referenced. Simplify its signature by moving the code getting the
trees inside the function. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:19 +02:00
Misono Tomohiro
a79a464d56 btrfs: Allow rmdir(2) to delete an empty subvolume
Change the behavior of rmdir(2) and allow it to delete an empty
subvolume by using btrfs_delete_subvolume() which is used by
btrfs_ioctl_snap_destroy().

This is a change in behaviour and has been requested by users. Deleting
the subvolume by ioctl requires root permissions while the rmdir way
does works with standard tools and syscalls for all users that can
access the subvolume.

The main usecase is to allow 'rm -rf /path/with/subvols' to simply work.
We were not able to find any nasty usability surprises, the intention is
to do the destructive rm. Without allowing rmdir, this would have to be
followed by the ioctl subvolume deletion, which is more of an annoyance.

Implementation details:

The required lock for @dir and inode of @dentry is already acquired in
vfs layer.

We need some check before deleting a subvolume. Permission check is done
in vfs layer, emptiness check is in btrfs_rmdir() and additional check
(i.e. neither the subvolume is a default subvolume nor send is in progress)
is in btrfs_delete_subvolume().

Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after
btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in vfs
layer later.

Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
Misono Tomohiro
f60a2364a4 btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy()
Factor out the second half of btrfs_ioctl_snap_destroy() as
btrfs_delete_subvolume(), which performs some subvolume specific checks
before deletion:

1. send is not in progress
2. the subvolume is not the default subvolume
3. the subvolume does not contain other subvolumes

and actual deletion process. btrfs_delete_subvolume() requires
inode_lock for both @dir and inode of @dentry. The remaining part of
btrfs_ioctl_snap_destroy() is mainly permission checks.

Note that call of d_delete() is not included in btrfs_delete_subvolume()
as this function will also be used by btrfs_rmdir() to delete an empty
subvolume and in that case d_delete() is called in VFS layer.

As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional changes.

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
Misono Tomohiro
ec42f16734 btrfs: Move may_destroy_subvol() from ioctl.c to inode.c
This is a preparation work to refactor btrfs_ioctl_snap_destroy()
and to allow rmdir(2) to delete an empty subvolume.

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of the function comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
Howard McLauchlan
3b079a919a btrfs: remove unused le_test_bit()
With commit b18253ec57c0 ("btrfs: optimize free space tree bitmap
conversion"), there are no more callers to le_test_bit(). This patch
removes le_test_bit().

Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
Howard McLauchlan
a565971ff3 btrfs: optimize free space tree bitmap conversion
Presently, convert_free_space_to_extents() does a linear scan of the
bitmap. We can speed this up with find_next_{bit,zero_bit}_le().

This patch replaces the linear scan with find_next_{bit,zero_bit}_le().
Testing shows a 20-33% decrease in execution time for
convert_free_space_to_extents().

Since we change bitmap to be unsigned long, we have to do some casting
for the bitmap cursor. In le_bitmap_set() it makes sense to use u8, as
we are doing bit operations. Everywhere else, we're just using it for
pointer arithmetic and not directly accessing it, so char seems more
appropriate.

Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
Howard McLauchlan
6faa8f475e btrfs: clean up le_bitmap_{set, clear}()
le_bitmap_set() is only used by free-space-tree, so move it there and
make it static. le_bitmap_clear() is not used, so remove it.

Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:18 +02:00
David Sterba
f46b24c945 btrfs: use fs_info for btrfs_handle_em_exist tracepoint
We really want to know to which filesystem the extent map events belong,
but as it cannot be reached from the extent_map pointers, we need to
pass it down the callchain.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:17 +02:00
David Sterba
0e08eb9b1c btrfs: tests: pass fs_info to extent_map tests
Preparatory work to pass fs_info to btrfs_add_extent_mapping so we can
get a better tracepoint message. Extent maps do not need fs_info for
anything so we only add a dummy one without any other initialization.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:17 +02:00
Nikolay Borisov
57f1642ec3 btrfs: Consolidate error checking for btrfs_alloc_chunk
The second if is really a subcase of ret being less than 0. So
introduce a generic if (ret < 0) check, and inside have another if
which explicitly handles the -ENOSPC and any other errors. No
functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:16 +02:00
Nikolay Borisov
1e7a14211b btrfs: Fix lock release order
Locks should generally be released in the oppposite order they are
acquired. Generally lock acquisiton ordering is used to ensure
deadlocks don't happen. However, as becomes more complicated it's
best to also maintain proper unlock order so as to avoid possible dead
locks. This was found by code inspection and doesn't necessarily lead
to a deadlock scenario.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:16 +02:00
Nikolay Borisov
b25f0d0012 btrfs: Use while loop instead of labels in __endio_write_update_ordered
Currently __endio_write_update_ordered uses labels to implement
what is essentially a simple while loop. This makes the code more
cumbersome to follow than it actually has to be. No functional
changes. No xfstest regressions were found during testing.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:15 +02:00
Anand Jain
89595e80de btrfs: add comment about BTRFS_FS_EXCL_OP
Adds comments about BTRFS_FS_EXCL_OP to existing comments
about the device locks.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 18:07:15 +02:00
Nikolay Borisov
41d0bd3b5e btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq
It's used to print its pointer in a debug statement but doesn't really
bring any useful information to the error message.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 13:12:11 +02:00
Su Yue
c065f5b1cf btrfs: rename btrfs_get_block_group_info and make it static
The function btrfs_get_block_group_info() was introduced by the
commit 5af3e8cce8 ("Btrfs: make filesystem read-only when submitting
 barrier fails") which used it in disk-io.c.

However, the function is only called in ioctl.c now.
Its parameter type btrfs_ioctl_space_info* is only for ioctl.

So, make it static and rename it to be original name
get_block_group_info.

No functional change.

Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 13:12:11 +02:00
Nikolay Borisov
29d2b84cf9 btrfs: Replace owner argument in add_pinned_bytes with a boolean
add_pinned_bytes really cares whether the bytes being pinned are either
data or metadata. To that effect it checks whether the 'owner' argument
is less than BTRFS_FIRST_FREE_OBJECTID (256). This works because
owner can really have 2 types of values:

 a) For metadata extents it holds the level at which the parent is in
    the btree. This amounts to owner having the values 0-7

 b) In case of modifying data extents, owner is the inode number
    to which those extents belongs.

Let's make this more explicit byt converting the owner parameter to a
boolean value and either pass it directly when we know the type of
extents we are working with (i.e. in btrfs_free_tree_block). In cases
when the parent function can be called on both metadata/data extents
perform the check in the caller. This hopefully makes the interface
of add_pinned_bytes more intuitive.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 13:12:11 +02:00
Chengguang Xu
84ae6f829f affs: fix potential memory leak when parsing option 'prefix'
When specifying option 'prefix' multiple times, current option parsing
will cause memory leak.  Hence, call kfree for previous one in this
case.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28 12:36:41 +02:00
Steve French
eccb4422cf smb3: Add ftrace tracepoints for improved SMB3 debugging
Although dmesg logs and wireshark network traces can be
helpful, being able to dynamically enable/disable tracepoints
(in this case via the kernel ftrace mechanism) can also be
helpful in more quickly debugging problems, and more
selectively tracing the events related to the bug report.

This patch adds 12 ftrace tracepoints to cifs.ko for SMB3 events
in some obvious locations.  Subsequent patches will add more
as needed.

Example use:
   trace-cmd record -e cifs
   <run test case>
   trace-cmd show

Various trace events can be filtered. See:
       trace-cmd list | grep cifs
for the current list of cifs tracepoints.

Sample output (from mount and writing to a file):

root@smf:/sys/kernel/debug/tracing/events/cifs# trace-cmd show
<snip>
      mount.cifs-6633  [006] ....  7246.936461: smb3_cmd_done: pid=6633 tid=0x0 sid=0x0 cmd=0 mid=0
      mount.cifs-6633  [006] ....  7246.936701: smb3_cmd_err:  pid=6633 tid=0x0 sid=0x3d9cf8e5 cmd=1 mid=1 status=0xc0000016 rc=-5
      mount.cifs-6633  [006] ....  7246.943055: smb3_cmd_done: pid=6633 tid=0x0 sid=0x3d9cf8e5 cmd=1 mid=2
      mount.cifs-6633  [006] ....  7246.943298: smb3_cmd_done: pid=6633 tid=0xf9447636 sid=0x3d9cf8e5 cmd=3 mid=3
      mount.cifs-6633  [006] ....  7246.943446: smb3_cmd_done: pid=6633 tid=0xf9447636 sid=0x3d9cf8e5 cmd=11 mid=4
      mount.cifs-6633  [006] ....  7246.943659: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=3 mid=5
      mount.cifs-6633  [006] ....  7246.943766: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=11 mid=6
      mount.cifs-6633  [006] ....  7246.943937: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=5 mid=7
      mount.cifs-6633  [006] ....  7246.944020: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=16 mid=8
      mount.cifs-6633  [006] ....  7246.944091: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=16 mid=9
      mount.cifs-6633  [006] ....  7246.944163: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=16 mid=10
      mount.cifs-6633  [006] ....  7246.944218: smb3_cmd_err:  pid=6633 tid=0xf9447636 sid=0x3d9cf8e5 cmd=11 mid=11 status=0xc0000225 rc=-2
      mount.cifs-6633  [006] ....  7246.944219: smb3_fsctl_err: xid=0 fid=0xffffffffffffffff tid=0xf9447636 sid=0x3d9cf8e5 class=0 type=393620 rc=-2
      mount.cifs-6633  [007] ....  7246.944353: smb3_cmd_done: pid=6633 tid=0xe1b781a sid=0x3d9cf8e5 cmd=16 mid=12
            bash-2071  [000] ....  7256.903844: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=5 mid=13
            bash-2071  [000] ....  7256.904172: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=16 mid=14
            bash-2071  [000] ....  7256.904471: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=17 mid=15
            bash-2071  [000] ....  7256.904950: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=5 mid=16
            bash-2071  [000] ....  7256.905305: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=17 mid=17
            bash-2071  [000] ....  7256.905688: smb3_cmd_done: pid=2071 tid=0xe1b781a sid=0x3d9cf8e5 cmd=6 mid=18
            bash-2071  [000] ....  7256.905809: smb3_write_done: xid=0 fid=0xd628f511 tid=0xe1b781a sid=0x3d9cf8e5 offset=0x0 len=0x1b

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-05-27 17:56:35 -05:00
Steve French
5a77e75fed smb3: rename encryption_required to smb3_encryption_required
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Steve French
3fa9a54061 cifs: update internal module version number for cifs.ko to 2.12
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Ronnie Sahlberg
97ca176224 cifs: add a new SMB2_close_flags function
And make SMB2_close just a wrapper for SMB2_close_flags.
We need this as we will start to send SMB2_CLOSE pdus using special
flags.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Ronnie Sahlberg
96164ab2d8 cifs: store the leaseKey in the fid on SMB2_open
In SMB2_open(), if we got a lease we need to store this in the fid structure
or else we will never be able to map a lease break back to which file/fid
it applies to.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Steve French
71992e62b8 cifs: fix build break when CONFIG_CIFS_DEBUG2 enabled
Previous patches "cifs: update calc_size to take a server argument"
and
  "cifs: add server argument to the dump_detail method"
were broken if CONFIG_CIFS_DEBUG2 enabled

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Ronnie Sahlberg <lsahlber@redhat.com>
2018-05-27 17:56:35 -05:00
Ronnie Sahlberg
9ec672bd17 cifs: update calc_size to take a server argument
and change the smb2 version to take heder_preamble_size into account
instead of hardcoding it as 4 bytes.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Ronnie Sahlberg
14547f7d74 cifs: add server argument to the dump_detail method
We need a struct TCP_Server_Info *server to this method as it calls
calc_size. The calc_size method will soon be changed to also
take a server argument.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-05-27 17:56:35 -05:00
Steve French
3d4ef9a153 smb3: fix redundant opens on root
In SMB2/SMB3 unlike in cifs we unnecessarily open the root of the share
over and over again in various places during mount and path revalidation
and also in statfs.  This patch cuts redundant traffic (opens and closes)
by simply keeping the directory handle for the root around (and reopening
it as needed on reconnect), so query calls don't require three round
trips to copmlete - just one, and eases load on network, client and
server (on mount alone, cuts network traffic by more than a third).

Also add a new cifs mount parm "nohandlecache" to allow users whose
servers might have resource constraints (eg in case they have a server
with so many users connecting to it that this extra handle per mount
could possibly be a resource concern).

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-05-27 17:56:35 -05:00
Al Viro
8767712f26 rmdir(),rename(): do shrink_dcache_parent() only on success
Once upon a time ->rmdir() instances used to check if victim inode
had more than one (in-core) reference and failed with -EBUSY if it
had.  The reason was race avoidance - emptiness check is worthless
if somebody could just go and create new objects in the victim
directory afterwards.

With introduction of dcache the checks had been replaced with
checking the refcount of dentry.  However, since a cached negative
lookup leaves a negative child dentry, such check had lead to false
positives - with empty foo/ doing stat foo/bar before rmdir foo
ended up with -EBUSY unless the negative dentry of foo/bar happened
to be evicted by the time of rmdir(2).  That had been fixed by
doing shrink_dcache_parent() just before the refcount check.

At the same time, ext2_rmdir() has grown a private solution that
eliminated those -EBUSY - it did something (setting ->i_size to 0)
which made any subsequent ext2_add_entry() fail.

Unfortunately, even with shrink_dcache_parent() the check had been
racy - after all, the victim itself could be found by dcache lookup
just after we'd checked its refcount.  That got fixed by a new
helper (dentry_unhash()) that did shrink_dcache_parent() and unhashed
the sucker if its refcount ended up equal to 1.  That got called before
->rmdir(), turning the checks in ->rmdir() instances into "if not
unhashed fail with -EBUSY".  Which reduced the boilerplate nicely, but
had an unpleasant side effect - now shrink_dcache_parent() had been
done before the emptiness checks, leading to easily triggerable calls
of shrink_dcache_parent() on arbitrary large subtrees, quite possibly
nested into each other.

Several years later the ext2-private trick had been generalized -
(in-core) inodes of dead directories are flagged and calls of
lookup, readdir and all directory-modifying methods were prevented
in so marked directories.  Remaining boilerplate in ->rmdir() instances
became redundant and some instances got rid of it.

In 2011 the call of dentry_unhash() got shifted into ->rmdir() instances
and then killed off in all of them.  That has lead to another problem,
though - in case of successful rmdir we *want* any (negative) child
dentries dropped and the victim itself made negative.  There's no point
keeping cached negative lookups in foo when we can get the negative
lookup of foo itself cached.  So shrink_dcache_parent() call had been
restored; unfortunately, it went into the place where dentry_unhash()
used to be, i.e. before the ->rmdir() call.  Note that we don't unhash
anymore, so any "is it busy" checks would be racy; fortunately, all of
them are gone.

We should've done that call right *after* successful ->rmdir().  That
reduces contention caused by tree-walking in shrink_dcache_parent()
and, especially, contention caused by evictions in two nested subtrees
going on in parallel.  The same goes for directory-overwriting rename() -
the story there had been parallel to that of rmdir().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-27 16:23:51 -04:00
David S. Miller
5b79c2af66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26 19:46:15 -04:00
Al Viro
888e2b03ef switch the rest of procfs lookups to d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-26 14:20:50 -04:00
Al Viro
0168b9e38c procfs: switch instantiate_t to d_splice_alias()
... and get rid of pointless struct inode *dir argument of those,
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-26 14:20:50 -04:00
Al Viro
9883638641 don't bother with tid_fd_revalidate() in lookups
what we want it for is actually updating inode metadata;
take _that_ into a separate helper and use it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-26 14:20:28 -04:00
Christoph Hellwig
652fe8e876 timerfd: convert to ->poll_mask
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-26 09:16:44 +02:00
Christoph Hellwig
9e42f195f5 eventfd: switch to ->poll_mask
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-26 09:16:44 +02:00
Christoph Hellwig
dd67081b36 pipe: convert to ->poll_mask
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-26 09:16:44 +02:00