Mirror of https://github.com/raspberrypi/linux.git
synced 2025-12-24 02:52:38 +00:00
b13201da5f81bbe7d901b9822fadd58492f11541
11880 Commits

d62835bafe
btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split
[ Upstream commit

15d7102ee2
btrfs: make btrfs_split_bio work on struct btrfs_bio
[ Upstream commit

e6a9a52882
btrfs: fix an uninitialized variable warning in btrfs_log_inode
[ Upstream commit

ed4dc4735e
btrfs: can_nocow_file_extent should pass down args->strict from callers
commit

8aa2879c28
btrfs: fix iomap_begin length for nocow writes
commit

ee2575289a
btrfs: do not ASSERT() on duplicated global roots
commit

1c8666127f
btrfs: properly enable async discard when switching from RO->RW
commit

1cfba7edb9
btrfs: subpage: fix a crash in metadata repair path
commit

38cb20b671
btrfs: handle memory allocation failure in btrfs_csum_one_bio
[ Upstream commit

a96cad9a47
btrfs: scrub: try harder to mark RAID56 block groups read-only
[ Upstream commit

1d454bcf87
btrfs: call btrfs_orig_bbio_end_io in btrfs_end_bio_work
[ Upstream commit

cd001ac151
btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds
commit

763f0c269f
btrfs: abort transaction when sibling keys check fails for leaves
[ Upstream commit

6d86606488
btrfs: use nofs when cleaning up aborted transactions
commit

dee80752e1
btrfs: fix backref walking not returning all inode refs
commit

cd8cffa9d4
btrfs: zoned: fix full zone super block reading on ZNS
commit

0af493c79a
btrfs: zoned: zone finish data relocation BG with last IO
commit

573b38f8df
btrfs: fix space cache inconsistency after error loading it from disk
commit

978eb480ae
btrfs: print-tree: parent bytenr must be aligned to sector size
commit

11bd62688c
btrfs: make clear_cache mount option to rebuild FST without disabling it
commit

4484428b7a
btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
commit

478bd15f46
btrfs: don't free qgroup space unless specified
commit

9fd7199b7a
btrfs: fix encoded write i_size corruption with no-holes
commit

6062e9e335
btrfs: fix assertion of exclop condition when starting balance
commit

dc0072dd5d
btrfs: properly reject clear_cache and v1 cache for block-group-tree
commit

bf3d7a5c39
btrfs: zoned: fix wrong use of bitops API in btrfs_ensure_empty_zones
commit

371a81d7fa
btrfs: fix btrfs_prev_leaf() to not return the same key twice
commit

3f466c7bc5
btrfs: scrub: reject unsupported scrub flags
commit

3686db7c07
btrfs: fix uninitialized variable warnings
commit

c337b23f32
Merge tag 'for-6.3-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two patches fixing the problem with async discard.
The default settings had a low IOPS limit and processing a large batch
to discard would take a long time. On laptops this can cause increased
power consumption due to disk activity.
As async discard has been on by default since 6.2 this likely affects
a lot of users.
Summary:
- increase the default IOPS limit 100x (from 10 to 1000), which
reportedly helped
- setting the sysfs IOPS value to 0 no longer throttles, allowing the
discards to be processed at full speed. Previously there was an
arbitrary 6 hour target for processing the pending batch"
* tag 'for-6.3-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reinterpret async discard iops_limit=0 as no delay
btrfs: set default discard iops_limit to 1000

ef9cddfe57
btrfs: reinterpret async discard iops_limit=0 as no delay
Currently, a limit of 0 results in a hard-coded metering over 6 hours. Since the default is a set limit, I suspect no one truly depends on this rather arbitrary setting. Repurpose it for an arguably more useful "unlimited" mode, where the delay is 0.
Note that if block groups are too new, or go fully empty, there is still a delay associated with those conditions. Those delays implement heuristics for not trimming a region we are relatively likely to fully overwrite soon.
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
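
The old and new meanings of iops_limit can be compared with a small userspace sketch. It is not the kernel's discard code: `discard_delay_ms()` and the `old_semantics` switch are hypothetical, and the 1000 ms upper clamp is an assumption made for illustration. The 1 ms minimum delay, the 6 hour target, and the "one second divided by the limit" conversion follow from the two commit messages; the extra delays for new or fully empty block groups are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MSEC_PER_SEC 1000UL
/* Smallest delay for a non-zero limit (1 ms, per the iops_limit=1000 commit). */
#define DISCARD_MIN_DELAY_MSEC 1UL
/* Upper clamp on the delay: an assumption made for this sketch. */
#define DISCARD_MAX_DELAY_MSEC 1000UL
/* Pre-change meaning of iops_limit == 0: drain the backlog over 6 hours. */
#define DISCARD_TARGET_MSEC (6UL * 60 * 60 * MSEC_PER_SEC)

/*
 * Hypothetical helper: milliseconds to wait between two async discards.
 * old_semantics selects the behaviour of iops_limit == 0 before the change.
 */
unsigned long discard_delay_ms(unsigned int iops_limit,
			       long discardable_extents, bool old_semantics)
{
	unsigned long min_delay = DISCARD_MIN_DELAY_MSEC;
	unsigned long delay;

	assert(discardable_extents > 0);

	if (iops_limit) {
		/* A limit of N discards per second means 1000/N ms apart. */
		delay = MSEC_PER_SEC / iops_limit;
	} else if (old_semantics) {
		/* Spread the current backlog over the 6 hour target. */
		delay = DISCARD_TARGET_MSEC / (unsigned long)discardable_extents;
	} else {
		/* New meaning of 0: unlimited, no delay at all. */
		delay = 0;
		min_delay = 0;
	}

	if (delay < min_delay)
		delay = min_delay;
	if (delay > DISCARD_MAX_DELAY_MSEC)
		delay = DISCARD_MAX_DELAY_MSEC;
	return delay;
}
```

In this model a limit above 1000 still yields the 1 ms floor, which is why 1000 is the largest limit that changes anything; only 0 under the new semantics removes the delay entirely.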

e9f59429b8
btrfs: set default discard iops_limit to 1000
Previously, the default was a relatively conservative 10. This results in a 100ms delay, so with ~300 discards in a commit, it takes the full 30s until the next commit to finish the discards. On a workstation, this results in the disk never going idle, wasting power/battery, etc.
Set the default to 1000, which results in the smallest possible delay, currently 1ms. The original reporter has shown that this does not pathologically keep the disk busy.
Link: https://lore.kernel.org/linux-btrfs/Y%2F+n1wS%2F4XAH7X1p@nz/
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182228
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
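
The figures above follow from a delay of one second divided by iops_limit, using the message's own assumptions of roughly 300 discards per commit and a 30 s commit interval (other heuristic delays ignored):

```latex
\[
\text{delay} = \frac{1000\ \text{ms}}{\text{iops\_limit}}, \qquad
T_{\text{drain}} \approx N_{\text{discards}} \times \text{delay}
\]
\[
\text{iops\_limit} = 10:\quad 300 \times 100\ \text{ms} = 30\ \text{s}, \qquad
\text{iops\_limit} = 1000:\quad 300 \times 1\ \text{ms} = 0.3\ \text{s}
\]
```

At the old default the queue is drained only as the next commit arrives, so the disk never idles; at the new default the same batch finishes in well under a second.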

2c40519251
Merge tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix fast checksum detection; this affects filesystems with a non-crc32c checksum, whose calculation would not be offloaded to worker threads
- restore thread_pool mount option behaviour for endio workers; the new value for maximum active threads would not be set on the actual work queues
* tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fast csum implementation detection
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues

68d99ab0e9
btrfs: fix fast csum implementation detection
The BTRFS_FS_CSUM_IMPL_FAST flag is currently set whenever a non-generic
crc32c is detected, which is the incorrect check if the file system uses
a different checksumming algorithm. Refactor the code to only check
this if crc32c is actually used. Note that in an ideal world the
information on whether an algorithm is hardware accelerated should be
provided by the crypto API instead, but that's left for another day.
CC: stable@vger.kernel.org # 5.4.x:
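
The corrected decision can be sketched as a small predicate. The enum and `csum_impl_is_fast()` are hypothetical; driver strings such as `crc32c-intel` and `crc32c-generic` are examples of the kind of names the crypto API reports. The only behaviour taken from the commit is that the "fast" check now applies to crc32c alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical identifiers for the four btrfs checksum algorithms. */
enum csum_type { CSUM_CRC32C, CSUM_XXHASH64, CSUM_SHA256, CSUM_BLAKE2B };

/*
 * Hypothetical check: is checksumming cheap enough to do inline instead of
 * offloading it to worker threads? driver_name is the crypto driver chosen
 * for the filesystem's own checksum algorithm.
 */
bool csum_impl_is_fast(enum csum_type type, const char *driver_name)
{
	assert(driver_name != NULL);

	switch (type) {
	case CSUM_CRC32C:
		/* Any crc32c driver other than the generic C one is treated as accelerated. */
		return strstr(driver_name, "generic") == NULL;
	default:
		/*
		 * The buggy version effectively applied the crc32c test above to
		 * every algorithm; other algorithms are now never marked fast.
		 */
		return false;
	}
}
```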

40fac6472f
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
Commit

6ab608fe85
Merge tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- scan block devices in non-exclusive mode to avoid temporary mkfs failures
- fix race between quota disable and quota assign ioctls
- fix deadlock when aborting transaction during relocation with scrub
- ignore fiemap path cache when there are multiple paths for a node
* tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ignore fiemap path cache when there are multiple paths for a node
btrfs: fix deadlock when aborting transaction during relocation with scrub
btrfs: scan device in non-exclusive mode
btrfs: fix race between quota disable and quota assign ioctls

2280d425ba
btrfs: ignore fiemap path cache when there are multiple paths for a node
During fiemap, when walking backreferences to determine if a b+tree
node/leaf is shared, we may find a tree block (leaf or node) for which
two parents were added to the references ulist. This happens if we get
for example one direct ref (shared tree block ref) and one indirect ref
(non-shared tree block ref) for the tree block at the current level,
which can happen during relocation.
In that case the fiemap path cache cannot be used since it's meant for
a single path, with one tree block at each possible level, so having
multiple references for a tree block at any level may result in the
level counter exceeding BTRFS_MAX_LEVEL and eventually triggering the
warning:
WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)
at lookup_backref_shared_cache() and at store_backref_shared_cache().
This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the
warning is there just to catch any unexpected case like the one described
above. However, if a user hits it, it may look alarming and get reported.
So just ignore the path cache once we find a tree block for which there
is more than one reference, which is the less common case, and update
the cache with the sharedness check result for all levels below the level
for which we found multiple references.
Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/
Fixes:
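
The "stop using the cache once a block has several parents" rule can be sketched as a per-level cache with an enable flag. This is a hypothetical model, not the kernel's backref code: `struct shared_cache`, `note_parents()`, `cache_lookup()` and `cache_store()` are invented names, and the later step of updating the cache for the levels below the multi-reference block is not modelled. `MAX_LEVEL` mirrors BTRFS_MAX_LEVEL, which is 8.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEVEL 8 /* mirrors BTRFS_MAX_LEVEL */

/* Hypothetical per-fiemap cache: "is the tree block at this level shared?" */
struct shared_cache {
	bool use_cache;         /* valid only while following a single path */
	bool valid[MAX_LEVEL];
	bool shared[MAX_LEVEL];
};

/* Called for each tree block reached during the backref walk. */
void note_parents(struct shared_cache *c, int nr_parents)
{
	assert(nr_parents >= 0);
	/* More than one parent means more than one path: stop using the cache. */
	if (nr_parents > 1)
		c->use_cache = false;
}

/* Returns true and sets *shared on a hit; out-of-range levels always miss. */
bool cache_lookup(const struct shared_cache *c, int level, bool *shared)
{
	if (!c->use_cache || level < 0 || level >= MAX_LEVEL || !c->valid[level])
		return false;
	*shared = c->shared[level];
	return true;
}

/* Stores are silently ignored once the cache is disabled or out of range. */
void cache_store(struct shared_cache *c, int level, bool shared)
{
	if (!c->use_cache || level < 0 || level >= MAX_LEVEL)
		return;
	c->valid[level] = true;
	c->shared[level] = shared;
}
```

Rejecting out-of-range levels quietly matches the message's point that levels >= BTRFS_MAX_LEVEL are already ignored; disabling the cache avoids reaching them in the first place.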

2d82a40aa7
btrfs: fix deadlock when aborting transaction during relocation with scrub
Before relocating a block group we pause scrub, then do the relocation and
then unpause scrub. The relocation process requires starting and committing
a transaction, and if we have a failure in the critical section of the
transaction commit path (transaction state >= TRANS_STATE_COMMIT_START),
we will deadlock if there is a paused scrub.
That results in stack traces like the following:
[42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
[42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[42.936] ------------[ cut here ]------------
[42.936] BTRFS: Transaction aborted (error -28)
[42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
[42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...)
[42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
[42.936] Code: ff ff 45 8b (...)
[42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
[42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
[42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
[42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
[42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
[42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
[42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
[42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
[42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[42.936] Call Trace:
[42.936] <TASK>
[42.936] ? start_transaction+0xcb/0x610 [btrfs]
[42.936] prepare_to_relocate+0x111/0x1a0 [btrfs]
[42.936] relocate_block_group+0x57/0x5d0 [btrfs]
[42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
[42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
[42.936] ? __pfx_autoremove_wake_function+0x10/0x10
[42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs]
[42.936] btrfs_balance+0x8ff/0x11d0 [btrfs]
[42.936] ? __kmem_cache_alloc_node+0x14a/0x410
[42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs]
[42.937] ? mod_objcg_state+0xd2/0x360
[42.937] ? refill_obj_stock+0xb0/0x160
[42.937] ? seq_release+0x25/0x30
[42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0
[42.937] ? percpu_counter_add_batch+0x2e/0xa0
[42.937] ? __x64_sys_ioctl+0x88/0xc0
[42.937] __x64_sys_ioctl+0x88/0xc0
[42.937] do_syscall_64+0x38/0x90
[42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[42.937] RIP: 0033:0x7f381a6ffe9b
[42.937] Code: 00 48 89 44 24 (...)
[42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
[42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
[42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
[42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
[42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
[42.937] </TASK>
[42.937] ---[ end trace 0000000000000000 ]---
[42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
[59.196] INFO: task btrfs:346772 blocked for more than 120 seconds.
[59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.196] task:btrfs state:D stack:0 pid:346772 ppid:1 flags:0x00004002
[59.196] Call Trace:
[59.196] <TASK>
[59.196] __schedule+0x392/0xa70
[59.196] ? __pv_queued_spin_lock_slowpath+0x165/0x370
[59.196] schedule+0x5d/0xd0
[59.196] __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
[59.197] ? __pfx_autoremove_wake_function+0x10/0x10
[59.197] scrub_pause_off+0x21/0x50 [btrfs]
[59.197] scrub_simple_mirror+0x1c7/0x950 [btrfs]
[59.197] ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
[59.198] ? __pfx_autoremove_wake_function+0x10/0x10
[59.198] scrub_stripe+0x20d/0x740 [btrfs]
[59.198] scrub_chunk+0xc4/0x130 [btrfs]
[59.198] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
[59.198] ? __pfx_autoremove_wake_function+0x10/0x10
[59.198] btrfs_scrub_dev+0x236/0x6a0 [btrfs]
[59.199] ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
[59.199] ? _copy_from_user+0x7b/0x80
[59.199] btrfs_ioctl+0xde1/0x32c0 [btrfs]
[59.199] ? refill_stock+0x33/0x50
[59.199] ? should_failslab+0xa/0x20
[59.199] ? kmem_cache_alloc_node+0x151/0x460
[59.199] ? alloc_io_context+0x1b/0x80
[59.199] ? preempt_count_add+0x70/0xa0
[59.199] ? __x64_sys_ioctl+0x88/0xc0
[59.199] __x64_sys_ioctl+0x88/0xc0
[59.199] do_syscall_64+0x38/0x90
[59.199] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.199] RIP: 0033:0x7f82ffaffe9b
[59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b
[59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003
[59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
[59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640
[59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
[59.199] </TASK>
[59.199] INFO: task btrfs:346773 blocked for more than 120 seconds.
[59.200] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.201] task:btrfs state:D stack:0 pid:346773 ppid:1 flags:0x00004002
[59.201] Call Trace:
[59.201] <TASK>
[59.201] __schedule+0x392/0xa70
[59.201] ? __pv_queued_spin_lock_slowpath+0x165/0x370
[59.201] schedule+0x5d/0xd0
[59.201] __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
[59.201] ? __pfx_autoremove_wake_function+0x10/0x10
[59.201] scrub_pause_off+0x21/0x50 [btrfs]
[59.202] scrub_simple_mirror+0x1c7/0x950 [btrfs]
[59.202] ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
[59.202] ? __pfx_autoremove_wake_function+0x10/0x10
[59.202] scrub_stripe+0x20d/0x740 [btrfs]
[59.202] scrub_chunk+0xc4/0x130 [btrfs]
[59.203] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
[59.203] ? __pfx_autoremove_wake_function+0x10/0x10
[59.203] btrfs_scrub_dev+0x236/0x6a0 [btrfs]
[59.203] ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
[59.203] ? _copy_from_user+0x7b/0x80
[59.203] btrfs_ioctl+0xde1/0x32c0 [btrfs]
[59.204] ? should_failslab+0xa/0x20
[59.204] ? kmem_cache_alloc_node+0x151/0x460
[59.204] ? alloc_io_context+0x1b/0x80
[59.204] ? preempt_count_add+0x70/0xa0
[59.204] ? __x64_sys_ioctl+0x88/0xc0
[59.204] __x64_sys_ioctl+0x88/0xc0
[59.204] do_syscall_64+0x38/0x90
[59.204] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.204] RIP: 0033:0x7f82ffaffe9b
[59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b
[59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003
[59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
[59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640
[59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
[59.204] </TASK>
[59.204] INFO: task btrfs:346774 blocked for more than 120 seconds.
[59.205] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.206] task:btrfs state:D stack:0 pid:346774 ppid:1 flags:0x00004002
[59.206] Call Trace:
[59.206] <TASK>
[59.206] __schedule+0x392/0xa70
[59.206] schedule+0x5d/0xd0
[59.206] __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
[59.206] ? __pfx_autoremove_wake_function+0x10/0x10
[59.206] scrub_pause_off+0x21/0x50 [btrfs]
[59.207] scrub_simple_mirror+0x1c7/0x950 [btrfs]
[59.207] ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
[59.207] ? __pfx_autoremove_wake_function+0x10/0x10
[59.207] scrub_stripe+0x20d/0x740 [btrfs]
[59.208] scrub_chunk+0xc4/0x130 [btrfs]
[59.208] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
[59.208] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
[59.208] btrfs_scrub_dev+0x236/0x6a0 [btrfs]
[59.208] ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
[59.209] ? _copy_from_user+0x7b/0x80
[59.209] btrfs_ioctl+0xde1/0x32c0 [btrfs]
[59.209] ? should_failslab+0xa/0x20
[59.209] ? kmem_cache_alloc_node+0x151/0x460
[59.209] ? alloc_io_context+0x1b/0x80
[59.209] ? preempt_count_add+0x70/0xa0
[59.209] ? __x64_sys_ioctl+0x88/0xc0
[59.209] __x64_sys_ioctl+0x88/0xc0
[59.209] do_syscall_64+0x38/0x90
[59.209] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.209] RIP: 0033:0x7f82ffaffe9b
[59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b
[59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003
[59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
[59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640
[59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
[59.209] </TASK>
[59.209] INFO: task btrfs:346775 blocked for more than 120 seconds.
[59.210] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.211] task:btrfs state:D stack:0 pid:346775 ppid:1 flags:0x00004002
[59.211] Call Trace:
[59.211] <TASK>
[59.211] __schedule+0x392/0xa70
[59.211] schedule+0x5d/0xd0
[59.211] __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
[59.211] ? __pfx_autoremove_wake_function+0x10/0x10
[59.211] scrub_pause_off+0x21/0x50 [btrfs]
[59.212] scrub_simple_mirror+0x1c7/0x950 [btrfs]
[59.212] ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
[59.212] ? __pfx_autoremove_wake_function+0x10/0x10
[59.212] scrub_stripe+0x20d/0x740 [btrfs]
[59.213] scrub_chunk+0xc4/0x130 [btrfs]
[59.213] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
[59.213] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
[59.213] btrfs_scrub_dev+0x236/0x6a0 [btrfs]
[59.213] ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
[59.214] ? _copy_from_user+0x7b/0x80
[59.214] btrfs_ioctl+0xde1/0x32c0 [btrfs]
[59.214] ? should_failslab+0xa/0x20
[59.214] ? kmem_cache_alloc_node+0x151/0x460
[59.214] ? alloc_io_context+0x1b/0x80
[59.214] ? preempt_count_add+0x70/0xa0
[59.214] ? __x64_sys_ioctl+0x88/0xc0
[59.214] __x64_sys_ioctl+0x88/0xc0
[59.214] do_syscall_64+0x38/0x90
[59.214] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.214] RIP: 0033:0x7f82ffaffe9b
[59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b
[59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003
[59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
[59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640
[59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
[59.214] </TASK>
[59.214] INFO: task btrfs:346776 blocked for more than 120 seconds.
[59.215] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.217] task:btrfs state:D stack:0 pid:346776 ppid:1 flags:0x00004002
[59.217] Call Trace:
[59.217] <TASK>
[59.217] __schedule+0x392/0xa70
[59.217] ? __pv_queued_spin_lock_slowpath+0x165/0x370
[59.217] schedule+0x5d/0xd0
[59.217] __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
[59.217] ? __pfx_autoremove_wake_function+0x10/0x10
[59.217] scrub_pause_off+0x21/0x50 [btrfs]
[59.217] scrub_simple_mirror+0x1c7/0x950 [btrfs]
[59.217] ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
[59.218] ? __pfx_autoremove_wake_function+0x10/0x10
[59.218] scrub_stripe+0x20d/0x740 [btrfs]
[59.218] scrub_chunk+0xc4/0x130 [btrfs]
[59.218] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
[59.219] ? __pfx_autoremove_wake_function+0x10/0x10
[59.219] btrfs_scrub_dev+0x236/0x6a0 [btrfs]
[59.219] ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
[59.219] ? _copy_from_user+0x7b/0x80
[59.219] btrfs_ioctl+0xde1/0x32c0 [btrfs]
[59.219] ? should_failslab+0xa/0x20
[59.219] ? kmem_cache_alloc_node+0x151/0x460
[59.219] ? alloc_io_context+0x1b/0x80
[59.219] ? preempt_count_add+0x70/0xa0
[59.219] ? __x64_sys_ioctl+0x88/0xc0
[59.219] __x64_sys_ioctl+0x88/0xc0
[59.219] do_syscall_64+0x38/0x90
[59.219] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.219] RIP: 0033:0x7f82ffaffe9b
[59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b
[59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003
[59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
[59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640
[59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
[59.219] </TASK>
[59.219] INFO: task btrfs:346822 blocked for more than 120 seconds.
[59.220] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1
[59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59.222] task:btrfs state:D stack:0 pid:346822 ppid:1 flags:0x00004002
[59.222] Call Trace:
[59.222] <TASK>
[59.222] __schedule+0x392/0xa70
[59.222] schedule+0x5d/0xd0
[59.222] btrfs_scrub_cancel+0x91/0x100 [btrfs]
[59.222] ? __pfx_autoremove_wake_function+0x10/0x10
[59.222] btrfs_commit_transaction+0x572/0xeb0 [btrfs]
[59.223] ? start_transaction+0xcb/0x610 [btrfs]
[59.223] prepare_to_relocate+0x111/0x1a0 [btrfs]
[59.223] relocate_block_group+0x57/0x5d0 [btrfs]
[59.223] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
[59.223] btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
[59.224] ? __pfx_autoremove_wake_function+0x10/0x10
[59.224] btrfs_relocate_chunk+0x3b/0x150 [btrfs]
[59.224] btrfs_balance+0x8ff/0x11d0 [btrfs]
[59.224] ? __kmem_cache_alloc_node+0x14a/0x410
[59.224] btrfs_ioctl+0x2334/0x32c0 [btrfs]
[59.225] ? mod_objcg_state+0xd2/0x360
[59.225] ? refill_obj_stock+0xb0/0x160
[59.225] ? seq_release+0x25/0x30
[59.225] ? __rseq_handle_notify_resume+0x3b5/0x4b0
[59.225] ? percpu_counter_add_batch+0x2e/0xa0
[59.225] ? __x64_sys_ioctl+0x88/0xc0
[59.225] __x64_sys_ioctl+0x88/0xc0
[59.225] do_syscall_64+0x38/0x90
[59.225] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[59.225] RIP: 0033:0x7f381a6ffe9b
[59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
[59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
[59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
[59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
[59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
[59.225] </TASK>
What happens is the following:
1) A scrub is running, so fs_info->scrubs_running is 1;
2) Task A starts block group relocation, and at btrfs_relocate_chunk() it
pauses scrub by calling btrfs_scrub_pause(). That increments
fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to
pause (for fs_info->scrubs_paused to be equal to fs_info->scrubs_running);
3) The scrub task pauses at scrub_pause_off(), waiting for
fs_info->scrub_pause_req to decrease to 0;
4) Task A then enters btrfs_relocate_block_group(), and down that call
chain we start a transaction and then attempt to commit it;
5) When task A calls btrfs_commit_transaction(), it either will do the
commit itself or wait for some other task that already started the
commit of the transaction - it doesn't matter which case;
6) The transaction commit enters state TRANS_STATE_COMMIT_START;
7) An error happens during the transaction commit, like -ENOSPC when
running delayed refs or delayed items for example;
8) This results in calling transaction.c:cleanup_transaction(), where
we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req
from 0 to 1, and blocking this task waiting for fs_info->scrubs_running
to decrease to 0;
9) From this point on, both the transaction commit and the scrub task
hang forever:
1) The transaction commit is waiting for fs_info->scrubs_running to
be decreased to 0;
2) The scrub task is at scrub_pause_off() waiting for
fs_info->scrub_pause_req to decrease to 0 - so it cannot proceed
to stop the scrub and decrement fs_info->scrubs_running from 1 to 0.
This results in a deadlock.
Fix this by having cleanup_transaction(), called if a transaction commit
fails, not call btrfs_scrub_cancel() if relocation is in progress, and
having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if
the relocation failed and a transaction abort happened.
This was triggered with btrfs/061 from fstests.
Fixes:
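
The circular wait in steps 1 to 9 can be checked with a tiny model of the counters. `struct scrub_counters` and the helper functions are hypothetical; the model assumes, as the steps describe, that only task A drops the pause request (after relocation returns) and only the scrub task drops scrubs_running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters named in the commit message. */
struct scrub_counters {
	int scrubs_running;             /* fs_info->scrubs_running */
	int scrub_pause_req;            /* fs_info->scrub_pause_req */
	bool commit_waiting_for_cancel; /* task A blocked in btrfs_scrub_cancel() */
};

/* The scrub task parked in scrub_pause_off() resumes once no pause is requested. */
static bool scrub_task_can_run(const struct scrub_counters *c)
{
	return c->scrub_pause_req == 0;
}

/* Task A returns from btrfs_scrub_cancel() only once no scrub is running. */
static bool task_a_can_run(const struct scrub_counters *c)
{
	return !c->commit_waiting_for_cancel || c->scrubs_running == 0;
}

/* If neither task can make progress, each waits on the other forever. */
bool is_deadlocked(const struct scrub_counters *c)
{
	assert(c->scrubs_running >= 0 && c->scrub_pause_req >= 0);
	return !scrub_task_can_run(c) && !task_a_can_run(c);
}
```

With the fix, cleanup_transaction() no longer waits in btrfs_scrub_cancel() during relocation, so `commit_waiting_for_cancel` stays false while the pause is held and the cycle cannot form.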

50d281fc43
btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to a race with the systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error when mkfs.vfs tried to overwrite an old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, it took anywhere between 10 and 400 iterations to hit it,
and the chance of reproducing it decreased when debug printk()s were added.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during the scan. Furthermore,
during the mount process, the superblock is re-read in the following
call chain:
  btrfs_mount_root
    btrfs_open_devices
      open_fs_devices
        btrfs_open_one_device
          btrfs_get_bdev_and_sb
To fix this issue, remove the FMODE_EXCL flag from the scan
operation and add a comment.
If mkfs is still writing to the device while a scan is running, the
btrfs signature has not been written yet, so the scan will not
recognize the device.
Reported-by: Sherry Yang <sherry.yang@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
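
The race can be modelled as two openers contending for an exclusive claim on the same block device. This is a toy model, not the kernel's block-device API: `struct bdev_model`, `model_open()` and `model_close()` are hypothetical, and the model assumes (as on Linux) that a non-exclusive open neither claims the device nor conflicts with a later exclusive open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one block device's exclusive claim. */
struct bdev_model {
	bool claimed_excl; /* someone currently holds an exclusive open */
};

/* Returns 0 on success or -EBUSY if an exclusive open conflicts. */
int model_open(struct bdev_model *bdev, bool excl)
{
	if (!excl)
		return 0;          /* shared opens never take the claim */
	if (bdev->claimed_excl)
		return -EBUSY;     /* the second exclusive opener loses the race */
	bdev->claimed_excl = true;
	return 0;
}

void model_close(struct bdev_model *bdev, bool excl)
{
	if (excl) {
		assert(bdev->claimed_excl);
		bdev->claimed_excl = false;
	}
}
```

Before the change, a udev-triggered scan holding the exclusive claim makes mkfs or fsck fail with -EBUSY; after it, the scan's shared open leaves the claim free.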

2f1a6be12a
btrfs: fix race between quota disable and quota assign ioctls
The quota assign ioctl can currently run in parallel with a quota disable ioctl call. The assign ioctl uses the quota root, while the disable ioctl frees that root, and therefore we can have a use-after-free triggered in the assign ioctl, leading to a trace like the following when KASAN is enabled: [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0 [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736 [672.724][T736] [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37 [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [672.727][T736] Call Trace: [672.728][T736] <TASK> [672.728][T736] dump_stack_lvl+0xd9/0x150 [672.725][T736] print_report+0xc1/0x5e0 [672.720][T736] ? __virt_addr_valid+0x61/0x2e0 [672.727][T736] ? __phys_addr+0xc9/0x150 [672.725][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.722][T736] kasan_report+0xc0/0xf0 [672.729][T736] ? btrfs_search_slot+0x2962/0x2db0 [672.724][T736] btrfs_search_slot+0x2962/0x2db0 [672.723][T736] ? fs_reclaim_acquire+0xba/0x160 [672.722][T736] ? split_leaf+0x13d0/0x13d0 [672.726][T736] ? rcu_is_watching+0x12/0xb0 [672.723][T736] ? kmem_cache_alloc+0x338/0x3c0 [672.722][T736] update_qgroup_status_item+0xf7/0x320 [672.724][T736] ? add_qgroup_rb+0x3d0/0x3d0 [672.739][T736] ? do_raw_spin_lock+0x12d/0x2b0 [672.730][T736] ? spin_bug+0x1d0/0x1d0 [672.737][T736] btrfs_run_qgroups+0x5de/0x840 [672.730][T736] ? btrfs_qgroup_rescan_worker+0xa70/0xa70 [672.738][T736] ? __del_qgroup_relation+0x4ba/0xe00 [672.738][T736] btrfs_ioctl+0x3d58/0x5d80 [672.735][T736] ? tomoyo_path_number_perm+0x16a/0x550 [672.737][T736] ? tomoyo_execute_permission+0x4a0/0x4a0 [672.731][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50 [672.737][T736] ? __sanitizer_cov_trace_switch+0x54/0x90 [672.734][T736] ? do_vfs_ioctl+0x132/0x1660 [672.730][T736] ? vfs_fileattr_set+0xc40/0xc40 [672.730][T736] ? 
_raw_spin_unlock_irq+0x2e/0x50
[672.732][T736] ? sigprocmask+0xf2/0x340
[672.737][T736] ? __fget_files+0x26a/0x480
[672.732][T736] ? bpf_lsm_file_ioctl+0x9/0x10
[672.738][T736] ? btrfs_ioctl_get_supported_features+0x50/0x50
[672.736][T736] __x64_sys_ioctl+0x198/0x210
[672.736][T736] do_syscall_64+0x39/0xb0
[672.731][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.739][T736] RIP: 0033:0x4556ad
[672.742][T736] </TASK>
[672.743][T736]
[672.748][T736] Allocated by task 27677:
[672.743][T736] kasan_save_stack+0x22/0x40
[672.741][T736] kasan_set_track+0x25/0x30
[672.741][T736] __kasan_kmalloc+0xa4/0xb0
[672.749][T736] btrfs_alloc_root+0x48/0x90
[672.746][T736] btrfs_create_tree+0x146/0xa20
[672.744][T736] btrfs_quota_enable+0x461/0x1d20
[672.743][T736] btrfs_ioctl+0x4a1c/0x5d80
[672.747][T736] __x64_sys_ioctl+0x198/0x210
[672.749][T736] do_syscall_64+0x39/0xb0
[672.744][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.756][T736]
[672.757][T736] Freed by task 27677:
[672.759][T736] kasan_save_stack+0x22/0x40
[672.759][T736] kasan_set_track+0x25/0x30
[672.756][T736] kasan_save_free_info+0x2e/0x50
[672.751][T736] ____kasan_slab_free+0x162/0x1c0
[672.758][T736] slab_free_freelist_hook+0x89/0x1c0
[672.752][T736] __kmem_cache_free+0xaf/0x2e0
[672.752][T736] btrfs_put_root+0x1ff/0x2b0
[672.759][T736] btrfs_quota_disable+0x80a/0xbc0
[672.752][T736] btrfs_ioctl+0x3e5f/0x5d80
[672.756][T736] __x64_sys_ioctl+0x198/0x210
[672.753][T736] do_syscall_64+0x39/0xb0
[672.765][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.769][T736]
[672.768][T736] The buggy address belongs to the object at ffff888022ec0000
[672.768][T736] which belongs to the cache kmalloc-4k of size 4096
[672.769][T736] The buggy address is located 520 bytes inside of
[672.769][T736] freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
[672.760][T736]
[672.764][T736] The buggy address belongs to the physical page:
[672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
[672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
[672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
[672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
[672.771][T736] page dumped because: kasan: bad access detected
[672.778][T736] page_owner tracks the page as allocated
[672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
[672.779][T736] get_page_from_freelist+0x119c/0x2d50
[672.779][T736] __alloc_pages+0x1cb/0x4a0
[672.776][T736] alloc_pages+0x1aa/0x270
[672.773][T736] allocate_slab+0x260/0x390
[672.771][T736] ___slab_alloc+0xa9a/0x13e0
[672.778][T736] __slab_alloc.constprop.0+0x56/0xb0
[672.771][T736] __kmem_cache_alloc_node+0x136/0x320
[672.789][T736] __kmalloc+0x4e/0x1a0
[672.783][T736] tomoyo_realpath_from_path+0xc3/0x600
[672.781][T736] tomoyo_path_perm+0x22f/0x420
[672.782][T736] tomoyo_path_unlink+0x92/0xd0
[672.780][T736] security_path_unlink+0xdb/0x150
[672.788][T736] do_unlinkat+0x377/0x680
[672.788][T736] __x64_sys_unlink+0xca/0x110
[672.789][T736] do_syscall_64+0x39/0xb0
[672.783][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.784][T736] page last free stack trace:
[672.787][T736] free_pcp_prepare+0x4e5/0x920
[672.787][T736] free_unref_page+0x1d/0x4e0
[672.784][T736] __unfreeze_partials+0x17c/0x1a0
[672.797][T736] qlist_free_all+0x6a/0x180
[672.796][T736] kasan_quarantine_reduce+0x189/0x1d0
[672.797][T736] __kasan_slab_alloc+0x64/0x90
[672.793][T736] kmem_cache_alloc+0x17c/0x3c0
[672.799][T736] getname_flags.part.0+0x50/0x4e0
[672.799][T736] getname_flags+0x9e/0xe0
[672.792][T736] vfs_fstatat+0x77/0xb0
[672.791][T736] __do_sys_newlstat+0x84/0x100
[672.798][T736] do_syscall_64+0x39/0xb0
[672.796][T736] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[672.790][T736]
[672.791][T736] Memory state around the buggy address:
[672.799][T736] ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.805][T736] ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.809][T736] ^
[672.809][T736] ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[672.809][T736] ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
before calling btrfs_run_qgroups(), which is what all qgroup ioctls
should do.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
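A minimal userspace sketch of the locking rule described above, assuming simplified stand-ins: struct fs_info, quota_root, qgroup_ioctl_lock, quota_enable(), quota_disable(), qgroup_assign() and run_qgroups() are hypothetical names, not the kernel API. Because every path that frees or uses quota_root takes the same mutex, the assign path can no longer see a root that quota disable has already freed. The sketch is single-threaded, so it only shows where the lock goes; returning -ENOTCONN when quotas are disabled is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real btrfs structures. */
struct quota_root {
	int dirty_qgroups;	/* pending work that run_qgroups() flushes */
};

struct fs_info {
	pthread_mutex_t qgroup_ioctl_lock;	/* serializes all qgroup ioctls */
	struct quota_root *quota_root;		/* NULL while quotas are disabled */
};

/* Dereferences quota_root, so the caller must hold qgroup_ioctl_lock. */
static void run_qgroups(struct fs_info *fs)
{
	fs->quota_root->dirty_qgroups = 0;
}

static int quota_enable(struct fs_info *fs)
{
	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	if (!fs->quota_root)
		fs->quota_root = calloc(1, sizeof(*fs->quota_root));
	int ret = fs->quota_root ? 0 : -ENOMEM;
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
	return ret;
}

/* Mirrors the free path in the report (quota disable -> put root). */
static void quota_disable(struct fs_info *fs)
{
	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	free(fs->quota_root);
	fs->quota_root = NULL;
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
}

/* The fix: take the same mutex before checking and using quota_root. */
static int qgroup_assign(struct fs_info *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	if (!fs->quota_root) {
		ret = -ENOTCONN;	/* quotas disabled; nothing to run */
		goto out;
	}
	fs->quota_root->dirty_qgroups++;
	run_qgroups(fs);
out:
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
	return ret;
}
```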
285063049a
Merge tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
A few more fixes. The zoned accounting fix is spread across a few
patches, preparatory ones and the actual fixes:
- zoned mode:
  - fix accounting of unusable zone space
  - fix zone activation condition for DUP profile
  - preparatory patches
- improved error handling of missing chunks
- fix compiler warning"
* tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: drop space_info->active_total_bytes
btrfs: zoned: count fresh BG region as zone unusable
btrfs: use temporary variable for space_info in btrfs_update_block_group
btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
btrfs: handle missing chunk mapping more gracefully
e15acc2588
btrfs: zoned: drop space_info->active_total_bytes
The space_info->active_total_bytes is no longer necessary as we now
count the region of newly allocated block group as zone_unusable. Drop
its usage.
Fixes:
fa2068d7e9
btrfs: zoned: count fresh BG region as zone unusable
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which were previously
active but are now inactive. That confusion results in a bug where the
full BGs are not counted in active_total_bytes at mount time.
For background, there are three kinds of block groups in terms of
activation.
1. Block groups never activated
2. Block groups currently active
3. Block groups previously active and currently inactive (due to being
fully written or zone finished)
What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.
And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.
OTOH, BGs #3 can be counted in the "total": since they are already full,
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.
Tracking #2 and #3 as "active_total_bytes" (the current implementation)
is confusing. And tracking #1 and subtracting it properly from
"total_bytes" every time we need to reserve space is cumbersome.
Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 .. zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes, and the code will be common
between the regular and zoned modes. Also, no additional counter is needed
with this approach.
Fixes:
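The accounting scheme described above can be sketched in userspace C, assuming a simplified space_info and block group; the struct, field and function names are hypothetical, and real zoned accounting has more cases. A fresh block group adds its whole length to both total_bytes and zone_unusable, and activation releases the first zone_capacity bytes. The sum of used, reserved, pinned and zone_unusable never exceeds total_bytes and equals it once the block group is full.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified counters. */
struct space_info {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_zone_unusable;
};

struct block_group {
	uint64_t length;	/* size of the zone-backed block group */
	uint64_t zone_capacity;	/* writable part of the zone, <= length */
	int active;
};

/* A fresh BG joins the total but is counted entirely as unusable. */
static void add_new_block_group(struct space_info *si, struct block_group *bg)
{
	bg->active = 0;
	si->total_bytes += bg->length;
	si->bytes_zone_unusable += bg->length;
}

/* On activation, release the first zone_capacity bytes for allocation. */
static void activate_block_group(struct space_info *si, struct block_group *bg)
{
	assert(!bg->active && bg->zone_capacity <= bg->length);
	bg->active = 1;
	si->bytes_zone_unusable -= bg->zone_capacity;
}

/* Space still available for reservations, with no special-casing. */
static uint64_t free_bytes(const struct space_info *si)
{
	uint64_t accounted = si->bytes_used + si->bytes_reserved +
			     si->bytes_pinned + si->bytes_zone_unusable;

	assert(accounted <= si->total_bytes);
	return si->total_bytes - accounted;
}
```

The same free_bytes() works for regular and zoned block groups, which is the point of dropping a separate active_total_bytes counter.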
df384da5a4
btrfs: use temporary variable for space_info in btrfs_update_block_group
We do cache->space_info->counter += num_bytes; everywhere in here. This
makes the lines longer than they need to be, and will be especially
noticeable when we add the active tracking in, so add a temp variable for
the space_info so this is cleaner.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
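A before/after sketch of this refactor, using hypothetical, simplified structs (the field names bytes_used, disk_used and factor are assumptions for illustration, not the exact btrfs layout). Both functions perform the same updates; the second reads cache->space_info once into a local variable.

```c
#include <assert.h>
#include <stdint.h>

struct space_info {
	uint64_t bytes_used;
	uint64_t disk_used;
};

struct block_group {
	struct space_info *space_info;
	int factor;	/* e.g. 2 for DUP/RAID1, 1 for single */
};

/* Before: every update spells out the full cache->space_info-> chain. */
static void update_before(struct block_group *cache, uint64_t num_bytes)
{
	cache->space_info->bytes_used += num_bytes;
	cache->space_info->disk_used += num_bytes * cache->factor;
}

/* After: a local temporary keeps each line short. */
static void update_after(struct block_group *cache, uint64_t num_bytes)
{
	struct space_info *space_info = cache->space_info;

	space_info->bytes_used += num_bytes;
	space_info->disk_used += num_bytes * cache->factor;
}
```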
bf1f1fec27
btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
9e1cdf0c35
btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs_can_activate_zone() returns true if at least one device has one zone
available for activation. This is OK for the single profile, but not for
the DUP profile, which needs two zones to create a block group. Fix it by
handling each case according to the profile flags.
Fixes:
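A minimal sketch of a profile-aware check, assuming a hypothetical per-device counter of activatable zones and a made-up BG_FLAG_DUP bit (not the real BTRFS_BLOCK_GROUP_DUP value). It covers only the single and DUP cases discussed above: DUP keeps both copies on one device, so that device must be able to activate two zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BG_FLAG_DUP (1ULL << 0)	/* hypothetical profile bit */

struct device {
	unsigned int free_active_zones;	/* zones this device may still activate */
};

static unsigned int zones_needed(uint64_t profile_flags)
{
	/* DUP places both copies on the same device: two zones there. */
	return (profile_flags & BG_FLAG_DUP) ? 2 : 1;
}

/* True if some single device can host the new block group's zones. */
static bool can_activate_zone(const struct device *devs, int ndevs,
			      uint64_t profile_flags)
{
	unsigned int needed = zones_needed(profile_flags);

	for (int i = 0; i < ndevs; i++)
		if (devs[i].free_active_zones >= needed)
			return true;
	return false;
}
```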
|
||
|
|
10a8857a1b |
btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
Commit
1c3ab6dfa0
btrfs: handle missing chunk mapping more gracefully
[BUG]
During my scrub rework, I did a stupid thing like this:
  bio->bi_iter.bi_sector = stripe->logical;
  btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
The above bi_sector assignment uses the logical address directly, lacking
">> SECTOR_SHIFT".
This results in a read on a range which has no chunk mapping.
This results in the following crash:
  BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
  assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
Sure, this is all my fault, but it shows a possible real-world problem:
a bit flip in file extents or a tree block can point to unmapped ranges
and trigger the above ASSERT(), or, if CONFIG_BTRFS_ASSERT is not
configured, cause an invalid pointer access.
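The unit mismatch behind this bug can be shown with a small sketch. bi_sector counts 512-byte sectors, so a byte address must be shifted right by SECTOR_SHIFT (9) before it is stored there. The helper names are hypothetical, and the sketch assumes the submit path turns bi_sector back into bytes with << SECTOR_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as used by the block layer */

/* Correct: a bio's start sector is the byte address divided by 512. */
static uint64_t bytes_to_sector(uint64_t logical)
{
	return logical >> SECTOR_SHIFT;
}

/* Assumed reverse conversion done when the bio is submitted. */
static uint64_t sector_to_bytes(uint64_t sector)
{
	return sector << SECTOR_SHIFT;
}
```

Under that assumption, storing a byte address directly in bi_sector inflates it by a factor of 512. As a point of arithmetic only, the 11274289152 in the crash message equals 22020096 << 9.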
[PROBLEMS]
In the above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().
[FIX]
The fix is straightforward: replace the ASSERT() with proper error
handling (callers already handle errors).
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
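A sketch of the [FIX] pattern under stated assumptions: get_chunk_map() and map_block() are hypothetical simplifications of btrfs_get_chunk_map() and __btrfs_map_block(), and -EINVAL is an illustrative error code. A missing mapping is returned as an error instead of tripping an assertion, so a bogus logical address fails the I/O rather than crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct chunk_map {
	uint64_t start;		/* logical start of the chunk */
	uint64_t length;
	uint64_t physical;	/* where the chunk starts on disk */
};

/* Hypothetical lookup: NULL when no chunk covers [logical, logical + len). */
static const struct chunk_map *get_chunk_map(const struct chunk_map *maps,
					     size_t nr, uint64_t logical,
					     uint64_t len)
{
	for (size_t i = 0; i < nr; i++)
		if (logical >= maps[i].start &&
		    logical + len <= maps[i].start + maps[i].length)
			return &maps[i];
	return NULL;
}

/*
 * Instead of asserting that the lookup succeeded, report the failure and
 * let the caller, which already handles errors, decide what to do.
 */
static int map_block(const struct chunk_map *maps, size_t nr,
		     uint64_t logical, uint64_t len, uint64_t *physical)
{
	const struct chunk_map *map = get_chunk_map(maps, nr, logical, len);

	if (!map)
		return -EINVAL;
	*physical = map->physical + (logical - map->start);
	return 0;
}
```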
ae195ca1a8
Merge tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"First batch of fixes. Among them are two updates to sysfs and ioctl
which are not strictly fixes but are used for testing, so there's no
reason to delay them.
- fix block group item corruption after inserting new block group
- fix extent map logging bit not cleared for split maps after
  dropping range
- fix calculation of unusable block group space reporting bogus
  values due to 32/64b division
- fix unnecessary increment of read error stat on write error
- improve error handling in inode update
- export per-device fsid in DEV_INFO ioctl to distinguish seeding
  devices, needed for testing
- allocator size classes:
  - fix potential dead lock in size class loading logic
  - print sysfs stats for the allocation classes"
* tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix block group item corruption after inserting new block group
btrfs: fix extent map logging bit not cleared for split maps after dropping range
btrfs: fix percent calculation for bg reclaim message
btrfs: fix unnecessary increment of read error stat on write error
btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
btrfs: ioctl: return device fsid from DEV_INFO ioctl
btrfs: fix potential dead lock in size class loading logic
btrfs: sysfs: add size class stats
675dfe1223
btrfs: fix block group item corruption after inserting new block group
We can often end up inserting a block group item, for a new block group,
with a wrong value for the used bytes field.
This happens if, for the newly allocated block group, in the same
transaction that created the block group, we have tasks allocating extents
from it as well as tasks removing extents from it.
For example:
1) Task A creates a metadata block group X;
2) Two extents are allocated from block group X, so its "used" field is
updated to 32K, and its "commit_used" field remains as 0;
3) Transaction commit starts, by some task B, and it enters
btrfs_start_dirty_block_groups(). There it tries to update the block
group item for block group X, which currently has its "used" field with
a value of 32K. But that fails since the block group item was not yet
inserted, and so on failure update_block_group_item() sets the
"commit_used" field of the block group back to 0;
4) The block group item is inserted by task A, when for example
btrfs_create_pending_block_groups() is called when releasing its
transaction handle. This results in insert_block_group_item() inserting
the block group item in the extent tree (or block group tree), with a
"used" field having a value of 32K, but without updating the
"commit_used" field in the block group, which remains at 0;
5) The two extents are freed from block group X, so its "used" field
changes from 32K to 0;
6) The transaction commit by task B continues, it enters
btrfs_write_dirty_block_groups() which calls update_block_group_item()
for block group X, and there it decides to skip the block group item
update, because "used" has a value of 0 and "commit_used" has a value
of 0 too.
As a result, we end up with a block group item having a 32K "used" field
but no extents allocated from it.
When this issue happens, btrfs check reports an error like this:
[1/7] checking root items
[2/7] checking extents
block group [1104150528 1073741824] used 39796736 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
(...)
Fix this by making insert_block_group_item() update the block group's
"commit_used" field.
Fixes:
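The six-step sequence above can be replayed with a simplified model, assuming hypothetical stand-ins for update_block_group_item() and insert_block_group_item() that track only "used", "commit_used" and the value stored in the item. replay(false) ends with a 32K item for an empty block group; replay(true), where the insert also syncs "commit_used", ends with 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct block_group {
	uint64_t used;		/* in-memory used bytes */
	uint64_t commit_used;	/* "used" as last written to the item */
	bool item_inserted;
	uint64_t item_used;	/* value stored in the block group item */
};

/* Models update_block_group_item(): skip if "used" looks unchanged. */
static int update_item(struct block_group *bg)
{
	uint64_t old = bg->commit_used;

	if (bg->used == bg->commit_used)
		return 0;		/* skipped */
	bg->commit_used = bg->used;
	if (!bg->item_inserted) {
		bg->commit_used = old;	/* item not found: restore on failure */
		return -ENOENT;
	}
	bg->item_used = bg->used;
	return 0;
}

/* Models insert_block_group_item(); with_fix also syncs commit_used. */
static void insert_item(struct block_group *bg, bool with_fix)
{
	bg->item_inserted = true;
	bg->item_used = bg->used;
	if (with_fix)
		bg->commit_used = bg->used;
}

/* Replays steps 1-6 from the message; returns the final item value. */
static uint64_t replay(bool with_fix)
{
	struct block_group bg = { 0 };	/* 1) block group X created */

	bg.used = 32 * 1024;		/* 2) two extents allocated */
	update_item(&bg);		/* 3) fails: item not inserted yet */
	insert_item(&bg, with_fix);	/* 4) item inserted with used = 32K */
	bg.used = 0;			/* 5) both extents freed */
	update_item(&bg);		/* 6) skipped unless commit_used was synced */
	return bg.item_used;
}
```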