When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to
activate new arrays unless "CREATE names=yes" appears in
mdadm.conf
As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD
for when MD is selected - at least until mdadm is updated and the
updates widely available.
Cc: stable@vger.kernel.org # v5.18+
Fixes: fbdee71bb5 ("block: deprecate autoloading based on dev_t")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.
Fixes: 394ffa503b ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In general, if swiotlb is sufficient, the logic of index =
wrap_area_index(mem, index + 1) is fine, it will quickly take a slot and
release the area->lock; But if swiotlb is insufficient and the device
has min_align_mask requirements, such as NVME, we may not be able to
satisfy index == wrap and exit the loop properly. In this case, other
kernel threads will not be able to acquire the area->lock and release
the slot, resulting in a deadlock.
The current implementation of wrap_area_index does not involve a modulo
operation, so adjusting the wrap to ensure the loop ends is not trivial.
Introduce a new variable to record the number of loops and exit the loop
after completing the traversal.
Backtraces:
Other CPUs are waiting this core to exit the swiotlb_do_find_slots
loop.
[10199.924391] RIP: 0010:swiotlb_do_find_slots+0x1fe/0x3e0
[10199.924403] Call Trace:
[10199.924404] <TASK>
[10199.924405] swiotlb_tbl_map_single+0xec/0x1f0
[10199.924407] swiotlb_map+0x5c/0x260
[10199.924409] ? nvme_pci_setup_prps+0x1ed/0x340
[10199.924411] dma_direct_map_page+0x12e/0x1c0
[10199.924413] nvme_map_data+0x304/0x370
[10199.924415] nvme_prep_rq.part.0+0x31/0x120
[10199.924417] nvme_queue_rq+0x77/0x1f0
...
[ 9639.596311] NMI backtrace for cpu 48
[ 9639.596336] Call Trace:
[ 9639.596337]
[ 9639.596338] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.596341] swiotlb_do_find_slots+0xef/0x3e0
[ 9639.596344] swiotlb_tbl_map_single+0xec/0x1f0
[ 9639.596347] swiotlb_map+0x5c/0x260
[ 9639.596349] dma_direct_map_sg+0x7a/0x280
[ 9639.596352] __dma_map_sg_attrs+0x30/0x70
[ 9639.596355] dma_map_sgtable+0x1d/0x30
[ 9639.596356] nvme_map_data+0xce/0x370
...
[ 9639.595665] NMI backtrace for cpu 50
[ 9639.595682] Call Trace:
[ 9639.595682]
[ 9639.595683] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.595686] swiotlb_release_slots.isra.0+0x86/0x180
[ 9639.595688] dma_direct_unmap_sg+0xcf/0x1a0
[ 9639.595690] nvme_unmap_data.part.0+0x43/0xc0
Fixes: 1f221a0d0d ("swiotlb: respect min_align_mask")
Signed-off-by: GuoRui.Yu <GuoRui.Yu@linux.alibaba.com>
Signed-off-by: Xiaokang Hu <xiaokang.hxk@alibaba-inc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
An nvme target ->queue_response() operation implementation may free the
request passed as argument. Such implementation potentially could result
in a use after free of the request pointer when percpu_ref_put() is
called in nvmet_req_complete().
Avoid such problem by using a local variable to save the sq pointer
before calling __nvmet_req_complete(), thus avoiding dereferencing the
req pointer after that function call.
Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We have more commands to show in the trace. Sync up.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Make sure that we don't somehow mess up the wire structures in the spec.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
For non in-capsule writes we reuse the request pdu space for a h2cdata
pdu in order to avoid over allocating space (either preallocate or
dynamically upon receving an r2t pdu). However if the request times out
the core expects to find the opcode in the start of the request, which
we override.
In order to prevent that, without sacrificing additional 24 bytes per
request, we just use the tail of the command pdu space instead (last
24 bytes from the 72 bytes command pdu). That should make the command
opcode always available, and we get away from allocating more space.
If in the future we would need the last 24 bytes of the nvme command
available we would need to allocate a dedicated space for it in the
request, but until then we can avoid doing so.
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case the nvme_probe teardown path is triggered the ctrl ref count does
not reach 0 thus creating a memory leak upon failure of nvme_probe.
Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.
Fix the issue by building range from request sector/nr_sectors directly.
Fixes: b35ba01ea6 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit
or topgit) and location.
Add the SCM tree type to the T: entry, and reorder the file entries in
alphabetical order.
Fixes: b508fc354f ("nvme: update maintainers information")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When injecting a fake timeout into the null_blk driver using
fail_io_timeout, the request timeout handler does not execute
blk_mq_complete_request(), so the complete callback is never executed
for a timedout request.
The null_blk driver also has a driver-specific fake timeout mechanism
which does not have this problem. Fix the problem with fail_io_timeout
by using the same meachanism as null_blk internal timeout feature, using
the fake_timeout field of null_blk commands.
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Fixes: de3510e52b ("null_blk: fix command timeout completion handling")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The driver can be compile tested with !CONFIG_OF making certain data
unused:
drivers/net/wireless/marvell/mwifiex/sdio.c:498:34: error: ‘mwifiex_sdio_of_match_table’ defined but not used [-Werror=unused-const-variable=]
drivers/net/wireless/marvell/mwifiex/pcie.c:175:34: error: ‘mwifiex_pcie_of_match_table’ defined but not used [-Werror=unused-const-variable=]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230312132523.352182-1-krzysztof.kozlowski@linaro.org
To support detection of read faults with Radix execute-only memory, the
vma_is_accessible() check in access_error() (which checks for PROT_NONE)
was replaced with a check to see if VM_READ was missing, and if so,
returns true to assert the fault was caused by a bad read.
This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply
read on powerpc, as defined in protection_map[]. This causes mappings
containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of
page faults, since the MMU is still allowing reads.
Correct this by restoring the original vma_is_accessible() check for
PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only
mappings.
Fixes: 395cac7752 ("powerpc/mm: Support execute-only memory on the Radix MMU")
Reported-by: Michal Suchánek <msuchanek@suse.de>
Link: https://lore.kernel.org/r/20230308152702.GR19419@kitsune.suse.cz
Tested-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310050834.63105-1-ruscur@russell.cc
Daniel Golle says:
====================
net: ethernet: mtk_eth_soc: minor SGMII fixes
This small series brings two minor fixes for the SGMII unit found in
MediaTek's router SoCs.
The first patch resets the PCS internal state machine on major
configuration changes, just like it is also done in MediaTek's SDK.
The second patch makes sure we only write values and restart AN if
actually needed, thus preventing unnesseray loss of an existing link
in some cases.
Both patches have previously been submitted as part of the series
"net: ethernet: mtk_eth_soc: various enhancements" which grew a bit
too big and it has correctly been criticized that some of the patches
should rather go as fixes to net-next.
This new series tries to address this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Only restart auto-negotiation and write link timer if actually
necessary. This prevents losing the link in case of minor
changes.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset the internal PCS state machine when changing interface mode.
This prevents confusing the state machine when changing interface
modes, e.g. from SGMII to 2500Base-X or vice-versa.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet length retrieved from skb data may be larger than
the actual socket buffer length (up to 9026 bytes). In such
case the cloned skb passed up the network stack will leak
kernel memory contents.
Fixes: d0cad87170 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wenjia Zhang says:
====================
net/smc: Fixes 2023-03-01
The 1st patch solves the problem that CLC message initialization was
not properly reversed in error handling path. And the 2nd one fixes
the possible deadlock triggered by cancel_delayed_work_sync().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASICs have a configurable limit on how deep into the packet
they parse. By default, the limit is 96 bytes.
There are several cases where this parsing depth is not enough and there
is a need to increase it. For example, timestamping of PTP packets and a
FIB multipath hash policy that requires hashing on inner fields. The
driver therefore maintains a reference count that reflects the number of
consumers that require an increased parsing depth.
During reload_down() the parsing depth reference count does not
necessarily drop to zero, but the parsing depth itself is restored to
the default during reload_up() when the firmware is reset. It is
therefore possible to end up in situations where the driver thinks that
the parsing depth was increased (reference count is non-zero), when it
is not.
Fix by making sure that all the consumers that increase the parsing
depth reference count also decrease it during reload_down().
Specifically, make sure that when the routing code is de-initialized it
drops the reference count if it was increased because of a FIB multipath
hash policy that requires hashing on inner fields.
Add a warning if the reference count is not zero after the driver was
de-initialized and explicitly reset it to zero during initialization for
good measures.
Fixes: 2d91f0803b ("mlxsw: spectrum: Add infrastructure for parsing configuration")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This bug influences both st_nci_i2c_remove and st_nci_spi_remove.
Take st_nci_i2c_remove as an example.
In st_nci_i2c_probe, it called ndlc_probe and bound &ndlc->sm_work
with llt_ndlc_sm_work.
When it calls ndlc_recv or timeout handler, it will finally call
schedule_work to start the work.
When we call st_nci_i2c_remove to remove the driver, there
may be a sequence as follows:
Fix it by finishing the work before cleanup in ndlc_remove
CPU0 CPU1
|llt_ndlc_sm_work
st_nci_i2c_remove |
ndlc_remove |
st_nci_remove |
nci_free_device|
kfree(ndev) |
//free ndlc->ndev |
|llt_ndlc_rcv_queue
|nci_recv_frame
|//use ndlc->ndev
Fixes: 35630df68d ("NFC: st21nfcb: Add driver for STMicroelectronics ST21NFCB NFC chip")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230312160837.2040857-1-zyytlz.wz@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test checks if (IPv4, IPv6) address pair properly conflict or not.
* IPv4
* 0.0.0.0
* 127.0.0.1
* IPv6
* ::
* ::1
If the IPv6 address is [::], the second bind() always fails.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use dh_listpackages to get a list of all binary packages.
With this, debian/control lists which binary packages will be produced.
Previously, ARCH=um listed linux-libc-dev in debian/control, but it
was not generated because each of mkdebian and builddeb independently
maintained the if-conditionals.
Another motivation is to allow scripts/package/builddeb to get the
package name (linux-image-*, etc.) dynamically from debian/control.
This will also allow the BuildProfile to control the generation of
the binary packages.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit 3ab18a625c ("kbuild: deb-pkg: improve the usability of source
package") set needless CROSS_COMPILE.
For example, 'make allnoconfig bindeb-pkg' on a x86_64 system will set
CROSS_COMPILE=i686-linux-gnu-, where the biarch compiler 'gcc' should
work for building the i386 kernel.
$ uname -m
x86_64
$ make allnoconfig bindeb-pkg >/dev/null
dpkg-architecture: warning: specified GNU system type i686-linux-gnu does not match CC system type x86_64-linux-gnu, try setting a correct CC environment variable
dpkg-source --before-build .
debian/rules binary
scripts/Kconfig.include:39: C compiler 'i686-linux-gnu-gcc' not found
make[6]: *** [scripts/kconfig/Makefile:77: olddefconfig] Error 1
make[5]: *** [Makefile:693: olddefconfig] Error 2
make[4]: *** [Makefile:358: __build_one_by_one] Error 2
make[3]: *** [debian/rules:7: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[2]: *** [scripts/Makefile.package:127: bindeb-pkg] Error 2
make[1]: *** [Makefile:1657: bindeb-pkg] Error 2
make: *** [Makefile:358: __build_one_by_one] Error 2
Check whether CROSS_COMPILE is defined, instead of whether it is non-empty.
If you invoke debian/rules via Kbuild, CROSS_COMPILE is always defined
in the top Makefile.
Fixes: 3ab18a625c ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
KERNELRELEASE does not need to match the package version in changelog.
Rather, it conventially matches what is called 'ABINAME', which is a
part of the binary package names.
Both are the same by default, but the former might be overridden by
KDEB_PKGVERSION. In this case, the resulting package would not boot
because /lib/modules/$(uname -r) does not point the module directory.
Partially revert 3ab18a625c ("kbuild: deb-pkg: improve the usability
of source package").
Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Fixes: 3ab18a625c ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Since commit c5bf2efb05 ("kbuild: deb-pkg: fix binary-arch and clean
in debian/rules"), the source package generated by 'make deb-pkg' fails
to build.
I terribly missed the fact that the intdeb-pkg target may regenerate
include/config/kernel.release due to the following in the top Makefile:
%pkg: include/config/kernel.release FORCE
Restore KERNELRELEASE= option to avoid the kernel.release disagreement
between build-arch and binary-arch.
Fixes: c5bf2efb05 ("kbuild: deb-pkg: fix binary-arch and clean in debian/rules")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
That commit required the use of filechk_kernel.release for the
kernelrelease Makefile target. It is currently only being set when
KBUILD_EXTMOD is not set. Make sure it is set in that case as well.
Fixes: 1cb86b6c31 ("kbuild: save overridden KERNELRELEASE in include/config/kernel.release")
Signed-off-by: Tzafrir Cohen <nvidia@cohens.org.il>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Use DFS root session whenever possible to get new DFS referrals
otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that
doesn't respond to them. It should be safe accessing
@ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it
has same lifetime as of @tcon.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
Return the DFS root session id in /proc/fs/cifs/DebugData to make it
easier to track which IPC tcon was used to get new DFS referrals for a
specific connection, and aids in debugging.
A simple output of it would be
Sessions:
1) Address: 192.168.1.13 Uses: 1 Capability: 0x300067 Session Status: 1
Security type: RawNTLMSSP SessionId: 0xd80000000009
User: 0 Cred User: 0
DFS root session id: 0x128006c000035
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good. It is indeed the allocation size of a
cpumask.
But the code also assumes that the whole allocation is initialized
without actually doing so itself. That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.
Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.
See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.
Fix this by just zeroing the allocation before using it. Do the same
for the compat version of sched_getaffinity(), which had the same logic.
Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits. For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'. The compat case already did that.
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set the DFS root session pointer earlier when creating a new SMB
session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon()
and DFS cache refresher.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
When adapter is not found, pi->disconnect() is called without previous
pi->connect(). This results in error like this:
parport0: pata_parport tried to release parport when not owner
Add missing out_disconnect label and use it correctly.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>