Commit Graph

1171294 Commits

Author SHA1 Message Date
Jens Axboe
890a2fb06e Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3
Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.3

 - avoid potential UAF in nvmet_req_complete (Damien Le Moal)
 - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
 - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
 - repair the MAINTAINERS entry (Lukas Bulwahn)
 - fix handling single range discard request (Ming Lei)
 - show more opcode names in trace events (Minwoo Im)
 - fix nvme-tcp timeout reporting (Sagi Grimberg)"

* tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme:
  nvmet: avoid potential UAF in nvmet_req_complete()
  nvme-trace: show more opcode names
  nvme-tcp: add nvme-tcp pdu size build protection
  nvme-tcp: fix opcode reporting in the timeout handler
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
  nvme-pci: fixing memory leak in probe teardown path
  nvme: fix handling single range discard request
  MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
2023-03-16 07:01:48 -06:00
Andy Shevchenko
bbf860ed71 usb: gadget: Use correct endianness of the wLength field for WebUSB
WebUSB code uses wLength directly without proper endianness conversion.
Update it to use already prepared temporary variable w_length instead.

Fixes: 93c473948c ("usb: gadget: add WebUSB landing page support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-By: Jó Ágila Bitsch <jgilab@gmail.com>
Link: https://lore.kernel.org/r/20230313154522.52684-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:45:35 +01:00
Yaroslav Furman
a37eb61b6e uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
Just like other JMicron JMS5xx enclosures, it chokes on report-opcodes,
let's avoid them.

Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20230312090745.47962-1-yaro330@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:44:17 +01:00
Pawel Laszczak
96b96b2a56 usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
Patch changes CDNS_DEVICE_ID in USBSSP PCI Glue driver to remove
the conflict with Cadence USBSS driver.

cc: <stable@vger.kernel.org>
Fixes: 3d82904559 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Link: https://lore.kernel.org/r/20230309063048.299378-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:43:58 +01:00
Pawel Laszczak
1272fd652a usb: cdns3: Fix issue with using incorrect PCI device function
PCI based platform can have more than two PCI functions.
USBSS PCI Glue driver during initialization should
consider only DRD/HOST/DEVICE PCI functions and
all other should be ignored. This patch adds additional
condition which causes that only DRD and HOST/DEVICE
function will be accepted.

cc: <stable@vger.kernel.org>
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Link: https://lore.kernel.org/r/20230308124427.311245-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:43:42 +01:00
Pawel Laszczak
5bc38d33a5 usb: cdnsp: Fixes issue with redundant Status Stage
In some cases, driver trees to send Status Stage twice.
The first one from upper layer of gadget usb subsystem and
second time from controller driver.
This patch fixes this issue and remove tricky handling of
SET_INTERFACE from controller driver which is no longer
needed.

cc: <stable@vger.kernel.org>
Fixes: 3d82904559 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Link: https://lore.kernel.org/r/20230307111420.376056-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:43:37 +01:00
Hongren Zheng
c3aa32ac86 MAINTAINERS: make me a reviewer of USB/IP
I think I am familiar enough with USB/IP and is adequate as a reviewer.

Every time there is some patch/bug, I wish I can get pinged
and I will feedback on that.

I had some contributions to USBIP and some support for it.

Contribution:
Commit 17af79321 ("docs: usbip: Fix major fields and
    descriptions in protocol")
Commit b737eecd4 ("usbip: tools: add options and examples
    in man page related to device mode")
Commit a58977b2f ("usbip: tools: add usage of device mode in
    usbip_list.c")

Support:
Commit 8f36b3b4e1 ("usbip: add USBIP_URB_* URB transfer flags")
Bug report: https://lore.kernel.org/lkml/ZBHxfUX60EyCMw5l@Sun/

I also have implemented a userspace usbip server in
https://github.com/canokeys/canokey-usbip
and maintain a list of usbip implementations
https://github.com/usbip/implementations

Signed-off-by: Hongren (Zenithal) Zheng <i@zenithal.me>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/ZBIBCRiFGSqQcOon@Sun
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-16 12:43:28 +01:00
Nikita Zhandarovich
cbebd68f59 x86/mm: Fix use of uninitialized buffer in sme_enable()
cmdline_find_option() may fail before doing any initialization of
the buffer array. This may lead to unpredictable results when the same
buffer is used later in calls to strncmp() function.  Fix the issue by
returning early if cmdline_find_option() returns an error.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: aca20d5462 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru
2023-03-16 12:22:25 +01:00
Yu Zhe
7ad2c39860 xen: remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-03-16 12:04:00 +01:00
Greg Kroah-Hartman
f551592f72 Merge tag 'icc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-linus
Georgi writes:

interconnect fixes for v6.3-rc

This contains a bunch of fixes with the highlight being fixes for a race
condition that could sometimes occur during the interconnect provider
driver registration. There are also fixes for memory overallocation and
a memory leak.

- interconnect: qcom: osm-l3: fix icc_onecell_data allocation
- interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function
- interconnect: qcom: sm8550: switch to qcom_icc_rpmh_* function
- interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
- interconnect: fix mem leak when freeing nodes
- interconnect: fix icc_provider_del() error handling
- interconnect: fix provider registration API
- interconnect: imx: fix registration race
- interconnect: qcom: osm-l3: fix registration race
- interconnect: qcom: rpm: fix probe child-node error handling
- interconnect: qcom: rpm: fix registration race
- interconnect: qcom: rpmh: fix probe child-node error handling
- interconnect: qcom: rpmh: fix registration race
- interconnect: qcom: msm8974: fix registration race
- interconnect: exynos: fix node leak in probe PM QoS error path
- interconnect: exynos: fix registration race
- interconnect: exynos: drop redundant link destroy
- memory: tegra: fix interconnect registration race
- memory: tegra124-emc: fix interconnect registration race
- memory: tegra20-emc: fix interconnect registration race
- memory: tegra30-emc: fix interconnect registration race

Signed-off-by: Georgi Djakov <djakov@kernel.org>

* tag 'icc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc: (21 commits)
  memory: tegra30-emc: fix interconnect registration race
  memory: tegra20-emc: fix interconnect registration race
  memory: tegra124-emc: fix interconnect registration race
  memory: tegra: fix interconnect registration race
  interconnect: exynos: drop redundant link destroy
  interconnect: exynos: fix registration race
  interconnect: exynos: fix node leak in probe PM QoS error path
  interconnect: qcom: msm8974: fix registration race
  interconnect: qcom: rpmh: fix registration race
  interconnect: qcom: rpmh: fix probe child-node error handling
  interconnect: qcom: rpm: fix registration race
  interconnect: qcom: rpm: fix probe child-node error handling
  interconnect: qcom: osm-l3: fix registration race
  interconnect: imx: fix registration race
  interconnect: fix provider registration API
  interconnect: fix icc_provider_del() error handling
  interconnect: fix mem leak when freeing nodes
  interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT
  interconnect: qcom: sm8550: switch to qcom_icc_rpmh_* function
  interconnect: qcom: sm8450: switch to qcom_icc_rpmh_* function
  ...
2023-03-16 10:55:42 +01:00
Ondrej Zary
5bc9e2d43f ata: pata_parport: fix memory leaks
When ida_alloc() fails, "pi" is not freed although the misleading
comment says otherwise.
Move the ida_alloc() call up so we really don't have to free "pi" in
case of ida_alloc() failure.

Also move ida_free() call from pi_remove_one() to
pata_parport_dev_release(). It was dereferencing already freed dev
pointer.

Testing revealed leak even in non-failure case which was tracked down
to missing put_device() call after bus_find_device_by_name(). As a
result, pata_parport_dev_release() was never called.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202303111822.IHNchbkp-lkp@intel.com/
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-03-16 16:54:38 +09:00
Eric Biggers
a075bacde2 fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity.  It was
intended to solve a race condition where unverified pages might be left
in the pagecache.  But actually it doesn't solve it fully.

Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.

Fixes: 3fda4c617e ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-15 22:50:41 -07:00
Vladimir Oltean
cd356010ce net: phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
acquire phydev->lock, but the mscc phy driver implementations,
vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
resulting in a deadlock.

$ ip link set swp3 down
============================================
WARNING: possible recursive locking detected
mscc_felix 0000:00:00.5 swp3: Link is Down
--------------------------------------------
ip/375 is trying to acquire lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4

but task is already holding lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev->lock);
  lock(&dev->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by ip/375:
 #0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
 #1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c

Call trace:
 __mutex_lock+0x98/0x454
 mutex_lock_nested+0x2c/0x38
 vsc85xx_wol_get+0x2c/0xf4
 phy_ethtool_get_wol+0x50/0x6c
 phy_suspend+0x84/0xcc
 phy_state_machine+0x1b8/0x27c
 phy_stop+0x70/0x154
 phylink_stop+0x34/0xc0
 dsa_port_disable_rt+0x2c/0xa4
 dsa_slave_close+0x38/0xec
 __dev_close_many+0xc8/0x16c
 __dev_change_flags+0xdc/0x218
 dev_change_flags+0x24/0x6c
 do_setlink+0x234/0xea4
 __rtnl_newlink+0x46c/0x878
 rtnl_newlink+0x50/0x7c
 rtnetlink_rcv_msg+0x16c/0x58c

Removing the mutex_lock(&phydev->lock) calls from the driver restores
the functionality.

Fixes: 2f987d4866 ("net: phy: Add locks to ethtool functions")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 21:33:22 -07:00
Shawn Bohrer
7c10131803 veth: Fix use after free in XDP_REDIRECT
Commit 718a18a0c8 ("veth: Rework veth_xdp_rcv_skb in order
to accept non-linear skb") introduced a bug where it tried to
use pskb_expand_head() if the headroom was less than
XDP_PACKET_HEADROOM.  This however uses kmalloc to expand the head,
which will later allow consume_skb() to free the skb while is it still
in use by AF_XDP.

Previously if the headroom was less than XDP_PACKET_HEADROOM we
continued on to allocate a new skb from pages so this restores that
behavior.

BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640

CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G           O       6.1.4-cloudflare-kasan-2023.1.2 #1
Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  print_report+0x170/0x473
  ? __xsk_rcv+0x18d/0x2c0
  kasan_report+0xad/0x130
  ? __xsk_rcv+0x18d/0x2c0
  kasan_check_range+0x149/0x1a0
  memcpy+0x20/0x60
  __xsk_rcv+0x18d/0x2c0
  __xsk_map_redirect+0x1f3/0x490
  ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
  xdp_do_redirect+0x5ca/0xd60
  veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
  ? __netif_receive_skb_list_core+0x671/0x920
  ? veth_xdp+0x670/0x670 [veth]
  veth_xdp_rcv+0x304/0xa20 [veth]
  ? do_xdp_generic+0x150/0x150
  ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
  ? _raw_spin_lock_bh+0xe0/0xe0
  ? newidle_balance+0x887/0xe30
  ? __perf_event_task_sched_in+0xdb/0x800
  veth_poll+0x139/0x571 [veth]
  ? veth_xdp_rcv+0xa20/0xa20 [veth]
  ? _raw_spin_unlock+0x39/0x70
  ? finish_task_switch.isra.0+0x17e/0x7d0
  ? __switch_to+0x5cf/0x1070
  ? __schedule+0x95b/0x2640
  ? io_schedule_timeout+0x160/0x160
  __napi_poll+0xa1/0x440
  napi_threaded_poll+0x3d1/0x460
  ? __napi_poll+0x440/0x440
  ? __kthread_parkme+0xc6/0x1f0
  ? __napi_poll+0x440/0x440
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

Freed by task 148640:
  kasan_save_stack+0x23/0x50
  kasan_set_track+0x21/0x30
  kasan_save_free_info+0x2a/0x40
  ____kasan_slab_free+0x169/0x1d0
  slab_free_freelist_hook+0xd2/0x190
  __kmem_cache_free+0x1a1/0x2f0
  skb_release_data+0x449/0x600
  consume_skb+0x9f/0x1c0
  veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
  veth_xdp_rcv+0x304/0xa20 [veth]
  veth_poll+0x139/0x571 [veth]
  __napi_poll+0xa1/0x440
  napi_threaded_poll+0x3d1/0x460
  kthread+0x2a2/0x340
  ret_from_fork+0x22/0x30

The buggy address belongs to the object at ffff888976250000
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 340 bytes inside of
  2048-byte region [ffff888976250000, ffff888976250800)

The buggy address belongs to the physical page:
page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                  ^
  ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 718a18a0c8 ("veth: Rework veth_xdp_rcv_skb in order to accept non-linear skb")
Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20230314153351.2201328-1-sbohrer@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-15 21:11:06 -07:00
Lars-Peter Clausen
ab00709310 hwmon: (ltc2992) Set can_sleep flag for GPIO chip
The ltc2992 drivers uses a mutex and I2C bus access in its GPIO chip `set`
and `get` implementation. This means these functions can sleep and the GPIO
chip should set the `can_sleep` property to true.

This will ensure that a warning is printed when trying to set or get the
GPIO value from a context that potentially can't sleep.

Fixes: 9ca26df1ba ("hwmon: (ltc2992) Add support for GPIOs.")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230314093146.2443845-2-lars@metafoo.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-03-15 19:15:00 -07:00
Lars-Peter Clausen
a5bb73b3f5 hwmon: (adm1266) Set can_sleep flag for GPIO chip
The adm1266 driver uses I2C bus access in its GPIO chip `set` and `get`
implementation. This means these functions can sleep and the GPIO chip
should set the `can_sleep` property to true.

This will ensure that a warning is printed when trying to set or get the
GPIO value from a context that potentially can't sleep.

Fixes: d98dfad35c ("hwmon: (pmbus/adm1266) Add support for GPIOs")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230314093146.2443845-1-lars@metafoo.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-03-15 19:14:04 -07:00
Li zeming
6acd352dfe io_uring: rsrc: Optimize return value variable 'ret'
The initialization assignment of the variable ret is changed to 0, only
in 'goto fail;' Use the ret variable as the function return value.

Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20230317182538.3027-1-zeming@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15 19:59:11 -06:00
Oz Shlomo
c7b7c64ab5 net/mlx5e: TC, Remove error message log print
The cited commit attempts to update the hw stats when dumping tc actions.
However, the driver may be called to update the stats of a police action
that may not be in hardware. In such cases the driver will fail to lookup
the police action object and will output an error message both to extack
and dmesg. The dmesg error is confusing as it may not indicate an actual
error.

Remove the dmesg error.

Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:18 -07:00
Oz Shlomo
b23bf10cca net/mlx5e: TC, fix cloned flow attribute
Currently the cloned flow attr resets the original tc action cookies
count.
Fix that by resetting the cloned flow attribute.

Fixes: cca7eac138 ("net/mlx5e: TC, store tc action cookies per attr")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:18 -07:00
Oz Shlomo
1166add424 net/mlx5e: TC, fix missing error code
Missing error code when mlx5e_tc_act_stats_create fails

Fixes: d13674b1d1 ("net/mlx5e: TC, map tc action cookie to a hw counter")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:17 -07:00
Oz Shlomo
d1a0075ad6 net/sched: TC, fix raw counter initialization
Freed counters may be reused by fs core.
As such, raw counters may not be initialized to zero.

Cache the counter values when the action stats object is initialized to
have a proper base value for calculating the difference from the previous
query.

Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:17 -07:00
Adham Faris
78dee7befd net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisites
XSK redirecting XDP programs require linearity, hence applies
restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498.

Features that contradict with XDP such HW-LRO and HW-GRO are enforced
by the driver in advance, during XSK params validation, except for MTU,
which was not enforced before this patch.

This has been spotted during test scenario described below:
Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching
XDP program, changing the MTU to arbitrary value in the range
[3499, 3754], attaching XDP program again, which ended up with failure
since MTU is > 3498.

This commit lowers the XSK MTU limitation to be aligned with XDP MTU
limitation, since XSK socket is meaningless without XDP program.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:17 -07:00
Shay Drory
031a163f2c net/mlx5: Set BREAK_FW_WAIT flag first when removing driver
Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset.
However, fw_reset can call mlx5_load_one() which is waiting for fw
init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver
might wait on a loop it should exit.
Fix it by setting the flag before syncing with fw_reset.

Fixes: 8324a02c34 ("net/mlx5: Add exit route when waiting for FW")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:17 -07:00
Gal Pressman
dd64572490 net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher type
Do proper error unwinding when adding an unsupported TX/RX cipher type.
Move the switch case prior to key creation so there's less to unwind,
and change the goto label name to describe the action performed instead
of what failed.

Fixes: 4960c414db ("net/mlx5e: Support 256 bit keys with kTLS device offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:17 -07:00
Paul Blakey
c9668f0b1d net/mlx5e: Fix cleanup null-ptr deref on encap lock
During module is unloaded while a peer tc flow is still offloaded,
first the peer uplink rep profile is changed to a nic profile, and so
neigh encap lock is destroyed. Next during unload, the VF reps netdevs
are unregistered which causes the original non-peer tc flow to be deleted,
which deletes the peer flow. The peer flow deletion detaches the encap
entry and try to take the already destroyed encap lock, causing the
below trace.

Fix this by clearing peer flows during tc eswitch cleanup
(mlx5e_tc_esw_cleanup()).

Relevant trace:
[ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40
[ 4316.851897] Call Trace:
[ 4316.852481]  <TASK>
[ 4316.857214]  mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core]
[ 4316.858258]  mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core]
[ 4316.859134]  mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core]
[ 4316.859867]  clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core]
[ 4316.860605]  mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core]
[ 4316.862609]  __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core]
[ 4316.863394]  mlx5e_tc_del_flow+0x(/0x630 [mlx5_core]
[ 4316.864090]  mlx5e_flow_put+0x5f/0x100 [mlx5_core]
[ 4316.864771]  mlx5e_delete_flower+0x4de/0xa40 [mlx5_core]
[ 4316.865486]  tc_setup_cb_reoffload+0x20/0x80
[ 4316.865905]  fl_reoffload+0x47c/0x510 [cls_flower]
[ 4316.869181]  tcf_block_playback_offloads+0x91/0x1d0
[ 4316.869649]  tcf_block_unbind+0xe7/0x1b0
[ 4316.870049]  tcf_block_offload_cmd.isra.0+0x1ee/0x270
[ 4316.879266]  tcf_block_offload_unbind+0x61/0xa0
[ 4316.879711]  __tcf_block_put+0xa4/0x310

Fixes: 04de7dda73 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96a ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:16 -07:00
Maor Dickman
28d3815a62 net/mlx5: E-switch, Fix missing set of split_count when forward to ovs internal port
Rules with mirror actions are split to two FTEs when the actions after the mirror
action contains pedit, vlan push/pop or ct. Forward to ovs internal port adds
implicit header rewrite (pedit) but missing trigger to do split.

Fix by setting split_count when forwarding to ovs internal port which
will trigger split in mirror rules.

Fixes: 27484f7170 ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:16 -07:00
Maor Dickman
1313d78ac0 net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rules
In few cases, rules with mirror use case are split to two FTEs, one which
do the mirror action and forward to second FTE which do the rest of the rule
actions and the second redirect action.
In case of mirror rules which do split and forward to ovs internal port or
VF stack devices, source port rewrite should be used in the second FTE but
it is wrongly also set in the first FTE which break the offload.

Fix this issue by removing the wrong check if source port rewrite is needed to
be used on the first FTE of the split and instead return EOPNOTSUPP which will
block offload of rules which mirror to ovs internal port or VF stack devices
which isn't supported.

Fixes: 10742efc20 ("net/mlx5e: VF tunnel TX traffic offloading")
Fixes: a508728a4c ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:16 -07:00
Daniel Jurgens
7ba930fc25 net/mlx5: Disable eswitch before waiting for VF pages
The offending commit changed the ordering of moving to legacy mode and
waiting for the VF pages. Moving to legacy mode is important in
bluefield, because it sends the host driver into error state, and frees
its pages. Without this transition we end up waiting 2 minutes for
pages that aren't coming before carrying on with the unload process.

Fixes: f019679ea5 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:16 -07:00
Parav Pandit
ba5d8f72b8 net/mlx5: Fix setting ec_function bit in MANAGE_PAGES
When ECPF is a page supplier, reclaim pages missed to honor the
ec_function bit provided by the firmware. It always used the ec_function
to true during driver unload flow for ECPF. This is incorrect.

Honor the ec_function bit provided by device during page allocation
request event.

Fixes: d6945242f4 ("net/mlx5: Hold pages RB tree per VF")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:16 -07:00
Parav Pandit
9a92fe1db9 net/mlx5e: Don't cache tunnel offloads capability
When mlx5e attaches again after device health recovery, the device
capabilities might have changed by the eswitch manager.

For example in one flow when ECPF changes the eswitch mode between
legacy and switchdev, it updates the flow table tunnel capability.

The cached value is only used in one place, so just check the capability
there instead.

Fixes: 5bef709d76 ("net/mlx5: Enable host PF HCA after eswitch is initialized")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:15 -07:00
Emeel Hakim
37beabe9a8 net/mlx5e: Fix macsec ASO context alignment
Currently mlx5e_macsec_umr struct does not satisfy hardware memory
alignment requirement. Hence the result of querying advanced steering
operation (ASO) is not copied to the memory region as expected.

Fix by satisfying hardware memory alignment requirement and move
context to be first field in struct for better readability.

Fixes: 1f53da6764 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-15 15:50:15 -07:00
Felix Kuehling
f3921a9a64 drm/amdgpu: Don't resume IOMMU after incomplete init
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other
kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
KFD IOMMU initialization failed.

Reported-by: Matt Fagnani <matt.fagnani@bell.net>
Link: https://lore.kernel.org/r/4a3b225c-2ffd-e758-4de1-447375e34cad@bell.net/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217170
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2454
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info>
Cc: stable@vger.kernel.org
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Matt Fagnani <matt.fagnani@bell.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:21:51 -04:00
David Belanger
20bc9f76b6 drm/amdkfd: Fixed kfd_process cleanup on module exit.
Handle case when module is unloaded (kfd_exit) before a process space
(mm_struct) is released.

v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.

v3: Fixed loop element access / synchronization.  Fixed extra empty lines.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:21:05 -04:00
Shawn Wang
0424a7dfe9 x86/resctrl: Clear staged_config[] before and after it is used
As a temporary storage, staged_config[] in rdt_domain should be cleared
before and after it is used. The stale value in staged_config[] could
cause an MSR access error.

Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3
Cache (MBA should be disabled if the number of CLOSIDs for MB is less than
16.) :
	mount -t resctrl resctrl -o cdp /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..7}
	umount /sys/fs/resctrl/
	mount -t resctrl resctrl /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..8}

An error occurs when creating resource group named p8:
    unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    Call Trace:
     <IRQ>
     __flush_smp_call_function_queue+0x11d/0x170
     __sysvec_call_function+0x24/0xd0
     sysvec_call_function+0x89/0xc0
     </IRQ>
     <TASK>
     asm_sysvec_call_function+0x16/0x20

When creating a new resource control group, hardware will be configured
by the following process:
    rdtgroup_mkdir()
      rdtgroup_mkdir_ctrl_mon()
        rdtgroup_init_alloc()
          resctrl_arch_update_domains()

resctrl_arch_update_domains() iterates and updates all resctrl_conf_type
whose have_new_ctrl is true. Since staged_config[] holds the same values as
when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA
configurations. When group p8 is created, get_config_index() called in
resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for
CDP_CODE and CDP_DATA, which will be translated to an invalid register -
0xca0 in this scenario.

Fix it by clearing staged_config[] before and after it is used.

[reinette: re-order commit tags]

Fixes: 75408e4350 ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-03-15 15:19:43 -07:00
Ayush Gupta
7304ee979b drm/amd/display: disconnect MPCC only on OTG change
[Why]
Framedrops are observed while playing Vp9 and Av1 10 bit
video on 8k resolution using VSR while playback controls
are disappeared/appeared

[How]
Now ODM 2 to 1 is disabled for 5k or greater resolutions on VSR.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Ayush Gupta <ayugupta@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:19:35 -04:00
Cruise Hung
cbd6c1b17d drm/amd/display: Fix DP MST sinks removal issue
[Why]
In USB4 DP tunneling, it's possible to have this scenario that
the path becomes unavailable and CM tears down the path a little bit late.
So, in this case, the HPD is high but fails to read any DPCD register.
That causes the link connection type to be set to sst.
And not all sinks are removed behind the MST branch.

[How]
Restore the link connection type if it fails to read DPCD register.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:18:45 -04:00
Wesley Chalmers
56574f89db drm/amd/display: Do not set DRR on pipe Commit
[WHY]
Writing to DRR registers such as OTG_V_TOTAL_MIN on the same frame as a
pipe commit can cause underflow.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:18:00 -04:00
Saaem Rizvi
709671ffb1 drm/amd/display: Remove OTG DIV register write for Virtual signals.
[WHY]
Hot plugging and then hot unplugging leads to k1 and k2 values to
change, as signal is detected as a virtual signal on hot unplug. Writing
these values to OTG_PIXEL_RATE_DIV register might cause primary display
to blank (known hw bug).

[HOW]
No longer write k1 and k2 values to register if signal is virtual, we
have safe guards in place in the case that k1 and k2 is unassigned so
that an unknown value is not written to the register either.

Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15 18:17:22 -04:00
Budimir Markovic
fd0815f632 perf: Fix check before add_event_to_groups() in perf_group_detach()
Events should only be added to a groups rb tree if they have not been
removed from their context by list_del_event(). Since remove_on_exec
made it possible to call list_del_event() on individual events before
they are detached from their group, perf_group_detach() should check each
sibling's attach_state before calling add_event_to_groups() on it.

Fixes: 2e498d0a74 ("perf: Add support for event removal on exec")
Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/ZBFzvQV9tEqoHEtH@gentoo
2023-03-15 21:49:47 +01:00
Song Liu
baf1b12a67 perf: fix perf_event_context->time
Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
accurate time_enabled and time_running for an event. The difference between
ctx->timestamp and ctx->time is the among of time when the context is not
enabled. __update_context_time(ctx, false) is used to increase timestamp,
but not time. Therefore, it should only be called in ctx_sched_in() when
EVENT_TIME was not enabled.

Fixes: 09f5e7dc7a ("perf: Fix perf_event_read_local() time")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20230313171608.298734-1-song@kernel.org
2023-03-15 21:49:46 +01:00
Yang Jihong
eb81a2ed4f perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
syzkaller reportes a KASAN issue with stack-out-of-bounds.
The call trace is as follows:
  dump_stack+0x9c/0xd3
  print_address_description.constprop.0+0x19/0x170
  __kasan_report.cold+0x6c/0x84
  kasan_report+0x3a/0x50
  __perf_event_header__init_id+0x34/0x290
  perf_event_header__init_id+0x48/0x60
  perf_output_begin+0x4a4/0x560
  perf_event_bpf_output+0x161/0x1e0
  perf_iterate_sb_cpu+0x29e/0x340
  perf_iterate_sb+0x4c/0xc0
  perf_event_bpf_event+0x194/0x2c0
  __bpf_prog_put.constprop.0+0x55/0xf0
  __cls_bpf_delete_prog+0xea/0x120 [cls_bpf]
  cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf]
  process_one_work+0x3c2/0x730
  worker_thread+0x93/0x650
  kthread+0x1b8/0x210
  ret_from_fork+0x1f/0x30

commit 267fb27352 ("perf: Reduce stack usage of perf_output_begin()")
use on-stack struct perf_sample_data of the caller function.

However, perf_event_bpf_output uses incorrect parameter to convert
small-sized data (struct perf_bpf_event) into large-sized data
(struct perf_sample_data), which causes memory overwriting occurs in
__perf_event_header__init_id.

Fixes: 267fb27352 ("perf: Reduce stack usage of perf_output_begin()")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.com
2023-03-15 21:49:46 +01:00
Naohiro Aota
e15acc2588 btrfs: zoned: drop space_info->active_total_bytes
The space_info->active_total_bytes is no longer necessary as we now
count the region of newly allocated block group as zone_unusable. Drop
its usage.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Naohiro Aota
fa2068d7e9 btrfs: zoned: count fresh BG region as zone unusable
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which are previously
active but now inactive. That confusion results in a bug not counting
the full BGs into active_total_bytes on mount time.

For a background, there are three kinds of block groups in terms of
activation.

  1. Block groups never activated
  2. Block groups currently active
  3. Block groups previously active and currently inactive (due to fully
     written or zone finish)

What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.

And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.

OTOH, BGs #3 can be counted in the "total", since they are already full
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.

Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And, tracking #1 and subtract that properly from "total_bytes"
every time you need space reservation is cumbersome.

Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 ..  zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
among regular and the zoned mode. Also, no additional counter is needed
with this approach.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Josef Bacik
df384da5a4 btrfs: use temporary variable for space_info in btrfs_update_block_group
We do

  cache->space_info->counter += num_bytes;

everywhere in here.  This is makes the lines longer than they need to
be, and will be especially noticeable when we add the active tracking in,
so add a temp variable for the space_info so this is cleaner.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Josef Bacik
bf1f1fec27 btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Naohiro Aota
9e1cdf0c35 btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs_can_activate_zone() returns true if at least one device has one zone
available for activation. This is OK for the single profile, but not OK for
DUP profile. We need two zones to create a DUP block group. Fix it by
properly handling the case with the profile flags.

Fixes: 265f7237dd ("btrfs: zoned: allow DUP on meta-data block groups")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Sweet Tea Dorminy
10a8857a1b btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
Commit 1ec49744ba ("btrfs: turn on -Wmaybe-uninitialized") exposed
that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename()
only returns negative error values or 0. This ultimately results in a
maybe-uninitialized warning in btrfs_lookup_dentry().

Change to only return negative error values or 0 from
fscrypt_setup_filename() at the relevant call site, and assert that no
positive error codes are returned (which would have wider implications
involving other users).

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Qu Wenruo
1c3ab6dfa0 btrfs: handle missing chunk mapping more gracefully
[BUG]
During my scrub rework, I did a stupid thing like this:

        bio->bi_iter.bi_sector = stripe->logical;
        btrfs_submit_bio(fs_info, bio, stripe->mirror_num);

Above bi_sector assignment is using logical address directly, which
lacks ">> SECTOR_SHIFT".

This results a read on a range which has no chunk mapping.

This results the following crash:

  BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
  assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387

Sure this is all my fault, but this shows a possible problem in real
world, that some bit flip in file extents/tree block can point to
unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
is not configured, cause invalid pointer access.

[PROBLEMS]
In the above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().

[FIX]
The fix is straightforward, replace the ASSERT() with proper error
handling (callers handle errors already).

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:05 +01:00
Linus Torvalds
9c1bec9c0b Merge tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
 "A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
  kselftest common lib.mk"

* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: fix LLVM build for i386 and x86_64
  selftests: amd-pstate: fix TEST_FILES
2023-03-15 12:20:37 -07:00
Jens Axboe
23e5b9307e Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.3
Pull MD fixes from Song:

"This set contains two fixes for old issues (by Neil) and one fix
 for 6.3 (by Xiao)."

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: select BLOCK_LEGACY_AUTOLOAD
  md: avoid signed overflow in slot_store()
  md: Free resources in __md_stop
2023-03-15 12:18:07 -06:00