'skb->len' can vary when we partially read the data, and this complicates
the calculation of the credit to be updated in 'virtio_transport_inc_rx_pkt()'/
'virtio_transport_dec_rx_pkt()'.
Also, in 'virtio_transport_dec_rx_pkt()' we were miscalculating the
credit, since 'skb->len' was counted redundantly.
For these reasons, let's replace the use of skbuff state in the calculation
of the new 'rx_bytes'/'fwd_cnt' values with an explicit value passed as an
input argument. This makes the code simpler, because the skbuff state no
longer needs to be changed before each call that updates 'rx_bytes'/'fwd_cnt'.
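A rough sketch of the resulting shape (hypothetical, simplified helpers;
the real ones live in net/vmw_vsock/virtio_transport_common.c and also
manage 'buf_alloc' under the rx lock):

  #include <linux/types.h>

  struct vvs_sketch {
      u32 rx_bytes;
      u32 fwd_cnt;
  };

  /* Take the byte count explicitly instead of reading skb->len,
   * which a partial read may already have changed. */
  static void inc_rx_pkt_sketch(struct vvs_sketch *vvs, u32 len)
  {
      vvs->rx_bytes += len;
  }

  static void dec_rx_pkt_sketch(struct vvs_sketch *vvs, u32 len)
  {
      vvs->rx_bytes -= len;
      vvs->fwd_cnt += len;
  }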
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check introduced in commit a5fd39464a ("igc: Lift TAPRIO schedule
restriction") can report a false positive in some corner cases.
For instance,
tc qdisc replace ... taprio num_tc 4
...
sched-entry S 0x01 100000 # slot#1
sched-entry S 0x03 100000 # slot#2
sched-entry S 0x04 100000 # slot#3
sched-entry S 0x08 200000 # slot#4
flags 0x02 # hardware offload
Here queue#0 (the first queue) is on in slot#1 and #2,
and off in slot#3 and #4. Under the current logic, when slot#4
is examined, validate_schedule() returns *false* since the enablement
count for queue#0 is two and it was already off in the previous slot
(i.e. #3). But this schedule definition is truly valid.
Fix the logic so that the strict validation is enforced only on slots
that actually open the queue.
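A self-contained sketch of the corrected check (hypothetical helper,
assuming at most 8 queues; the real code is validate_schedule() in
igc_main.c):

  #include <stdbool.h>
  #include <stdint.h>

  static bool validate_gates_sketch(const uint32_t *gate_mask,
                                    int num_slots, int num_queues)
  {
      int uses[8] = { 0 };

      for (int n = 0; n < num_slots; n++) {
          for (int i = 0; i < num_queues; i++) {
              if (!(gate_mask[n] & (1u << i)))
                  continue; /* closed slot: nothing to validate */
              /* check "was it open in the previous slot?" only when
               * the current slot opens the queue; uses[i] > 1
               * guarantees n > 0, so gate_mask[n - 1] is valid */
              if (++uses[i] > 1 && !(gate_mask[n - 1] & (1u << i)))
                  return false; /* reopened after being closed */
          }
      }
      return true;
  }

With the example schedule {0x01, 0x03, 0x04, 0x08} this now returns true,
while a genuinely invalid {0x01, 0x02, 0x01} still returns false.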
Fixes: a5fd39464a ("igc: Lift TAPRIO schedule restriction")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A VF reset NACK actually indicates that the reset operation itself was
performed but no address was assigned. Therefore, e1000_reset_hw_vf
should fill "perm_addr" with the zero address and return success on
such an occasion. This prevents its callers in netdev.c from claiming
the PF is still resetting, and instead allows them to correctly report
that no address is assigned.
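A hedged sketch of the idea (hypothetical helper; the constants follow
the igbvf driver, but the actual reply handling in vf.c differs in
detail):

  #include <linux/etherdevice.h>
  #include <linux/string.h>

  /* Interpret the PF's reply to an E1000_VF_RESET message. */
  static s32 handle_reset_reply_sketch(struct e1000_hw *hw, u32 msg,
                                       const u8 *addr)
  {
      if (msg == (E1000_VF_RESET | E1000_VT_MSGTYPE_ACK)) {
          memcpy(hw->mac.perm_addr, addr, ETH_ALEN);
          return 0;
      }
      if (msg == (E1000_VF_RESET | E1000_VT_MSGTYPE_NACK)) {
          eth_zero_addr(hw->mac.perm_addr); /* reset done, no address */
          return 0;                         /* success, not an error */
      }
      return -E1000_ERR_MAC_INIT;
  }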
Fixes: 6ddbc4cf1f ("igb: Indicate failure on vf reset for empty mac address")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enabling SR-IOV causes the virtual functions to make requests to the
PF via the mailbox. Notably, an E1000_VF_RESET request happens during
VF initialization. However, until the reinit is done, the VMMB
interrupt, which delivers mailbox interrupts from the VF to the PF, is
kept masked, and such requests are silently ignored.
Enable SR-IOV at the very end of the procedure that configures the
device for SR-IOV, so that the PF is properly configured by the time a
VF is activated.
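Roughly (a hypothetical sketch of the reordering; only
pci_enable_sriov() is a real kernel API here):

  static int igb_sriov_configure_sketch(struct pci_dev *pdev, int num_vfs)
  {
      int err;

      /* allocate VF data and configure the PF first... */
      err = igb_prepare_sriov(pdev, num_vfs);   /* hypothetical */
      if (err)
          return err;

      /* ...including the reinit that unmasks the VMMB interrupt... */
      err = igb_reinit_device(pdev);            /* hypothetical */
      if (err)
          return err;

      /* ...and only then let VFs come up and use the mailbox */
      return pci_enable_sriov(pdev, num_vfs);
  }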
Fixes: fa44f2f185 ("igb: Enable SR-IOV configuration via PCI sysfs interface")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When an interface with the maximum number of VLAN filters is brought up,
a spurious error is logged:
[257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3
[257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported.
The VF driver complains that it cannot add the VLAN 0 filter.
On the other hand, the PF driver always adds VLAN 0 filter on VF
initialization. The VF does not need to ask the PF for that filter at
all.
Fix the error by not tracking VLAN 0 filters at all. With that, the
check added by commit 0e710a3ffd ("iavf: Fix VF driver counting VLAN 0
filters") in iavf_virtchnl.c becomes useless and might be confusing if
left in place, as it suggests that we track VLAN 0.
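In sketch form (hypothetical helper names; the real change touches
iavf_main.c and iavf_virtchnl.c):

  static int iavf_vlan_rx_add_vid_sketch(struct net_device *netdev,
                                         __be16 proto, u16 vid)
  {
      /* VLAN 0 is installed by the PF at VF init; don't track it
       * and don't send a redundant request that trips the limit */
      if (!vid)
          return 0;

      return iavf_add_vlan_filter(netdev, proto, vid); /* hypothetical */
  }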
Fixes: 0e710a3ffd ("iavf: Fix VF driver counting VLAN 0 filters")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type
of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP.
Originally, i40e, ice and iavf were affected.
Commit 73df8c9e3e ("i40e: Correct UDP packet header for non_tunnel-ipv6")
fixed that in i40e, then
commit 638a0c8c88 ("ice: fix incorrect payload indicator on PTYPE")
fixed that for ice.
IPv6 UDP is obviously L4. Fix it and make iavf report the correct L4
hash type for such packets, so that the stack won't have to recalculate
it on the CPU when it needs it.
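The effect, in sketch form (simplified; the real decoding uses the
driver's decode_rx_desc_ptype() lookup table):

  /* payload layer decoded from the RX descriptor's PTYPE */
  enum pay_sketch { PAY2, PAY3, PAY4 };
  enum hash_sketch { HASH_L2, HASH_L3, HASH_L4 };

  static enum hash_sketch hash_type_sketch(enum pay_sketch pl)
  {
      switch (pl) {
      case PAY4: return HASH_L4; /* 4-tuple hash, reusable by the stack */
      case PAY3: return HASH_L3; /* what IPv6/UDP wrongly decoded to */
      default:   return HASH_L2;
      }
  }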
Fixes: 206812b5fc ("i40e/i40evf: i40e implementation for skb_set_hash")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The condition which checks whether the netdev has hashing enabled is
inverted. Basically, the tagged commit effectively disabled passing the
flow hash from the descriptor to the skb unless the user *disables*
hashing via Ethtool.
Commit a876c3ba59 ("i40e/i40evf: properly report Rx packet hash")
fixed this problem, but only for i40e.
Invert the condition in iavf now as well and unblock passing the hash
to skbs again.
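The corrected check, sketched (simplified from the driver's RX hash
helper):

  #include <linux/netdevice.h>
  #include <linux/skbuff.h>

  static void iavf_rx_hash_sketch(struct net_device *netdev,
                                  struct sk_buff *skb, u32 hash,
                                  enum pkt_hash_types type)
  {
      if (!(netdev->features & NETIF_F_RXHASH)) /* '!' was missing */
          return;
      skb_set_hash(skb, hash, type);
  }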
Fixes: 857942fd1a ("i40e: Fix Rx hash reported to the stack by our driver")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before commit bdc1ddad3e ("compat_ioctl: block: move
blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to
include compat_ioctl.c in the kernel build. With this commit, the code
was moved into ioctl.c and is included with the config COMPAT. So, since
then, the config BLOCK_COMPAT has had no effect and serves no further
purpose.
Remove this obsolete config BLOCK_COMPAT.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230316111630.4897-1-lukas.bulwahn@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
| BUG: Bad page state in process kworker/u8:0 pfn:5c001
| page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001
| head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1
| anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff)
| raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000
| raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000
| head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1
| head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000
| page dumped because: nonzero pincount
| CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1
| Hardware name: linux,dummy-virt (DT)
| Workqueue: events_unbound io_ring_exit_work
| Call trace:
| dump_backtrace+0x13c/0x208
| show_stack+0x34/0x58
| dump_stack_lvl+0x150/0x1a8
| dump_stack+0x20/0x30
| bad_page+0xec/0x238
| free_tail_pages_check+0x280/0x350
| free_pcp_prepare+0x60c/0x830
| free_unref_page+0x50/0x498
| free_compound_page+0xcc/0x100
| free_transhuge_page+0x1f0/0x2b8
| destroy_large_folio+0x80/0xc8
| __folio_put+0xc4/0xf8
| gup_put_folio+0xd0/0x250
| unpin_user_page+0xcc/0x128
| io_buffer_unmap+0xec/0x2c0
| __io_sqe_buffers_unregister+0xa4/0x1e0
| io_ring_exit_work+0x68c/0x1188
| process_one_work+0x91c/0x1a58
| worker_thread+0x48c/0xe30
| kthread+0x278/0x2f0
| ret_from_fork+0x10/0x20
Mark reports an issue with the recent patches coalescing compound pages
while registering them in io_uring. The reason is that we try to drop
excess references with folio_put_refs(), but the pages were acquired
with pin_user_pages(), which has extra accounting, so they should be
released with the matching unpin_user_pages() or at least
gup_put_folio().
As a fix, unpin_user_pages() all but the first page instead, and let's
figure out a better API afterwards.
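A sketch of the fix, close to the change in io_uring/rsrc.c (abridged;
'folio' is non-NULL when the pages were coalesced):

  #include <linux/mm.h>

  static unsigned int coalesce_pinned_sketch(struct page **pages,
                                             unsigned int nr_pages,
                                             struct folio *folio)
  {
      if (folio && nr_pages > 1) {
          /* matching "put" for pin_user_pages(): keeps the pin
           * accounting balanced, unlike folio_put_refs() */
          unpin_user_pages(&pages[1], nr_pages - 1);
          nr_pages = 1;
      }
      return nr_pages;
  }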
Fixes: 57bebf807e ("io_uring/rsrc: optimise registered huge pages")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
msg_ring requests transferring files support auto index selection via
IORING_FILE_INDEX_ALLOC; however, they don't return the selected index
to the target ring, and there is no other good way for userspace to
know where the received file landed.
Return the index for allocated slots and 0 otherwise, which is
consistent with other fixed-file installing requests.
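In other words (a trivial sketch of the new completion value; the real
logic is in io_uring/msg_ring.c):

  #include <linux/types.h>

  /* res of the CQE posted to the target ring */
  static inline int msg_ring_cqe_res_sketch(bool alloc_slot, int slot)
  {
      return alloc_slot ? slot : 0; /* IORING_FILE_INDEX_ALLOC: slot */
  }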
Cc: stable@vger.kernel.org # v6.0+
Fixes: e6130eba8a ("io_uring: add support for passing fixed file descriptors")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/809
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
- repair the MAINTAINERS entry (Lukas Bulwahn)
- fix handling single range discard request (Ming Lei)
- show more opcode names in trace events (Minwoo Im)
- fix nvme-tcp timeout reporting (Sagi Grimberg)"
* tag 'nvme-6.3-2023-03-16' of git://git.infradead.org/nvme:
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
acquire phydev->lock, but the mscc phy driver implementations,
vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
resulting in a deadlock.
$ ip link set swp3 down
============================================
WARNING: possible recursive locking detected
mscc_felix 0000:00:00.5 swp3: Link is Down
--------------------------------------------
ip/375 is trying to acquire lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4
but task is already holding lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/375:
#0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
#1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
Call trace:
__mutex_lock+0x98/0x454
mutex_lock_nested+0x2c/0x38
vsc85xx_wol_get+0x2c/0xf4
phy_ethtool_get_wol+0x50/0x6c
phy_suspend+0x84/0xcc
phy_state_machine+0x1b8/0x27c
phy_stop+0x70/0x154
phylink_stop+0x34/0xc0
dsa_port_disable_rt+0x2c/0xa4
dsa_slave_close+0x38/0xec
__dev_close_many+0xc8/0x16c
__dev_change_flags+0xdc/0x218
dev_change_flags+0x24/0x6c
do_setlink+0x234/0xea4
__rtnl_newlink+0x46c/0x878
rtnl_newlink+0x50/0x7c
rtnetlink_rcv_msg+0x16c/0x58c
Removing the mutex_lock(&phydev->lock) calls from the driver restores
the functionality.
Fixes: 2f987d4866 ("net: phy: Add locks to ethtool functions")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cited commit attempts to update the hw stats when dumping tc actions.
However, the driver may be called to update the stats of a police action
that may not be in hardware. In such cases the driver will fail to look up
the police action object and will output an error message both to extack
and dmesg. The dmesg error is confusing as it may not indicate an actual
error.
Remove the dmesg error.
Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, the cloned flow attr resets the tc action cookies count on
the original instance.
Fix that by resetting the count on the cloned flow attribute instead.
Fixes: cca7eac138 ("net/mlx5e: TC, store tc action cookies per attr")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Freed counters may be reused by the fs core.
As such, raw counters may not be initialized to zero.
Cache the counter values when the action stats object is initialized,
so there is a proper base value for calculating the difference from the
previous query.
Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XDP programs that redirect to XSK sockets require linearity, hence a
restriction applies to the MTU. For PAGE_SIZE=4K, the MTU shouldn't
exceed 3498.
Features that conflict with XDP, such as HW-LRO and HW-GRO, are
enforced by the driver in advance, during XSK params validation, except
for the MTU, which was not enforced before this patch.
This was spotted during the test scenario described below: attach an
xdpsock program (PAGE_SIZE=4K) with MTU < 3498, detach the XDP program,
change the MTU to an arbitrary value in the range [3499, 3754], then
attach the XDP program again, which fails since the MTU is > 3498.
This commit lowers the XSK MTU limitation to align with the XDP MTU
limitation, since an XSK socket is meaningless without an XDP program.
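Back-of-the-envelope arithmetic for where the 3498 limit comes from
(the tailroom and overhead values here are assumptions for
illustration):

  #include <stdio.h>

  int main(void)
  {
      int page = 4096;
      int headroom = 256;   /* XDP_PACKET_HEADROOM */
      int shinfo = 320;     /* assumed sizeof(struct skb_shared_info) */
      int hw_overhead = 22; /* assumed eth header + alignment slack */

      printf("max linear MTU ~= %d\n",
             page - headroom - shinfo - hw_overhead); /* 3498 */
      return 0;
  }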
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, the BREAK_FW_WAIT flag is set after syncing with fw_reset.
However, fw_reset can call mlx5_load_one(), which waits for the FW init
bit, and this wait is exactly what the BREAK_FW_WAIT flag is intended
to break; e.g. the driver might keep waiting in a loop it should exit.
Fix it by setting the flag before syncing with fw_reset.
Fixes: 8324a02c34 ("net/mlx5: Add exit route when waiting for FW")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Do proper error unwinding when adding an unsupported TX/RX cipher type.
Move the switch case prior to key creation so there's less to unwind,
and change the goto label name to describe the action performed instead
of what failed.
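The pattern, as a self-contained sketch (hypothetical helpers; labels
name the cleanup action performed):

  #include <linux/errno.h>

  enum { GCM_128 = 1, GCM_256 = 2 };

  static int create_key(int cipher);   /* hypothetical */
  static void destroy_key(int cipher); /* hypothetical */
  static int program_hw(int cipher);   /* hypothetical */

  static int ktls_add_sketch(int cipher)
  {
      int err;

      switch (cipher) { /* validate first: nothing to unwind yet */
      case GCM_128:
      case GCM_256:
          break;
      default:
          return -EOPNOTSUPP;
      }

      err = create_key(cipher);
      if (err)
          return err;

      err = program_hw(cipher);
      if (err)
          goto err_destroy_key; /* says what we do, not what failed */

      return 0;

  err_destroy_key:
      destroy_key(cipher);
      return err;
  }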
Fixes: 4960c414db ("net/mlx5e: Support 256 bit keys with kTLS device offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the module is unloaded while a peer tc flow is still offloaded,
first the peer uplink rep profile is changed to a nic profile, and so
the neigh encap lock is destroyed. Next, during unload, the VF rep
netdevs are unregistered, which causes the original non-peer tc flow to
be deleted, which in turn deletes the peer flow. The peer flow deletion
detaches the encap entry and tries to take the already destroyed encap
lock, causing the trace below.
Fix this by clearing peer flows during tc eswitch cleanup
(mlx5e_tc_esw_cleanup()).
Relevant trace:
[ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40
[ 4316.851897] Call Trace:
[ 4316.852481] <TASK>
[ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core]
[ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core]
[ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core]
[ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core]
[ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core]
[ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core]
[ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core]
[ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core]
[ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core]
[ 4316.865486] tc_setup_cb_reoffload+0x20/0x80
[ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower]
[ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0
[ 4316.869649] tcf_block_unbind+0xe7/0x1b0
[ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270
[ 4316.879266] tcf_block_offload_unbind+0x61/0xa0
[ 4316.879711] __tcf_block_put+0xa4/0x310
Fixes: 04de7dda73 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96a ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rules with mirror actions are split into two FTEs when the actions after
the mirror action contain pedit, vlan push/pop or ct. Forwarding to an
ovs internal port adds an implicit header rewrite (pedit), but the
trigger to do the split was missing.
Fix this by setting split_count when forwarding to an ovs internal port,
which will trigger the split in mirror rules.
Fixes: 27484f7170 ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In a few cases, rules with a mirror use case are split into two FTEs:
one that does the mirror action and forwards to a second FTE, which does
the rest of the rule's actions and the second redirect action.
In the case of mirror rules which are split and forward to an ovs
internal port or VF stack devices, source port rewrite should be used
only in the second FTE, but it is wrongly also set in the first FTE,
which breaks the offload.
Fix this issue by removing the wrong check of whether source port
rewrite is needed in the first FTE of the split, and instead return
EOPNOTSUPP, which will block offload of rules which mirror to an ovs
internal port or VF stack devices, as that isn't supported.
Fixes: 10742efc20 ("net/mlx5e: VF tunnel TX traffic offloading")
Fixes: a508728a4c ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The offending commit changed the ordering of moving to legacy mode and
waiting for the VF pages. Moving to legacy mode is important on
BlueField, because it sends the host driver into an error state and
frees its pages. Without this transition we end up waiting 2 minutes
for pages that aren't coming before carrying on with the unload process.
Fixes: f019679ea5 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When the ECPF is a page supplier, the reclaim pages flow missed honoring
the ec_function bit provided by the firmware; it always set ec_function
to true during the driver unload flow for the ECPF. This is incorrect.
Honor the ec_function bit provided by the device during the page
allocation request event.
Fixes: d6945242f4 ("net/mlx5: Hold pages RB tree per VF")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When mlx5e attaches again after device health recovery, the device
capabilities might have been changed by the eswitch manager.
For example in one flow when ECPF changes the eswitch mode between
legacy and switchdev, it updates the flow table tunnel capability.
The cached value is only used in one place, so just check the capability
there instead.
Fixes: 5bef709d76 ("net/mlx5: Enable host PF HCA after eswitch is initialized")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, the mlx5e_macsec_umr struct does not satisfy the hardware
memory alignment requirement. Hence the result of querying the advanced
steering operation (ASO) is not copied to the memory region as expected.
Fix this by satisfying the hardware memory alignment requirement, and
move the context to be the first field in the struct for better
readability.
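Sketched (field names and the 64-byte figure are assumptions for
illustration):

  #include <linux/compiler.h>
  #include <linux/types.h>

  struct macsec_umr_sketch {
      /* hw DMA-writes the ASO query result here: first and aligned */
      u8 ctx[64] __aligned(64);
      dma_addr_t dma_addr;
      u32 mkey;
  };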
Fixes: 1f53da6764 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Handle the case when the module is unloaded (kfd_exit) before a process
space (mm_struct) is released.
v2: Fixed potential race conditions by removing all kfd_process entries
from the process table first, then working on releasing the resources.
v3: Fixed loop element access / synchronization. Fixed extra empty lines.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In USB4 DP tunneling, it's possible for the path to become unavailable
while the CM tears the path down a little late. In this case, the HPD
is high but reading any DPCD register fails. That causes the link
connection type to be set to SST, and not all sinks behind the MST
branch are removed.
[How]
Restore the link connection type if reading the DPCD register fails.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Hot plugging and then hot unplugging causes the k1 and k2 values to
change, as the signal is detected as a virtual signal on hot unplug.
Writing these values to the OTG_PIXEL_RATE_DIV register might cause the
primary display to blank (a known HW bug).
[HOW]
No longer write the k1 and k2 values to the register if the signal is
virtual. Safeguards are in place for the case where k1 and k2 are
unassigned, so that an unknown value is not written to the register
either.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull kselftest fixes from Shuah Khan:
"A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
kselftest common lib.mk"
* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix LLVM build for i386 and x86_64
selftests: amd-pstate: fix TEST_FILES
Pull MD fixes from Song:
"This set contains two fixes for old issues (by Neil) and one fix
for 6.3 (by Xiao)."
* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: select BLOCK_LEGACY_AUTOLOAD
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
When BLOCK_LEGACY_AUTOLOAD is not enabled, mdadm is not able to
activate new arrays unless "CREATE names=yes" appears in
mdadm.conf.
As this is a regression, we need to always enable BLOCK_LEGACY_AUTOLOAD
when MD is selected - at least until mdadm is updated and the update is
widely available.
Cc: stable@vger.kernel.org # v5.18+
Fixes: fbdee71bb5 ("block: deprecate autoloading based on dev_t")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
While using iostat for raid, I occasionally observed a very strange
'await', and it turns out that 'ios' and 'sectors' are counted in
bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct().
I'm not sure why they are counted like that, but this behaviour is
obviously wrong because the user will get wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when the io is done,
like rq-based devices do.
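A simplified sketch of the corrected accounting, so 'await' (nsecs/ios)
is computed over the same set of finished IOs:

  struct part_stat_sketch {
      unsigned long ios;
      unsigned long sectors;
      unsigned long long nsecs;
  };

  static void end_io_acct_sketch(struct part_stat_sketch *s,
                                 unsigned int sectors,
                                 unsigned long long start_ns,
                                 unsigned long long now_ns)
  {
      s->ios++;              /* previously bumped at start */
      s->sectors += sectors; /* likewise */
      s->nsecs += now_ns - start_ns;
  }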
Fixes: 394ffa503b ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
An nvme target ->queue_response() operation implementation may free the
request passed as its argument. Such an implementation could potentially
result in a use-after-free of the request pointer when percpu_ref_put()
is called in nvmet_req_complete().
Avoid this problem by using a local variable to save the sq pointer
before calling __nvmet_req_complete(), thus avoiding dereferencing the
req pointer after that function call.
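The fix boils down to this pattern (a sketch close to the change in
drivers/nvme/target/core.c):

  void nvmet_req_complete(struct nvmet_req *req, u16 status)
  {
      struct nvmet_sq *sq = req->sq; /* req may be freed below */

      __nvmet_req_complete(req, status);
      percpu_ref_put(&sq->ref);      /* no req dereference anymore */
  }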
Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We have more commands to show in the trace. Sync up.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Make sure that we don't somehow mess up the wire structures in the spec.
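For instance, compile-time size checks on the PDU structs (a sketch;
the sizes follow the NVMe/TCP spec, e.g. the 72-byte command pdu):

  #include <linux/nvme-tcp.h>

  static_assert(sizeof(struct nvme_tcp_hdr) == 8);
  static_assert(sizeof(struct nvme_tcp_cmd_pdu) == 72);
  static_assert(sizeof(struct nvme_tcp_icreq_pdu) == 128);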
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
For non in-capsule writes we reuse the request pdu space for an h2cdata
pdu, in order to avoid over-allocating space (either preallocated or
dynamically allocated upon receiving an r2t pdu). However, if the
request times out, the core expects to find the opcode at the start of
the request, which we overwrite.
In order to prevent that, without sacrificing an additional 24 bytes
per request, we just use the tail of the command pdu space instead (the
last 24 bytes of the 72-byte command pdu). That keeps the command
opcode always available, and we avoid allocating more space.
If in the future we need the last 24 bytes of the nvme command to be
available, we will need to allocate dedicated space for it in the
request, but until then we can avoid doing so.
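The pointer math, sketched (sizes per the NVMe/TCP wire format: 72-byte
command pdu, 24-byte h2cdata pdu):

  /* carve the h2cdata pdu out of the tail of the command pdu space,
   * leaving the opcode at the start of the request untouched */
  static inline void *req_data_pdu_sketch(void *req_pdu)
  {
      return (char *)req_pdu + 72 - 24;
  }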
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If the nvme_probe teardown path is triggered, the ctrl refcount does
not reach 0, creating a memory leak upon nvme_probe failure.
Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When investigating a customer report of a warning in nvme_setup_discard,
we observed that the controller (nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since it assumes
contiguous merging, as for normal RW requests.
Fix the issue by building the range from the request's sector/nr_sectors
directly.
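A sketch of the fix in nvme_setup_discard() (abridged; the declarations
around the hunk are omitted):

  if (queue_max_discard_segments(req->q) == 1) {
      u64 slba = nvme_sect_to_lba(ns, blk_rq_pos(req));
      u32 nlb = blk_rq_sectors(req) >> (ns->lba_shift - SECTOR_SHIFT);

      range[0].cattr = cpu_to_le32(0);
      range[0].nlb = cpu_to_le32(nlb);
      range[0].slba = cpu_to_le64(slba);
      n = 1;
  } else {
      /* original path: build one range per bio in the request */
  }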
Fixes: b35ba01ea6 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>