Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The bt number of cqc_timer of HIP09 increases compared with that of HIP08.
Therefore, cqc_timer_bt_num and num_cqc_timer do not match. As a result,
the driver may fail to allocate cqc_timer. So the driver needs to uniquely
uses cqc_timer_bt_num to represent the bt number of cqc_timer.
Fixes: 0e40dc2f70 ("RDMA/hns: Add timer allocation support for hip08")
Link: https://lore.kernel.org/r/20220429093545.58070-1-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
CMDQ may fail during HNS ROCEE initialization. The following is the log
when the execution fails:
hns3 0000:bd:00.2: In reset process RoCE client reinit.
hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
hns3 0000:bd:00.2 hns_2: failed to set gid, ret = -11!
hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
<...>
hns3 0000:bd:00.2: CMDQ move tail from 840 to 839
hns3 0000:bd:00.2: CMDQ move tail from 840 to 0
hns3 0000:bd:00.2: [cmd]token 14e mailbox 20 timeout.
hns3 0000:bd:00.2 hns_2: set HEM step 0 failed!
hns3 0000:bd:00.2 hns_2: set HEM address to HW failed!
hns3 0000:bd:00.2 hns_2: failed to alloc mtpt, ret = -16.
infiniband hns_2: Couldn't create ib_mad PD
infiniband hns_2: Couldn't open port 1
hns3 0000:bd:00.2: Reset done, RoCE client reinit finished.
However, even if ib_mad client registration failed, ib_register_device()
still returns success to the driver.
In the device initialization process, CMDQ execution fails because HW/FW
is abnormal. Therefore, if CMDQ fails, the initialization function should
set CMDQ to a fatal error state and return a failure to the caller.
Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20220429093104.26687-1-liangwenpeng@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
QP destroy is synchronous and waits for its refcnt to be decremented in
irdma_cm_node_free_cb (for iWARP) which fires after the RCU grace period
elapses.
Applications running a large number of connections are exposed to high
wait times on destroy QP for events like SIGABORT.
The long pole for this wait time is the firing of the call_rcu callback
during a CM node destroy which can be slow. It holds the QP reference
count and blocks the destroy QP from completing.
call_rcu only needs to make sure that list walkers have a reference to the
cm_node object before freeing it and thus need to wait for grace period
elapse. The rest of the connection teardown in irdma_cm_node_free_cb is
moved out of the grace period wait in irdma_destroy_connection. Also,
replace call_rcu with a simple kfree_rcu as it just needs to do a kfree on
the cm_node
Fixes: 146b9756f1 ("RDMA/irdma: Add connection manager")
Link: https://lore.kernel.org/r/20220425181703.1634-3-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When connection establishment fails in iWARP mode, an app can drain the
QPs and hang because flush isn't issued when the QP is modified from RTR
state to error. Issue a flush in this case using function
irdma_cm_disconn().
Update irdma_cm_disconn() to do flush when cm_id is NULL, which is the
case when the QP is in RTR state and there is an error in the connection
establishment.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20220425181703.1634-2-shiraz.saleem@intel.com
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Introduce mlx5_umr_post_send_wait() that uses a UMR adjusted flow for
posting WQEs. The next patches will gradually move UMR operations to use
this flow. Once done, will get rid of mlx5_ib_post_send_wait().
mlx5_umr_post_send_wait gets already written WQE segments and will only
memcpy it to the SQ. This way, we avoid packing all the data in a WR just
to unpack it into the WQE.
Link: https://lore.kernel.org/r/f027dd592fde62402b2d49efded8d1d22229d22b.1649747695.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The commit mentioned in Fixes line removed the function that was
called to check validity of esp_aes_gcm attribute. Sadly, that
is_valid_esp_aes_gcm() returned success even for specs without
esp_aes_gcm at all.
So the right fix will be to remove whole if () and such fix
the following error observed in smatch too.
drivers/infiniband/hw/mlx5/fs.c:1126 _create_flow_rule()
warn: duplicate check 'is_egress' (previous on line 1098)
Fixes: de8bdb4769 ("RDMA/mlx5: Drop crypto flow steering API")
Link: https://lore.kernel.org/r/11b31c1f85bc8c8add385529aa3f307c3b383a11.1649842371.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There is a deadlock in irdma_cleanup_cm_core(), which is shown below:
(Thread 1) | (Thread 2)
| irdma_schedule_cm_timer()
irdma_cleanup_cm_core() | add_timer()
spin_lock_irqsave() //(1) | (wait a time)
... | irdma_cm_timer_tick()
del_timer_sync() | spin_lock_irqsave() //(2)
(wait timer to stop) | ...
We hold cm_core->ht_lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler also need
cm_core->ht_lock in position (2) of thread 2. As a result,
irdma_cleanup_cm_core() will block forever.
This patch removes the check of timer_pending() in
irdma_cleanup_cm_core(), because the del_timer_sync() function will just
return directly if there isn't a pending timer. As a result, the lock is
redundant, because there is no resource it could protect.
Link: https://lore.kernel.org/r/20220418153322.42524-1-duoming@zju.edu.cn
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leon Romanovsky says:
====================
Mellanox shared branch that includes:
* Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com
Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code
is unmaintained, untested and not in-use by any upstream/distro oriented
customers. In order to reduce code complexity, drop the kernel code,
clean build config options and delete useless kTLS vs. TLS separation.
[1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf
* Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com
Together with FPGA TLS, the IPsec went to EOL state in the November of
2019 [1]. Exactly like FPGA TLS, no active customers exist for this
upstream code and all the complexity around that area can be deleted.
[2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf
* Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de
====================
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Remove not-implemented IPsec capabilities
net/mlx5: Remove ipsec_ops function table
net/mlx5: Reduce kconfig complexity while building crypto support
net/mlx5: Move IPsec file to relevant directory
net/mlx5: Remove not-needed IPsec config
net/mlx5: Align flow steering allocation namespace to common style
net/mlx5: Unify device IPsec capabilities check
net/mlx5: Remove useless IPsec device checks
net/mlx5: Remove ipsec vs. ipsec offload file separation
RDMA/core: Delete IPsec flow action logic from the core
RDMA/mlx5: Drop crypto flow steering API
RDMA/mlx5: Delete never supported IPsec flow action
net/mlx5: Remove FPGA ipsec specific statistics
net/mlx5: Remove XFRM no_trailer flag
net/mlx5: Remove not-used IDA field from IPsec struct
net/mlx5: Delete metadata handling logic
net/mlx5_fpga: Drop INNOVA IPsec support
IB/mlx5: Fix undefined behavior due to shift overflowing the constant
net/mlx5: Cleanup kTLS function names and their exposure
net/mlx5: Remove tls vs. ktls separation as it is the same
net/mlx5: Remove indirection in TLS build
net/mlx5: Reliably return TLS device capabilities
net/mlx5_fpga: Drop INNOVA TLS support
Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
The mlx5 flow steering crypto API was intended to be used in FPGA
devices, which is not supported for years already. The removal of
mlx5 crypto FPGA code together with inability to configure encryption
keys makes the low steering API completely unusable.
So delete the code, so any ESP flow steering requests will fail with
not supported error, as it is happening now anyway as no device support
this type of API.
Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may
represent the last reference held on the task mm.
hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed
before the final use in hfi1_release_user_pages(). A new task may
allocate the mm structure while it is still being used, resulting in
problems. One manifestation is corruption of the mmap_sem counter leading
to a hang in down_write(). Another is corruption of an mm struct that is
in use by another task.
Fixes: 3d2a9d6425 ("IB/hfi1: Ensure correct mm is used at all times")
Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelisnetworks.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma updates from Jason Gunthorpe:
- Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
- Minor cleanups: coding style, useless includes and documentation
- Reorganize how multicast processing works in rxe
- Replace a red/black tree with xarray in rxe which improves performance
- DSCP support and HW address handle re-use in irdma
- Simplify the mailbox command handling in hns
- Simplify iser now that FMR is eliminated
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
IB/iser: Fix error flow in case of registration failure
IB/iser: Generalize map/unmap dma tasks
IB/iser: Use iser_fr_desc as registration context
IB/iser: Remove iser_reg_data_sg helper function
RDMA/rxe: Use standard names for ref counting
RDMA/rxe: Replace red-black trees by xarrays
RDMA/rxe: Shorten pool names in rxe_pool.c
RDMA/rxe: Move max_elem into rxe_type_info
RDMA/rxe: Replace obj by elem in declaration
RDMA/rxe: Delete _locked() APIs for pool objects
RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
RDMA/rxe: Replace mr by rkey in responder resources
RDMA/rxe: Fix ref error in rxe_av.c
RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
RDMA/irdma: Add support for address handle re-use
RDMA/qib: Fix typos in comments
RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
RDMA/rxe: Remove useless argument for update_state()
...
My static checker complains that:
drivers/infiniband/hw/irdma/ctrl.c:3605 irdma_sc_ceq_init()
warn: can subtract underflow 'info->dev->hmc_fpm_misc.max_ceqs'?
It appears that "info->dev->hmc_fpm_misc.max_ceqs" comes from the firmware
in irdma_sc_parse_fpm_query_buf() so, yes, there is a chance that it could
be zero. Even if we trust the firmware, it's easy enough to change the
condition just as a hardenning measure.
Fixes: 3f49d68425 ("RDMA/irdma: Implement HW Admin Queue OPs")
Link: https://lore.kernel.org/r/20220307125928.GE16710@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move the debugfs entry pointers under priv to their own struct.
Add get function for device debugfs root.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>