Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
Add wrappers for ->i_mutex access,
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
with inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
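A minimal sketch of those wrappers, as described above, over the current
mutex-based ->i_mutex:

  #include <linux/fs.h>
  #include <linux/mutex.h>

  /* inode_foo(inode) is mutex_foo(&inode->i_mutex). */
  static inline void inode_lock(struct inode *inode)
  {
          mutex_lock(&inode->i_mutex);
  }

  static inline void inode_unlock(struct inode *inode)
  {
          mutex_unlock(&inode->i_mutex);
  }

  static inline int inode_trylock(struct inode *inode)
  {
          return mutex_trylock(&inode->i_mutex);
  }

  static inline int inode_is_locked(struct inode *inode)
  {
          return mutex_is_locked(&inode->i_mutex);
  }

  static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
  {
          mutex_lock_nested(&inode->i_mutex, subclass);
  }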
Remove the unused ib_alloc_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix the spelling of user_credit_return_threshold, which was
incorrectly spelled as user_credit_return_theshold, causing two
module parameters, one with the typo, to be shown in modinfo.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
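For illustration, the usual declaration pattern: the parameter name shown
by modinfo comes straight from the C identifier, so a typo'd identifier
yields a typo'd parameter (the default value and description below are
assumptions, not the driver's exact text):

  #include <linux/module.h>
  #include <linux/moduleparam.h>

  static unsigned int user_credit_return_threshold = 33;
  module_param(user_credit_return_threshold, uint, S_IRUGO);
  MODULE_PARM_DESC(user_credit_return_threshold,
                   "Credit return threshold for user send contexts, percent");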
num_rcv_contexts sets the number of user contexts, both receive and
send. Renaming it to num_user_contexts makes sense to reflect its true
meaning. When num_rcv_contexts is 0, the default behavior is to use the
number of CPU cores rather than 0 contexts. This commit renames the
variable num_rcv_contexts to num_user_contexts and makes any negative
value default to the number of CPU cores, so any value >= 0 is used
directly as the number of contexts.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
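A minimal sketch of that defaulting rule (assuming num_online_cpus() as
the core-count source; the helper around it is hypothetical):

  #include <linux/cpumask.h>
  #include <linux/moduleparam.h>

  static int num_user_contexts = -1;
  module_param(num_user_contexts, int, S_IRUGO);
  MODULE_PARM_DESC(num_user_contexts,
                   "Number of user contexts (default: number of CPU cores)");

  static void apply_user_context_default(void)
  {
          /* Negative means "pick for me": one context per CPU core.
           * Zero and above are used exactly as given. */
          if (num_user_contexts < 0)
                  num_user_contexts = num_online_cpus();
  }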
If the driver is loaded with the parameter hdrq_entsize=0,
there is a NULL dereference when the driver gets unloaded.
This causes a kernel oops and prevents the module from
being unloaded. This patch fixes the issue by making sure
-EINVAL gets returned when hdrq_entsize=0.
Reviewed-by: Chegondi, Harish <harish.chegondi@intel.com>
Reviewed-by: Haralanov, Mitko <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
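A hedged sketch of the early validation; the accepted sizes follow the
driver's documented 2/16/32 DWORD entries, but the helper itself is
hypothetical:

  #include <linux/errno.h>

  static int validate_hdrq_entsize(unsigned int entsize)
  {
          switch (entsize) {
          case 2:         /* 8B entries */
          case 16:        /* 64B entries */
          case 32:        /* 128B entries */
                  return 0;
          default:
                  return -EINVAL; /* rejects 0 before any state is built */
          }
  }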
A code inspection pointed out that kmalloc_array() may return NULL
and that memset() doesn't check its input pointer for NULL, resulting
in a possible NULL dereference. This patch fixes this.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
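The shape of the fix, as a sketch (names hypothetical): check the
allocation before memset(), or fold both into kcalloc():

  #include <linux/slab.h>
  #include <linux/string.h>

  static u64 *alloc_counters(size_t num)
  {
          u64 *cntrs = kmalloc_array(num, sizeof(*cntrs), GFP_KERNEL);

          if (!cntrs)
                  return NULL;    /* previously fell through to memset() */
          memset(cntrs, 0, num * sizeof(*cntrs));
          return cntrs;
  }

  /* Equivalently: kcalloc(num, sizeof(*cntrs), GFP_KERNEL) allocates
   * and zeroes in one call, with the same NULL-on-failure contract. */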
The kmem_cache_destroy() function tests whether its argument is NULL
and then returns immediately.
Thus the NULL check before calling this function is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Yash Shah <yshah1@visteon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
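The transformation Coccinelle flags, sketched with a hypothetical cache
pointer:

  /* Before: redundant guard */
  if (cache)
          kmem_cache_destroy(cache);

  /* After: the destroy function already returns on NULL */
  kmem_cache_destroy(cache);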
Profiling has shown that the atomic is a performance issue
for the pio hot path.
If multiple CPUs allocate from an sc's buffer, the cache line
containing the atomic bounces between their L0 caches.
Convert the atomic to a percpu variable.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
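One common pattern for such a conversion, as a hedged sketch: each CPU
increments its own slot (a local cache line), and only the rare reader
pays to sum across CPUs. The names are hypothetical:

  #include <linux/cpumask.h>
  #include <linux/percpu.h>

  /* Allocated once with alloc_percpu(u32), freed with free_percpu(). */
  static u32 __percpu *buffers_allocated;

  static void sc_buffer_allocated(void)
  {
          this_cpu_inc(*buffers_allocated);
  }

  static u64 sc_buffers_allocated(void)
  {
          u64 sum = 0;
          int cpu;

          for_each_possible_cpu(cpu)
                  sum += *per_cpu_ptr(buffers_allocated, cpu);
          return sum;
  }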
There are holes in the sdma build support routines that do
not clean up partially built sdma descriptors after mapping or
allocation failures.
This patch corrects these issues.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
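The general unwind shape such a fix takes, as a sketch (both helpers are
hypothetical stand-ins for the driver's mapping calls):

  static int build_sdma_descs(struct sdma_txreq *tx, int nr_frags)
  {
          int i, ret;

          for (i = 0; i < nr_frags; i++) {
                  ret = map_one_frag(tx, i);      /* hypothetical */
                  if (ret)
                          goto unwind;
          }
          return 0;

  unwind:
          /* Tear down everything built so far; previously a mid-build
           * failure left these descriptors mapped. */
          while (--i >= 0)
                  unmap_one_frag(tx, i);          /* hypothetical */
          return ret;
  }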
The allocation code assumes that the shadow ring cannot
be overrun because the credits will limit the allocation.
Unfortunately, the progress mechanism in sc_release_update() updates
the free count prior to processing the shadow ring, allowing the
shadow ring to be overrun by an allocation.
Reviewed-by: Mark Debbage <mark.debbage@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
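A sketch of the ordering the fix implies (all names hypothetical):
consume the shadow-ring entries first, and only then publish the new
free count.

  #include <linux/spinlock.h>

  struct send_ctxt {
          spinlock_t release_lock;
          unsigned int sr_head, sr_tail, sr_size;
          unsigned long free;     /* credits visible to allocators */
  };

  static void sc_release_update_sketch(struct send_ctxt *sc,
                                       unsigned long new_free)
  {
          unsigned long flags;

          spin_lock_irqsave(&sc->release_lock, flags);
          while (sc->sr_tail != sc->sr_head) {
                  /* complete/unmap the entry at sr_tail here */
                  sc->sr_tail = (sc->sr_tail + 1) % sc->sr_size;
          }
          /* Publishing this before the loop (the old order) let an
           * allocator reuse ring slots not yet processed. */
          sc->free = new_free;
          spin_unlock_irqrestore(&sc->release_lock, flags);
  }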
It is possible for an SDMA transmission error to happen
during the processing of a user SDMA transfer. In that
case it is better to detect it early and abort any further
attempts to send more packets.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
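One way to realize the early abort, sketched with hypothetical names:
the completion path latches an error flag that every later submission
checks before doing any work.

  #include <linux/compiler.h>
  #include <linux/errno.h>

  struct pkt_queue {
          int in_error;   /* set by the SDMA error callback */
  };

  static int submit_next_packet(struct pkt_queue *pq)
  {
          if (READ_ONCE(pq->in_error))
                  return -EIO;    /* abort the rest of the transfer */
          /* ... build and submit the next packet ... */
          return 0;
  }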
The driver pins pages on behalf of user processes in two
separate instances - when the process has submitted an
SDMA transfer and when the process programs an expected
receive buffer.
When pinning pages, the driver is required to observe the
locked page limit set by the system administrator and refuse
to lock more pages than allowed. Such a check was done for
expected receives but was missing from the SDMA transfer
code path.
This commit adds the missing check for SDMA transfers. As of
this commit, user SDMA or expected receive requests will be
rejected if the number of pages required to be pinned would
exceed the set limit.
Because the driver needs to take the MM semaphore in order
to update the locked page count (which can sleep), this cannot
be done by the callback function as it [the callback] is
executed in interrupt context. Therefore, it is necessary to put
all the completed SDMA tx requests onto a separate list (txcmp) and
offload the actual clean-up and unpinning work to a workqueue.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
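A hedged sketch of the limit check, using the era's mm API (mmap_sem,
mm->pinned_vm) and RLIMIT_MEMLOCK as the administrator-set bound; the
helper name is hypothetical:

  #include <linux/capability.h>
  #include <linux/mm.h>
  #include <linux/sched.h>

  static int try_account_pinned(struct mm_struct *mm, unsigned long npages)
  {
          unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
          int ret = 0;

          down_write(&mm->mmap_sem);      /* may sleep: process context only */
          if (mm->pinned_vm + npages > limit && !capable(CAP_IPC_LOCK))
                  ret = -ENOMEM;          /* refuse to pin past the limit */
          else
                  mm->pinned_vm += npages;
          up_write(&mm->mmap_sem);
          return ret;
  }

  /* The unpin side runs from a workqueue item, never from the
   * interrupt-context callback, for the same may-sleep reason. */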
Convert hfi1_get_user_pages() to use get_user_pages_fast(),
which is much faster. The mm semaphore is still taken to
update the pinned page count, but is now held for a much
shorter amount of time.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
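A sketch of the converted path, using the 4.4-era signature of
get_user_pages_fast(); the counter update still happens under mmap_sem,
but only briefly (helper name hypothetical):

  #include <linux/mm.h>
  #include <linux/sched.h>

  static int pin_user_bufs(unsigned long vaddr, int npages,
                           struct page **pages)
  {
          int pinned = get_user_pages_fast(vaddr, npages, 1, pages);

          if (pinned > 0) {
                  down_write(&current->mm->mmap_sem);
                  current->mm->pinned_vm += pinned;
                  up_write(&current->mm->mmap_sem);
          }
          return pinned;  /* may be short or negative on error */
  }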
There is no need to check if the packet queue is allocated
when cleaning up a user context. The hfi1_user_sdma_free_queues()
function already does all the required checks.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch avoids issues when calling copy_{from,to}_user() while
holding the lock by only taking the lock when it is absolutely
required. The only commands which require the snoop lock are:
  * Set Filter
  * Clear Filter
  * Clear Queue
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
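A sketch of the narrowed lock scope (command values, struct, and helpers
are all hypothetical):

  #include <linux/spinlock.h>

  struct snoop_dev {
          spinlock_t snoop_lock;
  };

  static long snoop_ioctl_sketch(struct snoop_dev *sd, unsigned int cmd,
                                 unsigned long arg)
  {
          unsigned long flags;
          long ret;

          switch (cmd) {
          case SNOOP_SET_FILTER:          /* hypothetical cmd values */
          case SNOOP_CLEAR_FILTER:
          case SNOOP_CLEAR_QUEUE:
                  spin_lock_irqsave(&sd->snoop_lock, flags);
                  ret = snoop_cmd_locked(sd, cmd, arg);   /* hypothetical */
                  spin_unlock_irqrestore(&sd->snoop_lock, flags);
                  break;
          default:
                  /* copy_{from,to}_user() can fault: never under the lock */
                  ret = snoop_cmd_unlocked(sd, cmd, arg); /* hypothetical */
                  break;
          }
          return ret;
  }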
sizeof should use the variable rather than the struct definition to ensure that
type changes are properly accounted for.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
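The pattern, shown with a hypothetical struct:

  /* Before: repeats the type name; silently wrong if tmp's type changes */
  tmp = kmalloc(sizeof(struct snoop_filter), GFP_KERNEL);

  /* After: always matches whatever tmp points to */
  tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);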
Else statements should continue using braces even if there is only 1 line in
the block. Found by checkpatch --strict
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use !foo rather than (foo == NULL) as recommended by checkpatch --strict
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Place logical operators at the end of the previous line when using a multi-line
statement. Found by checkpatch --strict
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix line alignment in various places as caught by checkpatch --strict.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the 32-bit reserved field in opa_port_data_counters_msg
to the new 'resolution' field. The PMA will use the resolutions to
right-shift the values for LocalLinkIntegrity and LinkErrorRecovery
when computing the ErrorCounterSummary for a DataPortCounters request.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Andrea Lowe <andrea.l.lowe@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
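A simplified sketch of how the two resolutions would be applied; the
real ErrorCounterSummary sums more counters, and only the shifted terms
are shown here (function and parameter names hypothetical):

  #include <linux/types.h>

  static u64 error_counter_summary_terms(u64 link_error_recovery,
                                         u64 local_link_integrity,
                                         u8 res_ler, u8 res_lli)
  {
          /* Right-shifting trades precision for range in the summary. */
          return (link_error_recovery >> res_ler) +
                 (local_link_integrity >> res_lli);
  }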
Currently, only the MTUs of VLs 0-7 are checked when calculating the
maximum VL MTU, which is used to set the port MTU capability in the
DCC_CFG_PORT_CONFIG CSR. This can cause the port MTU capability to be
set to 0 if the MTUs of VLs 0-7 are all 0, which would affect VL15
traffic.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
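A hedged sketch of the corrected computation: scan the data VLs but
never report below VL15's mandatory 2048-byte MTU (helper name
hypothetical):

  #include <linux/kernel.h>
  #include <linux/types.h>

  static u16 max_vl_mtu(const u16 *vl_mtu, int num_vls)
  {
          u16 maxvlmtu = 2048;    /* VL15 floor, even if VLs 0-7 are 0 */
          int i;

          for (i = 0; i < num_vls; i++)
                  maxvlmtu = max_t(u16, maxvlmtu, vl_mtu[i]);
          return maxvlmtu;
  }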
Correctly reduce the number of VLs when limited by the number
of SDMA engines.
The hardware has multiple egress mechanisms, SDMA and pio, and
multiples of each. These mechanisms are chosen per VL, of which
there are 8. The fix corrects a panic on a platform that has only
4 SDMA engines, fewer than the typical number of VLs.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
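The clamp itself is one line; sketched as a helper (names hypothetical):

  #include <linux/kernel.h>
  #include <linux/types.h>

  static u8 clamp_num_vls(u8 requested_vls, u8 num_sdma_engines)
  {
          /* e.g. 8 requested VLs on a platform with 4 engines -> 4 */
          return min_t(u8, requested_vls, num_sdma_engines);
  }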
The longest quiet timeout is now 6s, but the driver wasn't
following our internal specification of 6 seconds. Extend the
driver wait to 6s to correct this.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add AETH syndrome name decoding to enhance debugging.
The IBTA RC ACK contains an ACK extended transport header (AETH).
Part of that header is the syndrome field, which qualifies the RC ACK
as an ACK, NAK, or RNR NAK.
Without the patch, the syndrome decode is:
aeth syn 0x00
With the fix:
aeth syn 0x00 ACK
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
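A sketch of the decode following the IBTA AETH layout (syndrome in the
top byte of the 32-bit AETH; its top three bits classify the ack; the
function name is hypothetical):

  #include <linux/types.h>

  static const char *aeth_syndrome_str(u32 aeth)
  {
          switch ((aeth >> 29) & 0x7) {
          case 0:
                  return "ACK";           /* low 5 bits: credit count */
          case 1:
                  return "RNR NAK";       /* low 5 bits: RNR timer */
          case 3:
                  return "NAK";           /* low 5 bits: NAK code */
          default:
                  return "";              /* reserved */
          }
  }

With this, the trace line "aeth syn 0x00" gains the trailing "ACK" shown
in the example above.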