Commit Graph

127 Commits

Author SHA1 Message Date
Zhu Yanjun
3fbbdb44ce IB/rxe: avoid double kfree_skb
[ Upstream commit 9fd4350ba8 ]

When skb is sent, it will pass the following functions in soft roce.

rxe_send [rdma_rxe]
    ip_local_out
        __ip_local_out
        ip_output
            ip_finish_output
                ip_finish_output2
                    dev_queue_xmit
                        __dev_queue_xmit
                            dev_hard_start_xmit

In the above functions, if error occurs in the above functions or
iptables rules drop skb after ip_local_out, kfree_skb will be called.
So it is not necessary to call kfree_skb in soft roce module again.
Or else crash will occur.

The steps to reproduce:

     server                       client
    ---------                    ---------
    |1.1.1.1|<----rxe-channel--->|1.1.1.2|
    ---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512

The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.

When rping runs, run the following command in server:

iptables -I OUTPUT -p udp  --dport 4791 -j DROP

Without this patch, crash will occur.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21 04:02:51 +09:00
Jianchao Wang
b4f6e28c80 IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
[ Upstream commit 2da36d44a9 ]

w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
will not be updated in update_wqe_psn, and the corresponding
wqe will not be acked in rxe_completer due to its last_psn is
zero. Finally, the other wqe will also not be able to be acked,
because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
is still there. This causes large amount of io timeout when
nvmeof is over rxe.

Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21 04:02:51 +09:00
Mikhail Malygin
9ebe297713 IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
[ Upstream commit efc365e729 ]

On ppc64le arch rxe_add command causes oops in kernel log:

[   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[   92.499710] SMP NR_CPUS=2048 NUMA pSeries
[   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
 nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[   92.499834] Supported: No, Unsupported modules are loaded
[   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
[   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
[   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
[   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
[   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
[   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
               GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
               GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
               GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
               GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
               GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
               GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
               GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499886] Call Trace:
[   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
[   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
[   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
[   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
[   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
[   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
[   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
[   92.499949] Instruction dump:
[   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
[   92.499962] ---[ end trace bed077e15eb420cf ]---

It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL

Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:24 +02:00
Bart Van Assche
d3b14a66e1 RDMA/rxe: Fix an out-of-bounds read
commit a6544a624c upstream.

This patch avoids that KASAN reports the following when the SRP initiator
calls srp_post_send():

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x5c4/0x980 [rdma_rxe]
Read of size 8 at addr ffff880066606e30 by task 02-mq/1074

CPU: 2 PID: 1074 Comm: 02-mq Not tainted 4.16.0-rc3-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x5c4/0x980 [rdma_rxe]
srp_post_send.isra.16+0x149/0x190 [ib_srp]
srp_queuecommand+0x94d/0x1670 [ib_srp]
scsi_dispatch_cmd+0x1c2/0x550 [scsi_mod]
scsi_queue_rq+0x843/0xa70 [scsi_mod]
blk_mq_dispatch_rq_list+0x143/0xac0
blk_mq_do_dispatch_ctx+0x1c5/0x260
blk_mq_sched_dispatch_requests+0x2bf/0x2f0
__blk_mq_run_hw_queue+0xdb/0x160
__blk_mq_delay_run_hw_queue+0xba/0x100
blk_mq_run_hw_queue+0xf2/0x190
blk_mq_sched_insert_request+0x163/0x2f0
blk_execute_rq+0xb0/0x130
scsi_execute+0x14e/0x260 [scsi_mod]
scsi_probe_and_add_lun+0x366/0x13d0 [scsi_mod]
__scsi_scan_target+0x18a/0x810 [scsi_mod]
scsi_scan_target+0x11e/0x130 [scsi_mod]
srp_create_target+0x1522/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea0001998180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880066606d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
ffff880066606d80: f1 00 f2 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2
>ffff880066606e00: f2 00 00 00 00 00 f2 f2 f2 f3 f3 f3 f3 00 00 00
                                    ^
ffff880066606e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066606f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:36:31 +02:00
Bart Van Assche
75a3f11c7b RDMA/rxe: Fix rxe_qp_cleanup()
commit bb3ffb7ad4 upstream.

rxe_qp_cleanup() can sleep so it must be run in thread context and
not in atomic context. This patch avoids that the following bug is
triggered:

Kernel BUG at 00000000560033f3 [verbose debug info unavailable]
BUG: sleeping function called from invalid context at net/core/sock.c:2761
in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000000b6e69628>] __do_softirq+0x4e/0x540
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xbf
 ___might_sleep+0x177/0x260
 lock_sock_nested+0x1d/0x90
 inet_shutdown+0x2e/0xd0
 rxe_qp_cleanup+0x107/0x140 [rdma_rxe]
 rxe_elem_release+0x18/0x80 [rdma_rxe]
 rxe_requester+0x1cf/0x11b0 [rdma_rxe]
 rxe_do_task+0x78/0xf0 [rdma_rxe]
 tasklet_action+0x99/0x270
 __do_softirq+0xc0/0x540
 run_ksoftirqd+0x1c/0x70
 smpboot_thread_fn+0x1be/0x270
 kthread+0x117/0x130
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:15 +01:00
Bart Van Assche
571cb36fac RDMA/rxe: Fix a race condition in rxe_requester()
commit 65567e4121 upstream.

The rxe driver works as follows:
* The send queue, receive queue and completion queues are implemented as
  circular buffers.
* ib_post_send() and ib_post_recv() calls are serialized through a spinlock.
* Removing elements from various queues happens from tasklet
  context. Tasklets are guaranteed to run on at most one CPU. This serializes
  access to these queues. See also rxe_completer(), rxe_requester() and
  rxe_responder().
* rxe_completer() processes the skbs queued onto qp->resp_pkts.
* rxe_requester() handles the send queue (qp->sq.queue).
* rxe_responder() processes the skbs queued onto qp->req_pkts.

Since rxe_drain_req_pkts() processes qp->req_pkts, calling
rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch.

Reported-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:15 +01:00
Bart Van Assche
7b4e8a46d4 RDMA/rxe: Fix a race condition related to the QP error state
commit 6f301e06de upstream.

The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.

Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.

If rxe_completer() runs prior to rxe_post_send(), it will drain the send
queue and the driver will assume no further action is necessary.
However, once we post the send to the send queue, because the queue is
in error, no send completion will ever happen and the send will get
stuck.  In order to process the send, we need to make sure that
rxe_completer() gets run after a send is posted to a queue pair in an
error state.  This patch ensures that happens.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:15 +01:00
Colin Ian King
afdbec5d3c IB/rxe: check for allocation failure on elem
[ Upstream commit 4831ca9e4a ]

The allocation for elem may fail (especially because we're using
GFP_ATOMIC) so best to check for a null return.  This fixes a potential
null pointer dereference when assigning elem->pool.

Detected by CoverityScan CID#1357507 ("Dereference null return value")

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:26:26 +01:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Andrew Boyer
5c50f1d18f IB/rxe: Handle NETDEV_CHANGE events
Without this fix, ports configured on top of ixgbe miss link up
notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even
though the port is up and working.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:36 -04:00
Andrew Boyer
13eb1e21d6 IB/rxe: Avoid ICRC errors by copying into the skb first
The current process is to first calculate the CRC and then copy the client
data into the packet. This leaves a window in which the packet contents and
CRC can get out of sync, if the client changes the data after the CRC is
calculated but before the data is copied.

By copying the data into the packet and then calculating the CRC directly
from the packet contents we eliminate the window.

This can be seen with qperf's ud_bi_bw test. This seems like very
strange/reckless client behavior, but whether the client has mangled its
data or not RXE should be able to transfer it reliably.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:36 -04:00
Andrew Boyer
1223a1af75 IB/rxe: Another fix for broken receive queue draining
This fixes another path in rxe_requester() that might overlook stale SKBs,
preventing cleanup.

Fixes: 1217197142 ("rxe: fix broken receive queue draining")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:35 -04:00
Andrew Boyer
2418adaed1 IB/rxe: Remove unneeded initialization in prepare6()
Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:35 -04:00
Andrew Boyer
825a51a4af IB/rxe: Fix up rxe_qp_cleanup()
Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset().
sk_dst_get() takes a new reference on dst, so the dst_release() doesn't
actually release the original reference, which was the design intent.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:34 -04:00
Andrew Boyer
48c22be4ab IB/rxe: Add dst_clone() in prepare_ipv6_hdr()
Otherwise the reference count goes negative as IPv6 packets complete.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:34 -04:00
Andrew Boyer
b9109b7ddb IB/rxe: Fix destination cache for IPv6
To successfully match an IPv6 path, the path cookie must match. Store it
in the QP so that the IPv6 path can be reused.

Replace open-coded version of dst_check() with the actual call, fixing the
logic. The open-coded version skips the check call if dst->obsolete is 0
(DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE
means that the route may continue to be used, though.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:33 -04:00
Andrew Boyer
d45d29567f IB/rxe: Fix up the responder's find_resources() function
The resource array is sized by max_dest_rd_atomic, not max_rd_atomic.
Iterating over max_rd_atomic entries of qp->resp.resources[] will cause
incorrect behavior when the two attributes are different (or even
crash if max_rd_atomic is larger).

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:33 -04:00
Andrew Boyer
cffec53daf IB/rxe: Remove dangling prototype
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
Andrew Boyer
bfc3ae0566 IB/rxe: Disable completion upcalls when a CQ is destroyed
This prevents the stack from accessing userspace objects while they
are being torn down.

One possible sequence of events:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc. and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d anyway
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes; the event queue is checked
   for is_closed, but it has no way to check the ib_ucq_object before
   accessing it

The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
Andrew Boyer
9eb7f8e44d IB/rxe: Move refcounting earlier in rxe_send()
The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
the QP ref before output to avoid extra dereferences during network
congestion. This could lead to unwanted destruction of the QP.

Fix up the skb_out accounting, too.

Fixes: fda85ce912 ("IB/rxe: Fix kernel panic from skb destructor")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:31 -04:00
Kamal Heib
fab773cb51 IB/rxe: Make rxe_counter_name static
rxe_counter_name is used in rxe_hw_counters.c only. Make it static.

Fixes: 0b1e5b99a4 ('IB/rxe: Add port protocol stats')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:44:48 -04:00
Doug Ledford
d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00
Yuval Shaia
660b1de13c IB/rxe: Remove unneeded check
Port validation is performed in ib_core, no need to duplicate it here.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:01:10 -04:00
Yuval Shaia
8b62cbd13a IB/rxe: Convert pr_info to pr_warn
This message is warning so let's print it accordingly.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:01:09 -04:00
Doug Ledford
a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Leon Romanovsky
e1267b0124 RDMA: Remove useless MODULE_VERSION
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Yuval Shaia
d41861942f IB/core: Add generic function to extract IB speed from netdev
Logic of retrieving netdev speed from net_device and translating it to
IB speed is implemented in rxe, in usnic and in bnxt drivers.

Define new function which merges all.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Kamal Heib
b4fbec9673 IB/rxe: Constify static rxe_vm_ops
Constify static rxe_vm_ops that is never modified.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib
61013828f6 IB/rxe: Use __func__ to print function's name
Its better to use __func__ to print functions name instead of writing
the name in the print statement.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib
c05d26647b IB/rxe: Use DEVICE_ATTR_RO macro to show parent field
Use DEVICE_ATTR RO() macro and rename the show function accordingly.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib
c498e82e3c IB/rxe: Prefer 'unsigned int' to bare use of 'unsigned'
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib
3363828758 IB/rxe: Use "foo *bar" instead of "foo * bar"
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Vijay Immanuel
1217197142 rxe: fix broken receive queue draining
If we modified the qp to ERROR state, and
drained the recieve queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
yonatanc
56012e1cad IB/rxe: Set dma_mask and coherent_dma_mask
The RXE coupled with dummy device causes to the kernel panic attached
below.  The panic happens when ib_register_device tries to set dma_mask
by accessing a NULLed parent device.

The RXE does not actually use DMA, so we can set the dma_mask
to architecture value.

[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246
[16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000
[16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f
[16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
[16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8
[16240.263314] FS:  00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000
[16240.267292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0
[16240.277066] Call Trace:
[16240.281836]  ? __kmalloc+0x26f/0x280
[16240.286596]  rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377]  rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586]  rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375]  rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769]  param_attr_store+0x68/0xd0
[16240.311640]  module_attr_store+0x1d/0x30
[16240.316421]  sysfs_kf_write+0x3a/0x50
[16240.317802]  kernfs_fop_write+0xff/0x180
[16240.322989]  __vfs_write+0x37/0x140
[16240.328164]  ? handle_mm_fault+0xce/0x240
[16240.333340]  vfs_write+0xb2/0x1b0
[16240.335013]  SyS_write+0x55/0xc0
[16240.340632]  entry_SYSCALL_64_fastpath+0x1a/0xa9

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:27 -04:00
Yonatan Cohen
fda85ce912 IB/rxe: Fix kernel panic from skb destructor
In the time between rxe_send has finished and skb destructor
called, the QP's ref count might be 0, leading to a possible
QP destruction. This will lead to a kernel panic when the destructor
dereferences the QP.

The operation of incrementing QP ref count at rxe_send and decrementing
from skb destructor will prevent this crash.

BUG: unable to handle kernel NULL pointer dereference at 000000000000072c
IP: [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
PGD 0 [16240.211178]
Oops: 0002 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE   4.9.0-mlnx #1
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task: ffff88042d6b1480 task.stack: ffffc90001904000
RIP: 0010:[<ffffffffa05df765>]  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
RSP: 0018:ffff88043fcc3df0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880429684700 RCX: ffff88042d248200
RDX: 00000000ffffffff RSI: 00000000fffffe01 RDI: ffff880429684700
RBP: ffff88043fcc3e00 R08: ffff88043fcda240 R09: 00000000ff2d1de6
R10: 0000000000000000 R11: 00000000f49cf6fe R12: ffff880429684700
R13: ffffffff81893f96 R14: ffffffff817d66f0 R15: ffff880427f74200
FS:  0000000000000000(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000072c CR3: 000000041d3df000 CR4: 00000000000006e0
Stack:
 ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
 ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
 ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
Call Trace:
 <IRQ> [16240.336345]  [<ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
 [<ffffffff817b42c2>] skb_release_all+0x12/0x30
 [<ffffffff817b4332>] kfree_skb+0x32/0x90
 [<ffffffff81893f96>] ndisc_error_report+0x36/0x40
 [<ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
 [<ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
 [<ffffffff81109295>] call_timer_fn+0x35/0x120
 [<ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
 [<ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
 [<ffffffff810366b9>] ? sched_clock+0x9/0x10
 [<ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
 [<ffffffff818dd537>] __do_softirq+0xd7/0x289
 [<ffffffff810a6c95>] irq_exit+0xb5/0xc0
 [<ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
 [<ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
 <EOI> [16240.395776]  [<ffffffff818da156>] ? native_safe_halt+0x6/0x10
 [<ffffffff818d9e6e>] default_idle+0x1e/0xd0
 [<ffffffff8103797f>] arch_cpu_idle+0xf/0x20
 [<ffffffff818da2c5>] default_idle_call+0x35/0x40
 [<ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
 [<ffffffff81050433>] start_secondary+0x103/0x130
RIP  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:26 -04:00
Kees Cook
4c93496f18 IB/rxe: do not copy extra stack memory to skb
This fixes a over-read condition detected by FORTIFY_SOURCE for this
line:

	memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(skb->cb));

The error was:

  In file included from ./include/linux/bitmap.h:8:0,
                   from ./include/linux/cpumask.h:11,
                   from ./include/linux/mm_types_task.h:13,
                   from ./include/linux/mm_types.h:4,
                   from ./include/linux/kmemcheck.h:4,
                   from ./include/linux/skbuff.h:18,
                   from drivers/infiniband/sw/rxe/rxe_resp.c:34:
  In function 'memcpy',
      inlined from 'send_atomic_ack.constprop' at drivers/infiniband/sw/rxe/rxe_resp.c:998:2,
      inlined from 'acknowledge' at drivers/infiniband/sw/rxe/rxe_resp.c:1026:3,
      inlined from 'rxe_responder' at drivers/infiniband/sw/rxe/rxe_resp.c:1286:10:
  ./include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
      __read_overflow2();

Daniel Micay noted that struct rxe_pkt_info is 32 bytes on 32-bit
architectures, but skb->cb is still 64.  The memcpy() over-reads 32
bytes.  This fixes it by zeroing the unused bytes in skb->cb.

Link: http://lkml.kernel.org/r/1497903987-21002-5-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Linus Torvalds
51ce5f3329 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:

 "I had thought at the time of the last pull request that there wouldn't
  be much more to go, but several things just kept trickling in over the
  last week.

  Instead of just the six patches to bnxt_re that I had anticipated,
  there are another five IPoIB patches, two qedr patches, and a few
  other miscellaneous patches.

  The bnxt_re patches are more lines of diff than I like to submit this
  late in the game. That's mostly because of the first two patches in
  the series of six. I almost dropped them just because of the lines of
  churn, but on a close review, a lot of the churn came from removing
  duplicated code sections and consolidating them into callable
  routines. I felt like this made the number of lines of change more
  acceptable, and they address problems, so I left them. The remainder
  of the patches are all small, well contained, and well understood.

  These have passed 0day testing, but have not been submitted to
  linux-next (but a local merge test with your current master was
  without any conflicts).

  Summary:

   - A fix for fix eea40b8f62 ("infiniband: call ipv6 route lookup via
     the stub interface")

   - Six patches against bnxt_re...the first two are considerably larger
     than I would like, but as they address real issues I went ahead and
     submitted them (it also helped that a good deal of the churn was
     removing code repeated in multiple places and consolidating it to
     one common function)

   - Two fixes against qedr that just came in

   - One fix against rxe that took a few revisions to get right plus
     time to get the proper reviews

   - Five late breaking IPoIB fixes

   - One late cxgb4 fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rdma/cxgb4: Fix memory leaks during module exit
  IB/ipoib: Fix memory leak in create child syscall
  IB/ipoib: Fix access to un-initialized napi struct
  IB/ipoib: Delete napi in device uninit default
  IB/ipoib: Limit call to free rdma_netdev for capable devices
  IB/ipoib: Fix memory leaks for child interfaces priv
  rxe: Fix a sleep-in-atomic bug in post_one_send
  RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues
  RDMA/qedr: Initialize byte_len in WC of READ and SEND commands
  RDMA/bnxt_re: Remove FMR support
  RDMA/bnxt_re: Fix RQE posting logic
  RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs
  RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list
  RDMA/bnxt_re: HW workarounds for handling specific conditions
  RDMA/bnxt_re: Fixing the Control path command and response handling
  IB/addr: Fix setting source address in addr6_resolve()
2017-06-16 17:38:23 +09:00
Jia-Ju Bai
07d432bb97 rxe: Fix a sleep-in-atomic bug in post_one_send
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
  init_send_wqe
    copy_from_user --> may sleep

There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 13:02:01 -04:00
David Miller
d41519a69b crypto: Work around deallocated stack frame reference gcc bug on sparc.
On sparc, if we have an alloca() like situation, as is the case with
SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
memory.  The result can be that the value is clobbered if a trap
or interrupt arrives at just the right instruction.

It only occurs if the function ends returning a value from that
alloca() area and that value can be placed into the return value
register using a single instruction.

For example, in lib/libcrc32c.c:crc32c() we end up with a return
sequence like:

        return  %i7+8
         lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],

%o5 holds the base of the on-stack area allocated for the shash
descriptor.  But the return released the stack frame and the
register window.

So if an intererupt arrives between 'return' and 'lduw', then
the value read at %o5+16 can be corrupted.

Add a data compiler barrier to work around this problem.  This is
exactly what the gcc fix will end up doing as well, and it absolutely
should not change the code generated for other cpus (unless gcc
on them has the same bug :-)

With crucial insight from Eric Sandeen.

Cc: <stable@vger.kernel.org>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-08 17:36:03 +08:00
Sagi Grimberg
67cf3623e0 rxe: expose num_possible_cpus() cnum_comp_vectors
They're completely logical, so don't impose an artificial limitation.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:33:02 -04:00
Leon Romanovsky
af5df5fb59 IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
Callers of rxe_mem_copy() provide pointer to store updated CRC
value. That pointer was supposed to be updated, but the
commit cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
mistakenly removed that assignment for RXE_MEM_TYPE_DMA memory type.

The code worked because there are no actual callers with
RXE_MEM_TYPE_DMA, who are interested in returned value of crcp.
The one caller in read_reply(), who uses the returned crcp didn't
set RXE_MEM_TYPE_DMA as mem->type.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Reported-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Johannes Thumshirn
d52418502e IB/rxe: Don't clamp residual length to mtu
When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
header into the qp->resp.resid variable for later use. Later in check_rkey()
we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
residual length bigger than the MTU. Later in write_data_in() we subtract the
payload of the packet from the residual length. If the packet happens to have a
payload of exactly the MTU size we end up with a residual length of 0 despite
the packet not being the last in the conversation. When the next packet in the
conversation arrives, we don't have any residual length left and thus set the QP
into an error state.

This broke NVMe over Fabrics functionality over rdma_rxe.ko

The patch was verified using the following test.

 # echo eth0 > /sys/module/rdma_rxe/parameters/add
 # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
 # mkfs.xfs -fK /dev/nvme0n1
 meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=0        finobt=0, sparse=0
 data     =                       bsize=4096   blocks=262144, imaxpct=25
          =                       sunit=0      swidth=0 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 # mount /dev/nvme0n1 /tmp/
 [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
 [  148.961196] XFS (nvme0n1): Ending clean mount
 # dd if=/dev/urandom of=test.bin bs=1M count=128
 128+0 records in
 128+0 records out
 134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
 # sha256sum test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
 # cp test.bin /tmp/
 sha256sum /tmp/test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:42:58 -04:00
Dasaratharaman Chandramouli
44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli
eca7ddf965 IB/rxe: Initialize ib_ah_attr during query_ah
Zero out ib_ah_attr before calling query_ah. Set ah_flags
appropriately.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Colin Ian King
27b0b83233 IB/rxe: fix typo: "algorithmi" -> "algorithm"
trivial fix to typo in pr_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:07:58 -04:00
Artemy Kovalyov
3e7e1193e2 IB: Replace ib_umem page_size by page_shift
Size of pages are held by struct ib_umem in page_size field.

It is better to store it as an exponent, because page size by nature
is always power-of-two and used as a factor, divisor or ilog2's argument.

The conversion of page_size to be page_shift allows to have portable
code and avoid following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Yuval Shaia
4d6f28591f {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so code can be re-use.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2017-04-25 14:21:34 -04:00
yonatanc
4ed6ad1eb3 IB/rxe: Cache dst in QP instead of getting it for each send
In RC QP there is no need to resolve the outgoing interface
for each packet, as this does not change during QP life cycle.

Instead cache the interface on the socket and use that one.
This improves performance by 12% by sparing redundant
calls to rxe_find_route.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

----------------------------------------------------------------------------------------
|        | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
----------------------------------------------------------------------------------------
| before | 1048576 | 9000       | inf             | 551.21             | 0.000551      |
| after  | 1048576 | 9000       | inf             | 615.54             | 0.000616      |
----------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00