Using GPNFT/GNNFT command will be able to cover switch database with less
number of scans. This patch removes Get NportID with provided WWPN/GIDPN
switch command. By making this change, in large fabric with lots of remote
port or NPIV ports with noisy SAN, the number of GIDPN commands issued by a
port when it detects large number of remote ports going away or coming back,
can overwhelmn the switch and it can becomde unresponsive. In a case where the
fabric has not change, GIDPN is not required.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Reduce sess_lock holding to prevent CPU Lock up. sess_lock was held across
fc_port registration and deletion. These calls can be blocked by upper
layer. Sess_lock is also being accessed by interrupt thread.
- Reduce number of loops in processing work_list to prevent kernel complaint
of CPU lockup or holding sess_lock.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, qla2x00_[get_sp|rel_sp] routines does {get|release} of srb
resource/srb_mempool directly from qla_hw_data. qla2x00_start_sp() is used to
issue management commands through the default Request Q 0 & Response Q 0 or
base_qpair. This patch moves access of these resources through
base_qpair. Instead of having knowledge of specific Q number and lock to
rsp/req queue, this change will key off the qpair that is assigned to the srb
resource. This lays the ground work for other routines to see this resource
through the qpair.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add sysfs support to control zio6 interrupt threshold. Using this sysfs hook
user can set when to generate interrupts. This value will be used to tell
firmware to generate interrupt at a certain interval. If the number of
exchanges/commands fall below defined setting, then the interrupt will be
generated immediately by the firmware.
By default ZIO6 will coalesce interrupts to a specified interval
regardless of low traffic or high traffic.
[mkp: fixed several typos]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move ATIO queue processing out of hardware_lock to prevent deadlock.
Fixes: 3bb67df5b5 ("qla2xxx: Check for online flag instead of active reset when transmitting responses")
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the rport registration is being called from a single work element
that is used to process QLA internal "work_list". This work_list is meant for
quick and simple task (ie no sleep). The Rport registration process sometime
can be delayed by upper layer. This causes back pressure with the internal
queue where other jobs are unable to move forward.
This patch will schedule the registration process with a new work element
(fc_port.reg_work). While the RPort is being registered, the current state of
the fcport will not move forward until the registration is done. If the state
of the fabric has changed, a new field/next_disc_state will record the next
action on whether to 'DELETE' or 'Reverify the session/ADISC'.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On Abort of initiator scsi command, the abort needs to follow the same qpair
as the the scsi command to prevent out of order processing.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr.
In addition, with the continuing absence of Nic we have target updates
for tcmu and target core (all with reviews and acks).
The biggest observable change is going to be that we're (again) trying
to switch to mulitqueue as the default (a user can still override the
setting on the kernel command line).
Other major core stuff is the removal of the remaining Microchannel
drivers, an update of the internal timers and some reworks of
completion and result handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue
scsi: ufs: remove unnecessary query(DM) UPIU trace
scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done()
scsi: aacraid: Spelling fix in comment
scsi: mpt3sas: Fix calltrace observed while running IO & reset
scsi: aic94xx: fix an error code in aic94xx_init()
scsi: st: remove redundant pointer STbuffer
scsi: qla2xxx: Update driver version to 10.00.00.08-k
scsi: qla2xxx: Migrate NVME N2N handling into state machine
scsi: qla2xxx: Save frame payload size from ICB
scsi: qla2xxx: Fix stalled relogin
scsi: qla2xxx: Fix race between switch cmd completion and timeout
scsi: qla2xxx: Fix Management Server NPort handle reservation logic
scsi: qla2xxx: Flush mailbox commands on chip reset
scsi: qla2xxx: Fix unintended Logout
scsi: qla2xxx: Fix session state stuck in Get Port DB
scsi: qla2xxx: Fix redundant fc_rport registration
scsi: qla2xxx: Silent erroneous message
scsi: qla2xxx: Prevent sysfs access when chip is down
scsi: qla2xxx: Add longer window for chip reset
...
This patch fixes regression introduced for the N2N support for FC-NVMe. For
FC-NVMe with N2N connection, instead of FW initiating the Login, Driver
starts Login process. This patch migrates that new process from a
standalone path into existing session management state machine. With this
state change now driver will not wait for pull NPort ID from FW.
Fixes: edd05de197 ("scsi: qla2xxx: Changes to support N2N logins")
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Save frame payload size from init control block. This field/data is used
to register with switch database. This allows the init control block temp
buf to be reused.
[mkp: remove unused variable]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch sets and clears FCF_ASYNC_{SENT|ACTIVE} flags to prevent
stalling of relogin attempt. Once flag are correctly set/cleared, relogin
timer can retry relogin attempt for driver to continue login.
Fixes: fa83e65885 ("scsi: qla2xxx: ensure async flags are reset correctly")
Cc: stable@vger.kernel.org #4.17
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix race condition between switch cmd completion and timeout timer. Timer
has popped triggers command free. On IOCB completion, stale sp point was
reused. Instead, an abort will be sent to FW to nudge the command out of FW
where the normal completion will take place.
RIP: 0010:qla2x00_chk_ms_status+0xf3/0x1b0 [qla2xxx]
Call Trace:
<IRQ>
qla24xx_els_ct_entry.isra.15+0x1d4/0x2b0 [qla2xxx]
qla24xx_msix_rsp_q+0x39/0xf0 [qla2xxx]
qla24xx_process_response_queue+0xbc/0x2b0 [qla2xxx]
qla24xx_msix_rsp_q+0x8a/0xf0 [qla2xxx]
__handle_irq_event_percpu+0xa0/0x1f0
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After selecting the NPort handle/loop_id, set a bit in the loop_id_map to
prevent others from selecting the same NPort handle.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Flush pending mailbox commands on chip reset. Wake up command that's
waiting for an interrupt and wait for mailbox counters to go to zero.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch sets discovery state back to GNL (Get Name List) when session is
stuck at GPDB (Get Port DataBase). This will allow state machine to retry
login and move session state ahead in discovery.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case of N2N connect, sg_reset for bus/device/host was causing driver and
firmware state to go out of sync. This patch fixes this link instablity
when reconnect is attempted after link flap.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI fixes from James Bottomley:
"This is a set of minor (and safe changes) that didn't make the initial
pull request plus some bug fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Mask off Scope bits in retry delay
scsi: qla2xxx: Fix crash on qla2x00_mailbox_command
scsi: aic7xxx: aic79xx: fix potential null pointer dereference on ahd
scsi: mpt3sas: Add an I/O barrier
scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails
scsi: hpsa: disable device during shutdown
scsi: sd_zbc: Fix sd_zbc_check_zone_size() error path
scsi: aacraid: remove bogus GFP_DMA32 specifies
This patch prevents driver from setting lower default speed of 1 GB/sec,
if the switch does not support Get Port Speed Capabilities (GPSC)
command. Setting this default speed results into much lower write
performance for large sequential WRITE. This patch modifies driver to
check for gpsc_supported flags and prevents driver from issuing
MBC_SET_PORT_PARAM (001Ah) to set default speed of 1 GB/sec. If driver
does not send this mailbox command, firmware assumes maximum supported
link speed and will operate at the max speed.
Cc: stable@vger.kernel.org
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reported-by: Eda Zhou <ezhou@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move GPSC & GFPNID commands out of session management to reduce time lag
in reporting the session state to remote port. These commands are not
essential when it comes to maintaining the rport state. Delay sending
these commands after rport state is set to Online.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For each RSCN that triggers a rescan of the fabric, ADISC is used to
revalidate an existing session. If the RSCN is not affecting all
existing sessions, then driver should not send redundant ADISC for all
existing sessions.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes login_retry login for ADISC command.
when login_retry count reaches 0, further attempt to send ADISC command
is ignored by the code. Remove this redundant login_retry count check
from qla24xx_fcport_handle_login()
[mkp: fix typo]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qla2x00_init_timer() calls add_timer() on the iocb timeout timer, which
means the timeout function pointer and any data that the function depends on
must be initialised beforehand.
Move this initialisation before each call to qla2x00_init_timer(). In some
cases qla2x00_init_timer() initialises a completion structure needed by the
timeout function, so move the call to add_timer() after that.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qla2x00_tmf_sp_done() now deletes the timer that will run
qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer already
expired. Check the return value from del_timer() to avoid calling
complete() a second time.
Fixes: 4440e46d5d ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous ...")
Fixes: 1514839b36 ("scsi: qla2xxx: Fix NULL pointer crash due to active ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Somewhat nasty merge due to conflicts between "33b28357dd00 scsi:
qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan" and "2b5b96473efc
scsi: qla2xxx: Fix FC-NVMe LUN discovery"
Merge is non-trivial and has been verified by Qlogic (Cavium)
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Commit 7d64c39e64310 fixed regression of FCP discovery when Nport Handle
is in-use and relogin is triggered. However, during FCP and FC-NVMe
discovery this resulted into only discovering NVMe LUNs.
This patch fixes issue where FCP and FC-NVMe protocol is used on same
port where assigning FC_NO_LOOP_ID will result into discovery failure
for FCP LUNs.
Fixes: a084fd68e1 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
commit a4239945b8 ("scsi: qla2xxx: Add switch command to simplify
fabric discovery") introduced regression when it did not consider
FC-NVMe code path which broke NVMe LUN discovery.
Fixes: a4239945b8 ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Signed-off-by: Darren Trapp <darren.trapp@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The fcport flags FCF_ASYNC_ACTIVE and FCF_ASYNC_SENT are used to
throttle the state machine, so we need to ensure to always set and unset
them correctly. Not doing so will lead to the state machine getting
confused and no login attempt into remote ports.
Cc: Quinn Tran <quinn.tran@cavium.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Fixes: 3dbec59bdf ("scsi: qla2xxx: Prevent multiple active discovery commands per session")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit d8630bb95f ('Serialize session deletion by using work_lock')
tries to fixup a deadlock when deleting sessions, but fails to take into
account the locking rules. This patch resolves the situation by
introducing a separate lock for processing the GNLIST response, and
ensures that sess_lock is released before calling
qlt_schedule_sess_delete().
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: Quinn Tran <quinn.tran@cavium.com>
Fixes: d8630bb95f ("scsi: qla2xxx: Serialize session deletion by using work_lock")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch is based on Max's original patch.
When the qla2xxx firmware is unavailable, eventually
qla2x00_sp_timeout() is reached, which calls the timeout function and
frees the srb_t instance.
The timeout function always resolves to qla2x00_async_iocb_timeout(),
which invokes another callback function called "done". All of these
qla2x00_*_sp_done() callbacks also free the srb_t instance; after
returning to qla2x00_sp_timeout(), it is freed again.
The fix is to remove the "sp->free(sp)" call from qla2x00_sp_timeout()
and add it to those code paths in qla2x00_async_iocb_timeout() which
do not already free the object.
This is how it looks like with KASAN:
BUG: KASAN: use-after-free in qla2x00_sp_timeout+0x228/0x250
Read of size 8 at addr ffff88278147a590 by task swapper/2/0
Allocated by task 1502:
save_stack+0x33/0xa0
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc+0xb8/0x1c0
mempool_alloc+0xd6/0x260
qla24xx_async_gnl+0x3c5/0x1100
Freed by task 0:
save_stack+0x33/0xa0
kasan_slab_free+0x72/0xc0
kmem_cache_free+0x75/0x200
qla24xx_async_gnl_sp_done+0x556/0x9e0
qla2x00_async_iocb_timeout+0x1c7/0x420
qla2x00_sp_timeout+0x16d/0x250
call_timer_fn+0x36/0x200
The buggy address belongs to the object at ffff88278147a440
which belongs to the cache qla2xxx_srbs of size 344
The buggy address is located 336 bytes inside of
344-byte region [ffff88278147a440, ffff88278147a598)
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: Max Kellermann <mk@cm4all.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>