linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-24 11:02:51 +00:00

Author	SHA1	Message	Date
Justin Tee	e780c9423b	scsi: lpfc: Change lpfc_hba hba_flag member into a bitmask In attempt to reduce the amount of unnecessary phba->hbalock acquisitions in the lpfc driver, change hba_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on phba->hbalock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240429221547.6842-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-05-06 21:53:58 -04:00
Justin Tee	115d137aa9	scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf ptr. Due to the void* declaration, there have been abuses of ctx_buf for things not related to lpfc_dmabuf. So, set the ptr type for ctx_buf as lpfc_dmabuf. Remove all type casts on ctx_buf because it is no longer a void ptr. Convert the abuse of ctx_buf for something not related to lpfc_dmabuf to use the void context3 ptr. A particular abuse of the ctx_buf warranted a new void ext_buf ptr. However, the usage of this new void *ext_buf is not generic. It is intended to only hold virtual addresses for extended mailbox commands. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-03-10 18:56:44 -04:00
Justin Tee	18f7fe44bc	scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a generic void ptr. It is named ctx_ndlp and it should only be used as an lpfc_nodelist ptr. Due to the void* declaration, there have been abuses of ctx_ndlp for things not related to ndlp. So, set the ptr type for ctx_ndlp as lpfc_nodelist. Remove all type casts on ctx_ndlp because it is no longer a void ptr. Convert the abuse of ctx_ndlp for things not related to ndlps to use the void *context3 ptr. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-03-10 18:56:44 -04:00
Justin Tee	ded20192df	scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up() lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the hbalock to avoid potential deadlock. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-03-10 18:56:43 -04:00
Justin Tee	ea4044e4dd	scsi: lpfc: Copyright updates for 14.4.0.0 patches Update copyrights to 2024 for files modified in the 14.4.0.0 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:36 -05:00
Justin Tee	e39811bec6	scsi: lpfc: Change lpfc_vport load_flag member into a bitmask In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change load_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which incorrectly tests vport->fc_flag rather than vport->load_flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:36 -05:00
Justin Tee	a645b8c1f5	scsi: lpfc: Change lpfc_vport fc_flag member into a bitmask In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change fc_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:36 -05:00
Justin Tee	9bb36777d0	scsi: lpfc: Protect vport fc_nodes list with an explicit spin lock In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, replace shost_lock with an explicit fc_nodes_list_lock spinlock when accessing vport->fc_nodes lists. Although vport memory region is owned by shost->hostdata, it is driver private memory and an explicit fc_nodes list lock for fc_nodes list mutations is more appropriate than locking the entire shost. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-14-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:36 -05:00
Justin Tee	0dfd9cbc18	scsi: lpfc: Change nlp state statistic counters into atomic_t There is no reason to use the shost_lock to synchronize an LLDD statistics counter. Convert all the nlp state statistic counters into atomic_t. Corresponding zeroing, increments, and reads are converted to atomic versions. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-13-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:35 -05:00
Justin Tee	a801d57a11	scsi: lpfc: Remove NLP_RCV_PLOGI early return during RSCN processing for ndlps Upon first RSCN receipt of a target server's remote port that is initially acting as an initiator function, the driver marks the ndlp->nlp_type as an initiator role. Then later, when processing an RSCN for a target function role switch, that ndlp remote port is permanently stuck as an initiator role and can never transition to be discovered as an updated target role function. Remove the NLP_RCV_PLOGI early return if statement clause so that the NLP_NPR_2B_DISC flag gets set. This allows for role change detections. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-7-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2024-02-05 20:51:35 -05:00
Justin Tee	e07ac2d2aa	scsi: lpfc: Eliminate unnecessary relocking in lpfc_check_nlp_post_devloss() In lpfc_check_nlp_post_devloss(), retaking of the ndlp lock in the if statement is useless because the very next line unlocks. Simply return to avoid relocking. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231031191224.150862-5-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-11-15 09:52:57 -05:00
Martin K. Petersen	af46076d66	Merge patch series "lpfc: Update lpfc to revision 14.2.0.15" Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.2.0.15 This patch set contains error handling fixes, ELS bug fixes, and logging improvements. The patches were cut against Martin's 6.7/scsi-queue tree. Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-10-13 17:00:47 -04:00
Justin Tee	41c831bbb0	scsi: lpfc: Introduce LOG_NODE_VERBOSE messaging flag The preexisting LOG_NODE message flag frequently spams a subset of the same log messages during normal FC driver operations. When analyzing driver logs, this sometimes leads to difficulty in troubleshooting. Because LOG_IP log message flag is unused, convert it to a new LOG_NODE_VERBOSE flag. The LOG_NODE_VERBOSE shall specifically be used for diagnosing issues that require precise ndlp tracking detail. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-10-13 16:58:27 -04:00
Justin Tee	dae40be7a1	scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is decremented twice. Once for SCSI transport registration and second to remove the initial node allocation kref. If there is also an NVMe transport registration, another reference count decrement is expected in lpfc_nvme_unregister_port(). Race conditions between the NVMe transport remoteport_delete and dev_loss_tmo callbacks sometimes results in premature ndlp object release resulting in use-after-free issues. Fix by not dropping the ndlp object in dev_loss_tmo callback with an outstanding NVMe transport registration. Inversely, mark the final NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-09-13 20:51:16 -04:00
Justin Tee	9c3034968e	scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo When a dev_loss_tmo event occurs, an ndlp lock is taken before checking nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock when exiting the if statement, but the nlp_put kref could be the final decrement causing a use-after-free memory access on a released ndlp object. Instead of trying to reacquire the ndlp lock after checking nlp_flag, just return after calling nlp_put. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com Reviewed-by: "Ewan D. Milne" <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-09-13 20:49:34 -04:00
James Bottomley	e03843a0f0	Merge branch 'fixes' into misc	2023-09-02 08:25:19 +01:00
Justin Tee	dded1dc31a	scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN Only nodes whose state is at least past a PLOGI issue and strictly less than a PRLI issue should be put into device recovery mode upon RSCN receipt. Previously, the allowance of LOGO and PRLI completion states did not make sense because those nodes should be allowed to flow through and marked as NPort dissappeared as is normally done. A follow up RSCN GID_FT would recover those nodes in such cases. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-08-07 21:25:47 -04:00
Justin Tee	90cec07f53	scsi: lpfc: Revise ndlp kref handling for dev_loss_tmo_callbk and lpfc_drop_node The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes the initial node reference when a vport is unloading. When lpfc_cleanup() sends a DEVICE_RM event and is in NPR state, the driver calls lpfc_drop_node(). Subsequently, lpfc_drop_node() also removes an ndlp kref thinking it is the initial reference. This unintentionally introduces an extra kref decrement on the ndlp object. Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and lpfc_drop_node() to coordinate the removal of the initial node reference. In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided the node is registered in the dev_loss context because the driver cannot call the SCSI transport in dev_loss context or afterwards. And, have lpfc_drop_node() not remove a reference if another thread is acting or has already acted on it. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-07-23 16:17:07 -04:00
Justin Tee	377d7abadd	scsi: lpfc: Qualify ndlp discovery state when processing RSCN Conditionalize when to put an ndlp into recovery mode when processing RSCNs. As long as an ndlp state is beyond a PLOGI issue and has been mapped to a transport layer before, the ndlp qualifies to be put into recovery mode. Otherwise, treat the ndlp rport normally through the discovery engine. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-07-23 16:17:07 -04:00
Tuo Li	0e881c0a4b	scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan() The variable phba->fcf.fcf_flag is often protected by the lock phba->hbalock() when is accessed. Here is an example in lpfc_unregister_fcf_rescan(): spin_lock_irq(&phba->hbalock); phba->fcf.fcf_flag \|= FCF_INIT_DISC; spin_unlock_irq(&phba->hbalock); However, in the same function, phba->fcf.fcf_flag is assigned with 0 without holding the lock, and thus can cause a data race: phba->fcf.fcf_flag = 0; To fix this possible data race, a lock and unlock pair is added when accessing the variable phba->fcf.fcf_flag. Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-07-05 21:14:22 -04:00
Martin K. Petersen	21be4d0344	Merge patch series "lpfc: Update lpfc to revision 14.2.0.13" Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.2.0.13 This patch set contains discovery bug fixes, firmware logging improvements, clean up of CQ handling, and statistics collection enhancements. The patches were cut against Martin's 6.5/scsi-queue tree. Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-31 18:15:07 -04:00
Justin Tee	73ded37869	scsi: lpfc: Account for fabric domain ctlr device loss recovery Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS flag only handled Fabric Port Login, Fabric Controller, Management, and Name Server addresses. Fabric domain controllers fall under the same category for usage of the NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an ndlp for device loss recovery. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com Acked-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-31 18:14:20 -04:00
Justin Tee	fd57a687d4	scsi: lpfc: Clear NLP_IN_DEV_LOSS flag if already in rediscovery In dev_loss_tmo callback routine, we early return if the ndlp is in a state of rediscovery. This occurs when a target proactively PLOGIs or PRLIs after an RSCN before the dev_loss_tmo callback routine is scheduled to run. Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such cases. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-31 18:14:20 -04:00
Justin Tee	a4157aaf0f	scsi: lpfc: Fix use-after-free rport memory access in lpfc_register_remote_port() Due to a target port D_ID swap, it is possible for the lpfc_register_remote_port() routine to touch post mortem fc_rport memory when trying to access fc_rport->dd_data. The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(), where fc_remote_port_delete() reclaims fc_rport memory. Remove the fc_rport->dd_data->pnode NULL assignment because the following line reassigns ndlp->rport with an fc_rport object from fc_remote_port_add() anyways. The pnode nullification is superfluous. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com Acked-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-31 18:14:19 -04:00
Azeem Shaikh	73be26b12d	scsi: lpfc: Replace all non-returning strlcpy() with strscpy() strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-31 17:59:06 -04:00
Justin Tee	97f975823f	scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by lpfc_nlp_not_used() Smatch detected a double free path because lpfc_nlp_not_used() releases an ndlp object before reaching lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc(). Remove the outdated lpfc_nlp_not_used() routine. In lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was registered, then lpfc_unreg_rpi()'s completion routine performs the final ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc(). Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc() is the final ndlp clean up. Fixes: `4430f7fd09` ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-05-08 07:16:04 -04:00
Justin Tee	796876fdae	scsi: lpfc: Revise lpfc_error_lost_link() reason code evaluation logic Extended status reason code errors should mask off the IOERR_PARAM_MASK before checking strict equalities for IOERR values. Update the lpfc_error_lost_link() routine as such. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-09 21:21:45 -05:00
Justin Tee	1d0f9fea5d	scsi: lpfc: Defer issuing new PLOGI if received RSCN before completing REG_LOGIN When mapped to a target with multiple virtual ports, a link bounce sometimes results in unsuccessful rediscovery of all of the target's virtual ports. This is because a succession of repeat RSCNs for the virtual target ports leaves ndlps in the REG_LOGIN state with the NLP_REG_LOGIN_SEND flag set. With NLP_REG_LOGIN_SEND set, during the next PLOGI, the driver will UNREG_RPI. When UNREG_RPI is processed, the driver can be in the middle of PRLI_ISSUE or MAPPED state resulting in an illegal state transition by the discovery engine and stalling. Fix by calling the discovery state machine with DEVICE_RECOVERY event during RSCN processing. This will set the NLP_IGNR_REG_CMPL bit and prevent the old REG_LOGIN state from advancing. Then for the new PLOGI issue, add the check for the NLP_IGNR_REG_CMPL bit to delay issuing the new PLOGI until the queued REG_LOGIN and UNREG_LOGIN have been processed. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-09 21:21:44 -05:00
Bo Liu	442336a5a9	scsi: lpfc: Fix double word in comments Remove the repeated word "the" in comments. [mkp: fixed additional typos in the changed lines] Link: https://lore.kernel.org/r/20230217083046.4090-1-liubo03@inspur.com Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-02-21 22:00:51 -05:00
Justin Tee	191b5a3877	scsi: lpfc: Copyright updates for 14.2.0.10 patches Update copyrights to 2023 for files modified in the 14.2.0.10 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:03:15 -05:00
Justin Tee	ecdf4ddf4e	scsi: lpfc: Remove duplicate ndlp kref decrement in lpfc_cleanup_rpis() With faulty cables in PT2PT topology, an unintentional ndlp double kref decrement can occur. If a FLOGI request is outstanding before the link goes down, the missing FLOGI_ACC causes an F_Port ndlp to remain in the UNUSED state. During link down, lpfc_cleanup_rpis() is called and decrements an ndlp kref. Additionally, when the driver later decides to abort the FLOGI, the FLOGI completion handler decrements the ndlp kref a second time. Remove duplicate clean up logic in lpfc_cleanup_rpis() because the updated FLOGI completion handler already handles the ndlp kref decrement. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:03:14 -05:00
Justin Tee	97f256913c	scsi: lpfc: Fix crash involving race between FLOGI timeout and devloss handler When a FLOGI completes with a sequence timeout error, a freed kref ptr dereference crash can occur due to a timing race involving ndlp referencing in lpfc_dev_loss_tmo_callbk. Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss is active. Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the FLOGI is retried to allow the driver to handle the reference counts correctly in lpfc_dev_loss_tmo_handler. Reported-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Tested-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-11-17 18:18:42 +00:00
Jason A. Donenfeld	7e3cf0843f	treewide: use get_random_{u8,u16}() when possible, part 1 Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() \| - (get_random_u32() & 0xff) + get_random_u8() \| - (get_random_u32() % 65536) + get_random_u16() \| - (get_random_u32() % 256) + get_random_u8() \| - (get_random_u32() >> 16) + get_random_u16() \| - (get_random_u32() >> 24) + get_random_u8() \| - (u16)get_random_u32() + get_random_u16() \| - (u8)get_random_u32() + get_random_u8() \| - (__be16)get_random_u32() + (__be16)get_random_u16() \| - (__le16)get_random_u32() + (__le16)get_random_u16() \| - prandom_u32_max(65536) + get_random_u16() \| - prandom_u32_max(256) + get_random_u8() \| - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int\|prandom_u32\|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-11 17:42:58 -06:00
James Smart	a4de8356b6	scsi: lpfc: Fix various issues reported by tools This patch fixes below Smatch reported issues: 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl() warn: always true condition '(prg->dist < 4) => (0-3 < 4)' 4. lpfc_init.c:2419 lpfc_parse_vpd() warn: inconsistent indenting. 5. lpfc_init.c:13248 lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535 'eqhdl->irq' 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_cnt' 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_size' 8. lpfc_vmid.c:248 lpfc_vmid_get_appid() warn: sleeping in atomic context. 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup() warn: missing error code 'rc'. 10. lpfc_init.c:13573 lpfc_sli4_hba_unset() warn: variable dereferenced before check 'phba->pport' (see line 13546) 11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply() error: double free of 'hash_value' Fixes: 1. Initialize vlan_id to LPFC_FCOE_NULL_VID. 2. Initialize vlan_id to LPFC_FCOE_NULL_VID. 3. prg->dist is a 2 bit field. Its value can only be between 0-3. Remove redundent check 'if (prg->dist < 4)'. 4. Fix inconsistent indenting. Moved logic into helper function lpfc_fill_vpd(). 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int. Also, check for return value of pci_irq_vector() and log message in case of failure. 6. Initialize 'ext_cnt' to 0. 7. Initialize 'ext_size' to 0. 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag. 9. 'rc' was not updated when dma_pool_create() fails. Update 'rc = -ENOMEM' when dma_pool_create() fails before calling goto statement. 10. Add check for 'phba->pport' in lpfc_cpuhp_remove(). 11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable. Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-15 22:18:28 -04:00
James Smart	dbb1e2ff87	scsi: lpfc: Add reporting capability for Link Degrade Signaling Firmware reports link degrade signaling via ACQES. Handlers and new additions to the SET_FEATURES mbox command are implemented so that link degrade parameters for 64GB capable links are reported through EDC ELS frames. Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-15 22:18:27 -04:00
James Smart	11d6583d81	scsi: lpfc: Fix FLOGI ACC with wrong SID in PT2PT topology When a FLOGI is received before we have issued our FLOGI, the ACC response to the received FLOGI is issued with SID 2 instead of the expected fabric controller SID. Certain target vendors ignore the malformed ACC with SID 2 and wait for a properly filled ACC with a fabric controller SID. The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the fabric controller SID when in PT2PT mode, but due to a previous commit the flag was getting cleared. Fix by adding a check for the defer_flogi_acc flag to know whether or not to clear the FC_PT2PT flag on link up. Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com Fixes: `439b93293f` ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-15 22:18:26 -04:00
James Smart	2af33e5a03	scsi: lpfc: Remove SANDiags related code The SANDiags feature is unused, and related code is removed. Link: https://lore.kernel.org/r/20220819011736.14141-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-08-31 23:39:58 -04:00
James Smart	439b93293f	scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery During a stress offline/online test in PT2PT topology, target rediscovery can fail with a specific target vendor array. When the HBA transitions to online mode it is possible to receive an unsolicited FLOGI before processing the Link Up event. The received FLOGI will set the defer_flogi_acc_flag, which instructs the driver to wait until it transmits its own FLOGI before ACKing the received FLOGI. In this failure scenario, the link up processing clears the set defer_flogi_acc_flag before we have sent out the FLOGI. As the target has the higher WWPN and is responsible for sending the PLOGI, the target is stuck waiting for its FLOGI_ACC that the driver will never send. Remove the clear of defer_flogi_acc_flag from Link Up event processing. In this stress test case, the defer_flogi_acc_flag is cleared during the Link Down event processing anyways. Link: https://lore.kernel.org/r/20220819011736.14141-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-08-31 23:39:58 -04:00
James Smart	7f86d2b847	scsi: lpfc: Remove Menlo/Hornet related code The Menlo/Hornet adapter was never released to the field. As such, driver code specific to the adapter is unnecessary and should be removed. Link: https://lore.kernel.org/r/20220701211425.2708-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-07-07 17:21:44 -04:00
James Smart	ffc566411a	scsi: lpfc: Revert RSCN_MEMENTO workaround for misbehaved configuration The RSCN_MEMENTO logic was to workaround a target that does not register both FCP and NVMe FC4 types at the same time. This caused the configuration to not produce a second RSCN for the NVMe FC4 type registration in a timely manner. The intention of the RSCN_MEMENTO flag was to always signal to try NVMe PRLI. However, there are other FCP-only target arrays in correctly behaved configurations that reject the NVMe PRLI followed by a LOGO leading to never rediscovering the target after an issue_lip (as LOGO causes a repeat of PLOGI/PRLIs). Revert the RSCN_MEMENTO patch as it is causing correctly behaved configs to fail while it exists only to succeed on a misbehaved config. Link: https://lore.kernel.org/r/20220701211425.2708-9-jsmart2021@gmail.com Fixes: `1045592fc9` ("scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-07-07 17:21:44 -04:00
James Smart	de3ec318fe	scsi: lpfc: Rework FDMI initialization after link up After a link up, it's possible for the switch to change FDMI support (e.g. FDMI1 vs FDMI2 vs SmartSAN). If the switch reverts to FDMI1, then the revert is currently not detected. Additionally, when NPIV is configured, it's possible the physical port's RHBA is unprocessed by the switch before reciept of an NPIV port issued RPRT. This causes some switches vendors to reject the NPIV's RPRT. Fix by reinitializing base FDMI mode on link up, and defer FDMI vport RPRT submission until after confirming physical port's RHBA is completed. Link: https://lore.kernel.org/r/20220506035519.50908-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-10 22:12:04 -04:00
James Smart	dc8a71bd41	scsi: lpfc: Decrement outstanding gidft_inp counter if lpfc_err_lost_link() During large NPIV port testing, it was sometimes seen that not all vports would log back in to the target device. There are instances when the fabric is slow to respond to a spam of GID_PT requests and as a result the SLI PORT may abort the GID_PT request because the fabric takes so long. lpfc_cmpl_ct_cmd_gid_pt() would enter the lpfc_err_lost_link() logic and attempt to lpfc_els_flush_rscn(), which is fine, but forgets to decrement the gidft_inp counter. This results in a vport->gidft_inp never reaching 0 and never restarting discovery again. Decrement vport->gidft_inp if lpfc_err_lost_link() is true for both lpfc_cmpl_ct_cmd_gid_pt() and lpfc_cmpl_ct_cmd_gid_ft(). Increase logging info during RSCN timeout and lpfc_err_lost_link() events. Link: https://lore.kernel.org/r/20220506035519.50908-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-10 22:12:03 -04:00
James Smart	ead76d4c09	scsi: lpfc: Inhibit aborts if external loopback plug is inserted After running a short external loopback test, when the external loopback is removed and a normal cable inserted that is directly connected to a target device, the system oops in the llpfc_set_rrq_active() routine. When the loopback was inserted an FLOGI was transmit. As we're looped back, we receive the FLOGI request. The FLOGI is ABTS'd as we recognize the same wppn thus understand it's a loopback. However, as the ABTS sends address information the port is not set to (fffffe), the ABTS is dropped on the wire. A short 1 frame loopback test is run and completes before the ABTS times out. The looback is unplugged and the new cable plugged in, and the an FLOGI to the new device occurs and completes. Due to a mixup in ref counting the completion of the new FLOGI releases the fabric ndlp. Then the original ABTS completes and references the released ndlp generating the oops. Correct by no-op'ing the ABTS when in loopback mode (it will be dropped anyway). Added a flag to track the mode to recognize when it should be no-op'd. Link: https://lore.kernel.org/r/20220506035519.50908-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-10 22:12:03 -04:00
James Smart	b7e952cbc6	scsi: lpfc: Fix ndlp put following a LOGO completion During testing with repeated asynchronous resets of the target, an issue was found when the driver issues a LOGO to disconnect its login and recover all exchanges. The LOGO command takes a node reference but neglects to remove it, keeping the node reference count artifically high. Add a call to lpfc_nlp_put() to lpfc_nlp_logo_unreg() and move the mempool free call to the routine exit along with the needed put. This is always safe as this will not be the last reference removed as lpfc_unreg_rpi() ensures there is an additional reference on the ndlp. Link: https://lore.kernel.org/r/20220506035519.50908-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-10 22:12:02 -04:00
James Smart	1b6f71f7fc	scsi: lpfc: Change FA-PWWN detection methodology Do not rely on vendor version field of the CSPs to determine if we are in a FA-PWWN environment. Instead, use the following procedure: First, during HBA initialization, driver does a READ_CONFIG to determine if FA-PWWN is configured on the HBA. A LPFC_FAWWPN_CONFIG hba_flag is set accordingly. Next, when the link comes up before the driver gets a link up event, the firmware logs into the fabric with FA-PWWN. If the fabric port does not support FA-PWWN, the driver will get a Misconfigured FA-WWN async event before the link up. A LPFC_FAWWPN_FABRIC hba_flag will be set accordingly. Finally, if the fabric supports FA-PWWN, the firmware will replace its CSPs WWN with the Fabric Assigned ones. Then after link up, the driver will retrieve the Fabric Assigned WWN when it does a READ_SPARAM mbox command. Link: https://lore.kernel.org/r/20220412222008.126521-23-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-04-18 22:48:47 -04:00
James Smart	ef47575fd9	scsi: lpfc: Refactor cleanup of mailbox commands The intention of this patch is to refactor mailbox memory allocation and cleanup steps in one routine respectively to prevent memory leaks or memory errors related to mailbox commands. There are trivial localized fixes as well. Provide lpfc_mbox_rsrc_prep() - this routine allocates the dmabuf and the mbuf associated with it. It also catches allocation errors and returns status. Provide lpfc_mbox_rsrc_cleanup() - this routine verifies a dmabuf exists and if so releases the associated mbuf and the dmabuf memory. It then sets the ctx_buf to NULL and releases the mailbox memory to the mailbox pool. Link: https://lore.kernel.org/r/20220412222008.126521-22-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-04-18 22:48:47 -04:00
James Smart	d51cf5bd92	scsi: lpfc: Fix field overload in lpfc_iocbq data structure The lpfc_iocbq data structure has void * pointers that are overloaded to be as many as 8 different data types and the driver translates the void * by casting. This patch removes the void * pointers by declaring the specific types needed by the driver. It also expands the context_un to include more seldom used pointer types to save structure bytes. It also groups the u8 types together to pack the 8 bytes needed. This work allows the lpfc_iocbq data structure to be more strongly typed and keeps it from being allocated from the 512 byte slab. [mkp: rolled in zeroday fix] Link: https://lore.kernel.org/r/20220412222008.126521-21-jsmart2021@gmail.com Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-04-18 22:48:46 -04:00
James Smart	1045592fc9	scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion During an NVMe target reboot, the target may initialize itself as FCP only during the first RSCN and shortly after trigger a second RSCN claiming NVMe support. The timing of these RSCNs occur before FCP-PRLI for the first RSCN completes leading discovery issues over NVMe. Change RSCN and NVME-PRLI send logic based on a new FC_RSCN_MEMENTO flag that signals when lpfc_end_rscn() is completed and serves as a memento that discovery was started from RSCN. Link: https://lore.kernel.org/r/20220412222008.126521-20-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-04-18 22:48:46 -04:00
James Smart	a4691038b4	scsi: lpfc: Fix unload hang after back to back PCI EEH faults When injecting EEH errors the port is getting hung up waiting on the node list to empty, message number 0233. The driver is stuck at this point and also can't unload. The driver makes transport remoteport delete calls which try to abort I/O's, but the EEH daemon has already called the driver to detach and the detachment has set the global FC_UNLOADING flag. There are several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is set, resulting in transports waiting for I/O while the driver is waiting on transports to clean up. Additionally, during study of the list, a locking issue was found in lpfc_sli_abort_iocb_ring that could corrupt the list. A special case was added to the lpfc_cleanup() routine to call lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot is offline (e.g. EEH). The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the ring_lock. Also added code to cancel the I/Os if the pci-slot is offline and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags to prevent trying to send an I/O that we cannot handle. Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:19:38 -04:00
James Smart	35ed9613d8	scsi: lpfc: Improve PCI EEH Error and Recovery Handling Following EEH errors, the driver can crash or hang when deleting the localport or when attempting to unload. The EEH handlers in the driver did not notify the NVMe-FC transport before tearing the driver down. This was delayed until the resume steps. This worked for SCSI because lpfc_block_scsi() would notify the scsi_fc_transport that the target was not available but it would not clean up all the references to the ndlp. The SLI3 prep for dev reset handler did the lpfc_offline_prep() and lpfc_offline() calls to get the port stopped before restarting. The SLI4 version of the prep for dev reset just destroyed the queues and did not stop NVMe from continuing. Also because the port was not really stopped the localport destroy would hang because the transport was still waiting for I/O. Additionally, a devloss tmo can fire and post events to a stopped worker thread creating another hang condition. lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI and NVMe transports are notified to block I/O to the driver. Logic is added to devloss handler and worker thread to clean up ndlp references and quiesce appropriately. Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:19:37 -04:00

1 2 3 4 5 ...

402 Commits