Commit Graph

1050 Commits

Author SHA1 Message Date
Manivannan Sadhasivam
5aa326a6a2 PCI/PTM: Build debugfs code only if CONFIG_DEBUG_FS is enabled
Otherwise, the following build error will happen for CONFIG_DEBUG_FS=n &&
CONFIG_PCIE_PTM=y:

  drivers/pci/pcie/ptm.c:498:25: error: redefinition of 'pcie_ptm_create_debugfs'
    498 | struct pci_ptm_debugfs *pcie_ptm_create_debugfs(struct device *dev, void *pdata,
	|                         ^
  ./include/linux/pci.h:1915:2: note: previous definition is here
   1915 | *pcie_ptm_create_debugfs(struct device *dev, void *pdata,
	|  ^
  drivers/pci/pcie/ptm.c:546:6: error: redefinition of 'pcie_ptm_destroy_debugfs'
    546 | void pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs)
	|      ^
  ./include/linux/pci.h:1918:1: note: previous definition is here
   1918 | pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) { }
	|

Fixes: 132833405e ("PCI: Add debugfs support for exposing PTM context")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/linux-pci/20250607025506.GA16607@sol
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250608033305.15214-1-manivannan.sadhasivam@linaro.org
2025-06-23 12:55:49 -05:00
Bjorn Helgaas
db847adbf9 Merge branch 'pci/ptm-debugfs'
- Add debugfs support for exposing DWC device-specific PTM context
  (Manivannan Sadhasivam)

* pci/ptm-debugfs:
  PCI: qcom-ep: Mask PTM_UPDATING interrupt
  PCI: dwc: Add debugfs support for PTM context
  PCI: dwc: Pass DWC PCIe mode to dwc_pcie_debugfs_init()
  PCI: Add debugfs support for exposing PTM context
2025-06-04 10:50:44 -05:00
Bjorn Helgaas
1acf6a5e79 Merge branch 'pci/bwctrl'
- Simplify link bandwidth controller by replacing the count of Link
  Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag
  (Ilpo Järvinen)

- Update the Link Speed after retraining, since the Link Speed may have
  changed (Ilpo Järvinen)

* pci/bwctrl:
  PCI: Update Link Speed after retraining
  PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
2025-06-04 10:49:49 -05:00
Manivannan Sadhasivam
b06d125e62 PCI/ERR: Remove misleading TODO regarding kernel panic
A PCI device is just another peripheral in a system. So failure to
recover it, must not result in a kernel panic. So remove the TODO which
is quite misleading.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Link: https://patch.msgid.link/20250508-pcie-reset-slot-v4-1-7050093e2b50@linaro.org
2025-05-30 12:21:19 -05:00
Jon Pan-Doh
b4fe7398de PCI/AER: Add sysfs attributes for log ratelimits
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.

The new sysfs files are:

  /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
  /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms

The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.

Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).

  Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats ->
    sysfs-bus-pci-devices-aer

Tested using aer-inject[1]. Configured correctable log ratelimit to 5.
Sent 6 AER errors. Observed 5 errors logged while AER stats
(cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6.

Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors
logged and accounted in AER stats (12 total errors).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
"aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Jon Pan-Doh
a57f2bfb4a PCI/AER: Ratelimit correctable and non-fatal error logging
Spammy devices can flood kernel logs with AER errors and slow/stall
execution. Add per-device ratelimits for AER correctable and non-fatal
uncorrectable errors that use the kernel defaults (10 per 5s).  Logging of
fatal errors is not ratelimited.

There are two AER logging entry points:

  - aer_print_error() is used by DPC and native AER

  - pci_print_aer() is used by GHES and CXL

The native AER aer_print_error() case includes a loop that may log details
from multiple devices, which are ratelimited individually.  If we log
details for any device, we also log the Error Source ID from the Root Port
or RCEC.

If no such device details are found, we still log the Error Source from the
ERR_* Message, ratelimited by the Root Port or RCEC that received it.

The DPC aer_print_error() case is not ratelimited, since this only happens
for fatal errors.

The CXL pci_print_aer() case is ratelimited by the Error Source device.

The GHES pci_print_aer() case is via aer_recover_work_func(), which
searches for the Error Source device.  If the device is not found, there's
no per-device ratelimit, so we use a system-wide ratelimit that covers all
error types (correctable, non-fatal, and fatal).

Sargun at Meta reported internally that a flood of AER errors causes RCU
CPU stall warnings and CSD-lock warnings.

Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged
while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show
true count of 11.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: commit log, factor out trace_aer_event() and aer_print_rp_info()
changes to previous patches, enable Error Source logging if any downstream
detail will be printed, don't ratelimit fatal errors, "aer_report" ->
"aer_info", "cor_log_ratelimit" -> "correctable_ratelimit",
"uncor_log_ratelimit" -> "nonfatal_ratelimit"]

Reported-by: Sargun Dhillon <sargun@meta.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-19-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Bjorn Helgaas
d72bae4230 PCI/AER: Simplify add_error_device()
Return -ENOSPC error early so the usual path through add_error_device() is
the straightline code.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-18-helgaas@kernel.org
2025-05-23 11:11:45 -05:00
Bjorn Helgaas
94bc15c348 PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index
Previously aer_get_device_error_info() and aer_print_error() took a pointer
to struct aer_err_info and a pointer to a pci_dev.  Typically the pci_dev
was one of the elements of the aer_err_info.dev[] array (DPC was an
exception, where the dev[] array was unused).

Convert aer_get_device_error_info() and aer_print_error() to take an index
into the aer_err_info.dev[] array instead.  A future patch will add
per-device ratelimit information, so the index makes it convenient to find
the ratelimit associated with the device.

To accommodate DPC, set info->dev[0] to the DPC port before using these
interfaces.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-17-helgaas@kernel.org
2025-05-23 11:11:36 -05:00
Karolina Stolarek
09683a6184 PCI/AER: Rename struct aer_stats to aer_info
Update name to reflect the broader definition of structs/variables that are
stored (e.g. ratelimits). This is a preparatory patch for adding rate limit
support.

[bhelgaas: "aer_report" -> "aer_info"]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-16-helgaas@kernel.org
2025-05-23 11:02:33 -05:00
Karolina Stolarek
36c5932074 PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNING
Some existing logs in pci_print_aer() log with error severity by default.

Convert them to use KERN_WARNING for correctable errors and KERN_ERR for
uncorrectable errors.

[bhelgaas: commit log]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-15-helgaas@kernel.org
2025-05-23 11:02:19 -05:00
Bjorn Helgaas
82013ff394 PCI/ERR: Add printk level to pcie_print_tlp_log()
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/
etc) that depends on the kind of error, and it calls pcie_print_tlp_log(),
which previously always produced output at KERN_ERR.

Add a "level" parameter so aer_print_error() can control the level of the
pcie_print_tlp_log() output to match.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-14-helgaas@kernel.org
2025-05-23 11:02:03 -05:00
Karolina Stolarek
c8f6791e33 PCI/AER: Check log level once and remember it
When reporting an AER error, we check its type multiple times to determine
the log level for each message. Do this check only in the top-level
functions (aer_isr_one_error(), pci_print_aer()) and save the level in
struct aer_err_info.

[bhelgaas: save log level in struct aer_err_info instead of passing it
as a parameter]

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
2025-05-23 11:01:52 -05:00
Bjorn Helgaas
6bb4befbd6 PCI/AER: Trace error event before ratelimiting
As with the AER statistics, we always want to emit trace events, even if
the actual dmesg logging is rate limited.

Call trace_aer_event() immediately after pci_dev_aer_stats_incr() so both
happen before ratelimiting.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-12-helgaas@kernel.org
2025-05-23 11:01:42 -05:00
Bjorn Helgaas
88a7765e62 PCI/AER: Update statistics before ratelimiting
There are two AER logging entry points:

  - aer_print_error() is used by DPC (dpc_process_error()) and native AER
    handling (aer_process_err_devices()).

  - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL
    (cxl_handle_rdport_errors())

Both use __aer_print_error() to print the AER error bits.  Previously
__aer_print_error() also incremented the AER statistics via
pci_dev_aer_stats_incr().

Call pci_dev_aer_stats_incr() early in the entry points instead of in
__aer_print_error() so we update the statistics even if the actual printing
of error bits is rate limited by a future change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-11-helgaas@kernel.org
2025-05-23 11:01:28 -05:00
Bjorn Helgaas
ad9839137c PCI/AER: Simplify pci_print_aer()
Simplify pci_print_aer() by initializing the struct aer_err_info "info"
with a designated initializer list (it was previously initialized with
memset()) and using pci_name().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-10-helgaas@kernel.org
2025-05-23 11:01:17 -05:00
Bjorn Helgaas
57964ba390 PCI/AER: Initialize aer_err_info before using it
Previously the struct aer_err_info "e_info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.

Initialize "e_info" at declaration with a designated initializer list,
which initializes the other members to zero.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-9-helgaas@kernel.org
2025-05-23 11:01:06 -05:00
Bjorn Helgaas
ca2426a570 PCI/AER: Move aer_print_source() earlier in file
Move aer_print_source() earlier in the file so a future change can use it
from aer_print_error(), where it's easier to rate limit it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-8-helgaas@kernel.org
2025-05-23 11:00:54 -05:00
Jon Pan-Doh
99c3fd0de8 PCI/AER: Rename aer_print_port_info() to aer_print_source()
Rename aer_print_port_info() to aer_print_source() to be more descriptive.
This logs the Error Source ID logged by a Root Port or Root Complex Event
Collector when it receives an ERR_COR, ERR_NONFATAL, or ERR_FATAL Message.

[bhelgaas: aer_print_rp_info() -> aer_print_source()]

Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-7-helgaas@kernel.org
2025-05-23 11:00:38 -05:00
Bjorn Helgaas
f40bd28655 PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etc
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number,
device, and function number directly from the Error Source ID.  There's no
need to shift and mask it explicitly.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-6-helgaas@kernel.org
2025-05-23 11:00:27 -05:00
Bjorn Helgaas
6a1eda7459 PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(),
then again in find_source_device() if we didn't find any devices with
errors logged in their AER Capabilities.

Consolidate this so we only decode and log the Error Source ID once in
aer_isr_one_error_type().  Add a "found" parameter so we can add a note
when we didn't find any downstream devices with errors logged in their AER
Capability.

This changes the dmesg logging when we found no devices with errors logged:

  - pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0
  - pci 0000:00:01.0: AER: found no error details for 0000:02:00.0
  + pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 (no details found)

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-5-helgaas@kernel.org
2025-05-23 11:00:14 -05:00
Bjorn Helgaas
6fc4dae74a PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()
aer_isr_one_error() duplicates the Error Source ID logging and AER error
processing for Correctable Errors and Uncorrectable Errors.  Factor out the
duplicated code to aer_isr_one_error_type().

aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the
Root Port or RCEC pci_dev pointer instead.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-4-helgaas@kernel.org
2025-05-23 11:00:01 -05:00
Bjorn Helgaas
a0b62cc310 PCI/DPC: Log Error Source ID only when valid
DPC Error Source ID is only valid when the DPC Trigger Reason indicates
that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL
Message (PCIe r6.0, sec 7.9.14.5).

When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE)
or ERR_FATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) from a downstream device,
log the Error Source ID (decoded into domain/bus/device/function).  Don't
print the source otherwise, since it's not valid.

For DPC trigger due to reception of ERR_NONFATAL or ERR_FATAL, the dmesg
logging changes:

  - pci 0000:00:01.0: DPC: containment event, status:0x000d source:0x0200
  - pci 0000:00:01.0: DPC: ERR_FATAL detected
  + pci 0000:00:01.0: DPC: containment event, status:0x000d, ERR_FATAL received from 0000:02:00.0

and when DPC triggered for other reasons, where DPC Error Source ID is
undefined, e.g., unmasked uncorrectable error:

  - pci 0000:00:01.0: DPC: containment event, status:0x0009 source:0x0200
  - pci 0000:00:01.0: DPC: unmasked uncorrectable error detected
  + pci 0000:00:01.0: DPC: containment event, status:0x0009: unmasked uncorrectable error detected

Previously the "containment event" message was at KERN_INFO and the
"%s detected" message was at KERN_WARNING.  Now the single message is at
KERN_WARNING.

Fixes: 26e5157133 ("PCI: Add Downstream Port Containment driver")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-3-helgaas@kernel.org
2025-05-23 10:59:40 -05:00
Bjorn Helgaas
a424b598e6 PCI/DPC: Initialize aer_err_info before using it
Previously the struct aer_err_info "info" was allocated on the stack
without being initialized, so it contained junk except for the fields we
explicitly set later.

Initialize "info" at declaration so it starts as all zeros.

Fixes: 8aefa9b0d9 ("PCI/DPC: Print AER status in DPC event handling")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org
2025-05-23 10:58:52 -05:00
Ilpo Järvinen
6ade6e81f8 PCI: Update Link Speed after retraining
PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that
performs the Link Training is called from bwctrl and ASPM driver.

While bwctrl listens for Link Bandwidth Management Status (LBMS) to
pick up changes in Link Speed, there is a race between
pcie_reset_lbms() clearing LBMS after the Link Training and
pcie_bwnotif_irq() reading the Link Status register. If LBMS is already
cleared when the irq handler reads the register, the interrupt handler
will return early with IRQ_NONE and won't update the Link Speed.

When Link Speed update originates from bwctrl,
pcie_bwctrl_change_speed() ensures Link Speed is updated after the
retraining. ASPM driver, however, calls pcie_retrain_link() but does
not update the Link Speed after retraining which can result in stale
Link Speed. Also, it is possible to have ASPM support with
CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and
thus won't update the Link Speed at all).

To ensure Link Speed is not left stale after Link Training, move the
call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into
pcie_retrain_link().

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/linux-pci/aBCjpfyYmlkJ12AZ@wunner.de
Link: https://lore.kernel.org/r/20250514132821.15705-1-ilpo.jarvinen@linux.intel.com
2025-05-15 16:28:44 +00:00
Manivannan Sadhasivam
132833405e PCI: Add debugfs support for exposing PTM context
Precision Time Management (PTM) mechanism defined in PCIe spec r6.0,
sec 6.21 allows precise coordination of timing information across multiple
components in a PCIe hierarchy with independent local time clocks.

PCI core already supports enabling PTM in the root port and endpoint
devices through PTM Extended Capability registers. But the PTM context
supported by the PTM capable components such as Root Complex (RC) and
Endpoint (EP) controllers were not exposed as of now. Part of the reason is
that the spec doesn't define how the context information is exposed to the
software and left it to the vendor implementation. So there is no
standardized way to get access to the context information and each vendor
have defined their own way.

This commit adds debugfs support to expose the PTM context to userspace
from both PCIe RC and EP controllers. Since the context information is
exposed in a vendor specific way, the debugfs interface allows the
controller drivers to implement callbacks for each attribute, to be called
by the generic PTM driver.

The Controller drivers are expected to call pcie_ptm_create_debugfs() to
create the debugfs attributes for the PTM context and call
pcie_ptm_destroy_debugfs() to destroy them. The drivers should also
populate the relevant callbacks in the 'struct pcie_ptm_ops' structure
based on the controller implementation.

Below PTM context are exposed through debugfs:

PCIe RC
=======

1. PTM Local clock
2. PTM T2 timestamp
3. PTM T3 timestamp
4. PTM Context valid

PCIe EP
=======

1. PTM Local clock
2. PTM T1 timestamp
3. PTM T4 timestamp
4. PTM Master clock
5. PTM Context update

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: fix overflow issue reported by Dan Carpenter from
https://lore.kernel.org/linux-pci/b41c1754-c6b7-4805-9f14-7c643d6c5304@suswa.mountain]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20250505-pcie-ptm-v4-1-02d26d51400b@linaro.org
2025-05-15 09:16:20 +00:00
Ilpo Järvinen
2389d8dc38 PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
PCIe BW controller counted LBMS assertions for the purposes of the Target
Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the
LBMS count through sysfs to allow better diagnosing link related issues.
Lukas Wunner suggested, however, that adding a trace event would be better
for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user
of the lbms_count.

The logic in pcie_failed_link_retrain() does not require keeping count of
LBMS assertions, so replace lbms_count with a simple flag in pci_dev's
priv_flags.  The reduced complexity allows removing pcie_bwctrl_lbms_rwsem.

Since pcie_failed_link_retrain() runs before bwctrl is probed during boot,
the LBMS in Link Status register still has to be checked by the quirk.

The priv_flags numbering is not continuous because hotplug code added a few
flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in
different branches).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squashed a fix to resolve build failures from
https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-05-15 08:38:40 +00:00
Linus Torvalds
7d06015d93 Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Bjorn Helgaas
dea140198b Merge branch 'pci/misc'
- Remove unused tools 'pci' build target left over after moving tests to
  tools/testing/selftests/pci_endpoint (Jianfeng Liu)

- Fix typos and whitespace errors (Bjorn Helgaas)

* pci/misc:
  PCI: Fix typos
  tools/Makefile: Remove pci target

# Conflicts:
#	drivers/pci/endpoint/functions/pci-epf-test.c
2025-03-27 13:15:05 -05:00
Bjorn Helgaas
655ea930fe Merge branch 'pci/hotplug'
- Drop shpchp module init/exit logging (Ilpo Järvinen)

- Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(),
  info(), warn() wrappers (Ilpo Järvinen)

- Drop 'shpchp_debug' module parameter in favor of standard dynamic
  debugging (Ilpo Järvinen)

- Drop unused .get_power(), .set_power() function pointers (Guilherme
  Giacomo Simoes)

- Drop superfluous pci_hotplug_slot_list (Lukas Wunner)

- Drop superfluous try_module_get() calls (Lukas Wunner)

- Drop superfluous NULL pointer checks (Lukas Wunner)

- Pass struct hotplug_slot pointers directly to avoid backpointer
  dereferencing in has_*_file() (Lukas Wunner)

- Inline pci_hp_{create,remove}_module_link() to reduce exported symbols
  (Lukas Wunner)

- Disable hotplug interrupts in portdrv only when pciehp is not enabled to
  prevent issuing two hotplug commands too close together (Feng Tang)

- Skip pciehp 'device replaced' check if the device has been removed to
  address a common deadlock when resuming after a device was removed during
  system sleep (Lukas Wunner)

- Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo
  Järvinen)

* pci/hotplug:
  PCI: pciehp: Don't enable HPIE when resuming in poll mode
  PCI: pciehp: Avoid unnecessary device replacement check
  PCI/portdrv: Only disable pciehp interrupts early when needed
  PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
  PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
  PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
  PCI: hotplug: Drop superfluous try_module_get() calls
  PCI: hotplug: Drop superfluous pci_hotplug_slot_list
  PCI: cpcihp: Remove unused .get_power() and .set_power()
  PCI: shpchp: Remove 'shpchp_debug' module parameter
  PCI: shpchp: Remove unused logging wrappers
  PCI: shpchp: Change dbg() -> ctrl_dbg()
  PCI: shpchp: Remove logging from module init/exit functions
2025-03-27 13:14:44 -05:00
Bjorn Helgaas
4d1a2a9244 Merge branch 'pci/bwctrl'
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
  set_pcie_cooling_state.sh test case (Yi Lai)

- Fix the pcie_bwctrl_select_speed() return value in cases where a
  non-compliant device doesn't advertise valid supported speeds (Ilpo
  Järvinen)

- Avoid a NULL pointer dereference when we run out of bus numbers to assign
  for a bridge secondary bus (Lukas Wunner)

* pci/bwctrl:
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
  selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
2025-03-27 13:14:42 -05:00
Bjorn Helgaas
2cde6eb252 Merge branch 'pci/aspm'
- Delay pcie_link_state deallocation to avoid dangling pointers that cause
  invalid references during hot-unplug (Daniel Stodden)

* pci/aspm:
  PCI/ASPM: Fix link state exit during switch upstream function removal
2025-03-27 13:14:42 -05:00
Lukas Wunner
667f053b05 PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
When BIOS neglects to assign bus numbers to PCI bridges, the kernel
attempts to correct that during PCI device enumeration.  If it runs out
of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
the bridge's pci_dev remains NULL.

The PCIe bandwidth controller erroneously does not check for a NULL
subordinate pointer and dereferences it on probe.

Bandwidth control of unusable devices below the bridge is of questionable
utility, so simply error out instead.  This mirrors what PCIe hotplug does
since commit 62e4492c30 ("PCI: Prevent NULL dereference during pciehp
probe").

The PCI core emits a message with KERN_INFO severity if it has run out of
bus numbers.  PCIe hotplug emits an additional message with KERN_ERR
severity to inform the user that hotplug functionality is disabled at the
bridge.  A similar message for bandwidth control does not seem merited,
given that its only purpose so far is to expose an up-to-date link speed
in sysfs and throttle the link speed on certain laptops with limited
Thermal Design Power.  So error out silently.

User-visible messages:

  pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  [...]
  pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
  pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
  [...]
  pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
  [...]
  BUG: kernel NULL pointer dereference
  RIP: pcie_update_link_speed
  pcie_bwnotif_enable
  pcie_bwnotif_probe
  pcie_port_probe_service
  really_probe

Fixes: 665745f274 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Cc: stable@vger.kernel.org # v6.13+
Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
2025-03-23 13:37:24 +00:00
Ilpo Järvinen
026e4bffb0 PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return
it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply
assign the fallback speed (2.5GT/s) into supported_speeds variable to share
the normal return path that calls pcie_supported_speeds2target_speed() to
calculate __fls().

This code path is not very likely to execute because
pcie_get_supported_speeds() should provide valid ->supported_speeds but a
spec violating device could fail to synthesize any speed in
pcie_get_supported_speeds(). It could also happen in case the
supported_speeds intersection is empty (also a violation of the current
PCIe specs).

Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com
Fixes: de9a6c8d5d ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21 12:11:38 -05:00
Bjorn Helgaas
f4e026f454 PCI: Fix typos
Fix typos and whitespace errors.

Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08 15:08:45 -06:00
Feng Tang
9d7db4db19 PCI/portdrv: Only disable pciehp interrupts early when needed
Firmware developers reported that Linux issues two PCIe hotplug commands in
very short intervals on an ARM server, which doesn't comply with the PCIe
spec.  According to PCIe r6.1, sec 6.7.3.2, if the Command Completed event
is supported, software must wait for a command to complete or wait at
least 1 second before sending a new command.

In the failure case, the first PCIe hotplug command is from
get_port_device_capability(), which sends a command to disable PCIe hotplug
interrupts without waiting for its completion, and the second command comes
from pcie_enable_notification() of pciehp driver, which enables hotplug
interrupts again.

Fix this by only disabling the hotplug interrupts when the pciehp driver is
not enabled.

Link: https://lore.kernel.org/r/20250303023630.78397-1-feng.tang@linux.alibaba.com
Fixes: 2bd50dd800 ("PCI: PCIe: Disable PCIe port services during port initialization")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-03-04 17:17:01 -06:00
Ilpo Järvinen
7e077e6707 PCI/ERR: Handle TLP Log in Flit mode
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).

Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.

Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.

Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.

Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:31:45 -06:00
Ilpo Järvinen
fab874e125 PCI/AER: Descope pci_printk() to aer_printk()
include/linux/pci.h provides low-level pci_printk() interface that is
only used by AER because it needs to print the same message with
different levels depending on the error severity. No other PCI code
uses that functionality and calls pci_<level>() logging functions
directly with the appropriate level.

Descope pci_printk() into AER as aer_printk().

Link: https://lore.kernel.org/r/20241216161012.1774-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: retain pci_printk() for now since shpchp still uses it and I
moved those patches to a different branch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:17:08 -06:00
Daniel Stodden
cbf937dcad PCI/ASPM: Fix link state exit during switch upstream function removal
Before 456d8aa37d ("PCI/ASPM: Disable ASPM on MFD function removal to
avoid use-after-free"), we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.

That was too late. If function 0 is removed before sibling function,
link->downstream would point to free'd memory after.

After above change, we freed the ASPM parent link state upon any function
removal on the bus pertaining to a given link.

That is too early. If the link is to a PCIe switch with MFD on the upstream
port, then removing functions other than 0 first would free a link which
still remains parent_link to the remaining downstream ports.

The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.

On that switch, function 0 is the virtual P2P bridge to the internal bus.
Free exactly when function 0 is removed -- before the parent link is
obsolete, but after all subordinate links are gone.

Link: https://lore.kernel.org/r/e12898835f25234561c9d7de4435590d957b85d9.1734924854.git.dns@arista.com
Fixes: 456d8aa37d ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Daniel Stodden <dns@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:05:56 +00:00
Ilpo Järvinen
7507eb3e7b PCI/ASPM: Fix L1SS saving
Commit 1db806ec06 ("PCI/ASPM: Save parent L1SS config in
pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both the
Upstream Port and its upstream bridge when handling an Upstream Port, which
matches what the L1SS restore side does. However, parent->state_saved can
be set true at an earlier time when the upstream bridge saved other parts
of its state. Then later when attempting to save the L1SS config while
handling the Upstream Port, parent->state_saved is true in
pci_save_aspm_l1ss_state() resulting in early return and skipping saving
bridge's L1SS config because it is assumed to be already saved. Later on
restore, junk is written into L1SS config which causes issues with some
devices.

Remove parent->state_saved check and unconditionally save L1SS config also
for the upstream bridge from an Upstream Port which ought to be harmless
from correctness point of view. With the Upstream Port check now present,
saving the L1SS config more than once for the bridge is no longer a problem
(unlike when the parent->state_saved check got introduced into the fix
during its development).

Link: https://lore.kernel.org/r/20250131152913.2507-1-ilpo.jarvinen@linux.intel.com
Fixes: 1db806ec06 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Reported by: Rafael J. Wysocki <rafael@kernel.org>
Closes: https://lore.kernel.org/r/CAJZ5v0iKmynOQ5vKSQbg1J_FmavwZE-nRONovOZ0mpMVauheWg@mail.gmail.com
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/d7246feb-4f3f-4d0c-bb64-89566b170671@molgen.mpg.de
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360
2025-02-06 09:54:38 -06:00
Linus Torvalds
647d69605c Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Batch sizing of multiple BARs while memory decoding is disabled
     instead of disabling/enabling decoding for each BAR individually;
     this optimizes virtualized environments where toggling decoding
     enable is expensive (Alex Williamson)

   - Add host bridge .enable_device() and .disable_device() hooks for
     bridges that need to configure things like Requester ID to StreamID
     mapping when enabling devices (Frank Li)

   - Extend struct pci_ecam_ops with .enable_device() and
     .disable_device() hooks so drivers that use pci_host_common_probe()
     instead of their own .probe() have a way to set the
     .enable_device() callbacks (Marc Zyngier)

   - Drop 'No bus range found' message so we don't complain when DTs
     don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)

   - Rename the drivers/pci/of_property.c struct of_pci_range to
     of_pci_range_entry to avoid confusion with the global of_pci_range
     in include/linux/of_address.h (Bjorn Helgaas)

  Driver binding:

   - Update resource request API documentation to encourage callers to
     supply a driver name when requesting resources (Philipp Stanner)

   - Export pci_intx_unmanaged() and pcim_intx() (always managed) so
     callers of pci_intx() (which is sometimes managed) can explicitly
     choose the one they need (Philipp Stanner)

   - Convert drivers from pci_intx() to always-managed pcim_intx() or
     never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
     pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
     ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)

   - Remove pci_intx_unmanaged() since pci_intx() is now always
     unmanaged and pcim_intx() is always managed (Philipp Stanner)

  Error handling:

   - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
     logging rather than building their own (Ilpo Järvinen)

   - Move TLP Log handling to its own file (Ilpo Järvinen)

   - Store number of supported End-End TLP Prefixes always so we can
     read the correct number of DWORDs from the TLP Prefix Log (Ilpo
     Järvinen)

   - Read TLP Prefixes in addition to the Header Log in
     pcie_read_tlp_log() (Ilpo Järvinen)

   - Add pcie_print_tlp_log() to consolidate printing of TLP Header and
     Prefix Log (Ilpo Järvinen)

   - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
     BIOSes that don't configure it correctly (Takashi Iwai)

  ASPM:

   - Save parent L1 PM Substates config so when we restore it along with
     an endpoint's config, the parent info isn't junk (Jian-Hong Pan)

  Power management:

   - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
     the system can't wake up from suspend (Werner Sembach)

  Endpoint framework:

   - Destroy the EPC device in devm_pci_epc_destroy(), which previously
     didn't call devres_release() (Zijun Hu)

   - Finish virtual EP removal in pci_epf_remove_vepf(), which
     previously caused a subsequent pci_epf_add_vepf() to fail with
     -EBUSY (Zijun Hu)

   - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
     don't depend on the BAR_MASK reset value being larger than the
     requested BAR size (Niklas Cassel)

   - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
     reads from bypassing the iATU if we reduced the BAR size (Niklas
     Cassel)

   - Verify address alignment when programming iATU so we don't attempt
     to write bits that are read-only because of the BAR size, which
     could lead to directing accesses to the wrong address (Niklas
     Cassel)

   - Implement artpec6 pci_epc_features so we can rely on all drivers
     supporting it so we can use it in EPC core code (Niklas Cassel)

   - Check for BARs of fixed size to prevent endpoint drivers from
     trying to change their size (Niklas Cassel)

   - Verify that requested BAR size is a power of two when endpoint
     driver sets the BAR (Niklas Cassel)

  Endpoint framework tests:

   - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
     dma_chan_rx (Mohamed Khalfella)

   - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
     supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)

   - Add pci-epf-test and pci_endpoint_test support for capabilities
     (Niklas Cassel)

   - Add Endpoint test for consecutive BARs (Niklas Cassel)

   - Remove redundant comparison from Endpoint BAR test because a > 1MB
     BAR can always be exactly covered by iterating with a 1MB buffer
     (Hans Zhang)

   - Move and convert PCI Endpoint tests from tools/pci to Kselftests
     (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Convert StreamID mapping configuration from a bus notifier to the
     .enable_device() and .disable_device() callbacks (Marc Zyngier)

  Freescale i.MX6 PCIe controller driver:

   - Add Requester ID to StreamID mapping configuration when enabling
     devices (Frank Li)

   - Use DWC core suspend/resume functions for imx6 (Frank Li)

   - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
     Zhu)

   - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
     i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
     Li)

   - Add DT binding for optional i.MX95 Refclk and driver support to
     enable it if the platform hasn't enabled it (Richard Zhu)

   - Configure PHY based on controller being in Root Complex or Endpoint
     mode (Frank Li)

   - Rely on dbi2 and iATU base addresses from DT via
     dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)

   - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
     asserted in imx_pcie_assert_core_reset() (Richard Zhu)

   - Add missing reference clock enable or disable logic for IMX6SX,
     IMX7D, IMX8MM (Richard Zhu)

   - Remove redundant imx7d_pcie_init_phy() since
     imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead
     of syscon_regmap_lookup_by_phandle() followed by
     of_property_read_u32_array() (Krzysztof Kozlowski)

  Marvell MVEBU PCIe controller driver:

   - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)

  MediaTek PCIe Gen3 controller driver:

   - Use clk_bulk_prepare_enable() instead of separate
     clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)

   - Rearrange reset assert/deassert so they're both done in the
     *_power_up() callbacks (Lorenzo Bianconi)

   - Document that Airoha EN7581 requires PHY init and power-on before
     PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
     Bianconi)

   - Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
     method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)

   - Sleep instead of delay during Airoha EN7581 power-up, since this is
     a non-atomic context (Lorenzo Bianconi)

   - Skip PERST# assertion on Airoha EN7581 during probe and
     suspend/resume to avoid a hardware defect (Lorenzo Bianconi)

   - Enable async probe to reduce system startup time (Douglas Anderson)

  Microchip PolarFlare PCIe controller driver:

   - Set up the inbound address translation based on whether the
     platform allows coherent or non-coherent DMA (Daire McNamara)

   - Update DT binding such that platforms are DMA-coherent by default
     and must specify 'dma-noncoherent' if needed (Conor Dooley)

  Mobiveil PCIe controller driver:

   - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
     and 'reg-names' (Frank Li)

  Qualcomm PCIe controller driver:

   - Add DT SM8550 and SM8650 optional 'global' interrupt for link
     events (Neil Armstrong)

   - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
     Mylavarapu)

   - If 'global' IRQ is supported for detection of Link Up events, tell
     DWC core not to wait for link up (Krishna chaitanya chundru)

  Renesas R-Car PCIe controller driver:

   - Avoid passing stack buffer as resource name (King Dix)

  Rockchip PCIe controller driver:

   - Simplify clock and reset handling by using bulk interfaces (Anand
     Moon)

   - Pass typed rockchip_pcie (not void) pointer to
     rockchip_pcie_disable_clocks() (Anand Moon)

   - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
     (Dan Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Use dll_link_up IRQ to detect Link Up and enumerate devices so
     users don't have to manually rescan (Niklas Cassel)

   - Tell DWC core not to wait for link up since the 'sys' interrupt is
     required and detects Link Up events (Niklas Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Don't wait for link up in DWC core if driver can detect Link Up
     event (Krishna chaitanya chundru)

   - Update ICC and OPP votes after Link Up events (Krishna chaitanya
     chundru)

   - Always stop link in dw_pcie_suspend_noirq(), which is required at
     least for i.MX8QM to re-establish link on resume (Richard Zhu)

   - Drop racy and unnecessary LTSSM state check before sending
     PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)

   - Add struct of_pci_range.parent_bus_addr for devices that need their
     immediate parent bus address, not the CPU address, e.g., to program
     an internal Address Translation Unit (iATU) (Frank Li)

  TI DRA7xx PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
     syscon_regmap_lookup_by_phandle() followed by
     of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
     (Krzysztof Kozlowski)

  Xilinx Versal CPM PCIe controller driver:

   - Add DT binding and driver support for Xilinx Versal CPM5
     (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Add Microchip PCI100X device IDs (Rakesh Babu Saladi)

  Miscellaneous:

   - Move reset related sysfs code from pci.c to pci-sysfs.c where other
     similar code lives (Ilpo Järvinen)

   - Simplify reset_method_store() memory management by using __free()
     instead of explicit kfree() cleanup (Ilpo Järvinen)

   - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
     ACPI hotplug driver (Thomas Weißschuh)

   - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
     Zhang)

   - Correct documentation of the 'config_acs=' kernel parameter
     (Akihiko Odaki)"

* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
  PCI: Batch BAR sizing operations
  dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
  PCI: microchip: Set inbound address translation for coherent or non-coherent mode
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  selftests: pci_endpoint: Migrate to Kselftest framework
  selftests: Move PCI Endpoint tests from tools/pci to Kselftests
  misc: pci_endpoint_test: Fix IOCTL return value
  dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
  dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
  dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
  PCI: switchtec: Add Microchip PCI100X device IDs
  misc: pci_endpoint_test: Remove redundant 'remainder' test
  misc: pci_endpoint_test: Add consecutive BAR test
  misc: pci_endpoint_test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
  PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
  PCI: dwc: Simplify config resource lookup
  ...
2025-01-25 16:03:40 -08:00
Bjorn Helgaas
f4a09274c5 Merge branch 'pci/err'
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging
  rather than building their own (Ilpo Järvinen)

- Move TLP Log handling to its own file (Ilpo Järvinen)

- Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen)

- Store number of supported End-End TLP Prefixes always so we can read the
  correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen)

- Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log()
  (Ilpo Järvinen)

- Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix
  Log (Ilpo Järvinen)

* pci/err:
  PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
  PCI: Add TLP Prefix reading to pcie_read_tlp_log()
  PCI: Store number of supported End-End TLP Prefixes
  PCI: Use unsigned int i in pcie_read_tlp_log()
  PCI: Use same names in pcie_read_tlp_log() prototype and definition
  PCI: Add defines for TLP Header/Prefix log sizes
  PCI: Move TLP Log handling to its own file
  PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-23 13:04:50 -06:00
Ilpo Järvinen
f68ea779d9 PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
Add pcie_print_tlp_log() to print TLP Header and Prefix Log.  Print End-End
Prefixes only if they are non-zero.

Consolidate the few places which currently print TLP using custom
formatting.

Link: https://lore.kernel.org/r/20250114170840.1633-9-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16 12:05:43 -06:00
Ilpo Järvinen
ad41ddeeac PCI: Add TLP Prefix reading to pcie_read_tlp_log()
pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log
(PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle TLP
Prefix Log. The relevant registers are formatted identically in AER and DPC
Capability, but has these variations:

  a) The offsets of TLP Prefix Log registers vary.

  b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs.

  c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate Prefix
     Log is not present.

Therefore callers must pass the offset of the TLP Prefix Log register and
the entire length to pcie_read_tlp_log() to be able to read the correct
number of TLP Prefix DWORDs from the correct offset.

Link: https://lore.kernel.org/r/20250114170840.1633-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash ternary fix from
https://lore.kernel.org/r/20250116172019.88116-1-colin.i.king@gmail.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16 12:04:38 -06:00
Ilpo Järvinen
27d41818b6 PCI: Use unsigned int i in pcie_read_tlp_log()
Loop variable i counting from 0 upwards does not need to be signed so make
it unsigned int.

Link: https://lore.kernel.org/r/20250114170840.1633-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-14 17:47:34 -06:00
Ilpo Järvinen
dbfd51ba4c PCI: Use same names in pcie_read_tlp_log() prototype and definition
pcie_read_tlp_log()'s prototype and function signature diverged due to
changes made while applying.

Make the parameters of pcie_read_tlp_log() named identically.

Link: https://lore.kernel.org/r/20250114170840.1633-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:46:36 -06:00
Ilpo Järvinen
ede5d5dbef PCI: Add defines for TLP Header/Prefix log sizes
Add defines for AER and DPC capabilities TLP Header Logging register sizes
(PCIe r6.2, sec 7.8.4 / 7.9.14) and replace literals with them.

Link: https://lore.kernel.org/r/20250114170840.1633-4-ilpo.jarvinen@linux.intel.com
Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-14 17:39:04 -06:00
Ilpo Järvinen
a71c59269e PCI: Move TLP Log handling to its own file
TLP Log is a PCIe feature and is processed only by AER and DPC.
Configwise, DPC depends AER being enabled. In lack of better place, the TLP
Log handling code was initially placed into pci.c but it can be easily
placed in a separate file.

Move TLP Log handling code to its own file under pcie/ subdirectory and
include it only when AER is enabled.

Link: https://lore.kernel.org/r/20250114170840.1633-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 17:38:18 -06:00
Linus Torvalds
7f5b6a8ec1 Merge tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:

 - Prevent bwctrl NULL pointer dereference that caused hangs on shutdown
   on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner)

* tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/bwctrl: Fix NULL pointer deref on unbind and bind
2025-01-14 11:32:14 -08:00
Lukas Wunner
15b8968dcb PCI/bwctrl: Fix NULL pointer deref on unbind and bind
The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(),
dereferences a "data" pointer.

On unbind, that pointer is set to NULL by pcie_bwnotif_remove().  However
the interrupt handler may still be invoked afterwards and will dereference
that NULL pointer.

That's because the interrupt is requested using a devm_*() helper and the
driver core releases devm_*() resources *after* calling ->remove().

pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt
Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link
Control Register, but that won't prevent execution of pcie_bwnotif_irq():
The interrupt for bandwidth notifications may be shared with AER, DPC,
PME, and hotplug.  So pcie_bwnotif_irq() may be executed as long as the
interrupt is requested.

There's a similar race on bind:  pcie_bwnotif_probe() requests the
interrupt when the "data" pointer still points to NULL.  A NULL pointer
deref may thus likewise occur if AER, DPC, PME or hotplug raise an
interrupt in-between the bandwidth controller's call to devm_request_irq()
and assignment of the "data" pointer.

Drop the devm_*() usage and reorder requesting of the interrupt to fix the
issue.

While at it, drop a stray but harmless no_free_ptr() invocation when
assigning the "data" pointer in pcie_bwnotif_probe().

Ilpo points out that the locking on unbind and bind needs to be symmetric,
so move the call to pcie_bwnotif_disable() inside the critical section
protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem.

Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV.
The issue is no longer reproducible with the present commit.

Evert found that attaching a USB-C monitor prevented the hang.  The
machine contains an ASMedia USB 3.2 controller below a hotplug-capable
Root Port.  So one possible explanation is that the controller gets
hot-removed on shutdown unless something is connected.  And the ensuing
hotplug interrupt occurs exactly when the bandwidth controller is
unregistering.  The precise cause could not be determined because the
screen had already turned black when the hang occurred.

Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de
Fixes: 665745f274 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Evert Vorster <evorster@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Evert Vorster <evorster@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-07 14:24:06 -06:00
Linus Torvalds
a99b4a369a Merge tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Krzysztof Wilczyński:
 "Two small patches that are important for fixing boot time hang on
  Intel JHL7540 'Titan Ridge' platforms equipped with a Thunderbolt
  controller.

  The boot time issue manifests itself when a PCI Express bandwidth
  control is unnecessarily enabled on the Thunderbolt controller
  downstream ports, which only supports a link speed of 2.5 GT/s in
  accordance with USB4 v2 specification (p. 671, sec. 11.2.1, "PCIe
  Physical Layer Logical Sub-block").

  As such, there is no need to enable bandwidth control on such
  downstream port links, which also works around the issue.

  Both patches were tested by the original reporter on the hardware on
  which the failure origin golly manifested itself. Both fixes were
  proven to resolve the reported boot hang issue, and both patches have
  been in linux-next this week with no reported problems"

* tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/bwctrl: Enable only if more than one speed is supported
  PCI: Honor Max Link Speed when determining supported speeds
2024-12-21 10:51:04 -08:00