The interrupt line of PCI devices is interpreted as edge-triggered,
however the interrupt signal of the m_can controller integrated in Intel
Elkhart Lake CPUs appears to be generated level-triggered.
Consider the following sequence of events:
- IR register is read, interrupt X is set
- A new interrupt Y is triggered in the m_can controller
- IR register is written to acknowledge interrupt X. Y remains set in IR
As at no point in this sequence no interrupt flag is set in IR, the
m_can interrupt line will never become deasserted, and no edge will ever
be observed to trigger another run of the ISR. This was observed to
result in the TX queue of the EHL m_can to get stuck under high load,
because frames were queued to the hardware in m_can_start_xmit(), but
m_can_finish_tx() was never run to account for their successful
transmission.
On an Elkhart Lake based board with the two CAN interfaces connected to
each other, the following script can reproduce the issue:
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000
cangen can0 -g 2 -I 000 -L 8 &
cangen can0 -g 2 -I 001 -L 8 &
cangen can0 -g 2 -I 002 -L 8 &
cangen can0 -g 2 -I 003 -L 8 &
cangen can0 -g 2 -I 004 -L 8 &
cangen can0 -g 2 -I 005 -L 8 &
cangen can0 -g 2 -I 006 -L 8 &
cangen can0 -g 2 -I 007 -L 8 &
cangen can1 -g 2 -I 100 -L 8 &
cangen can1 -g 2 -I 101 -L 8 &
cangen can1 -g 2 -I 102 -L 8 &
cangen can1 -g 2 -I 103 -L 8 &
cangen can1 -g 2 -I 104 -L 8 &
cangen can1 -g 2 -I 105 -L 8 &
cangen can1 -g 2 -I 106 -L 8 &
cangen can1 -g 2 -I 107 -L 8 &
stress-ng --matrix 0 &
To fix the issue, repeatedly read and acknowledge interrupts at the
start of the ISR until no interrupt flags are set, so the next incoming
interrupt will also result in an edge on the interrupt line.
While we have received a report that even with this patch, the TX queue
can become stuck under certain (currently unknown) circumstances on the
Elkhart Lake, this patch completely fixes the issue with the above
reproducer, and it is unclear whether the remaining issue has a similar
cause at all.
Fixes: cab7ffc032 ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/fdf0439c51bcb3a46c21e9fb21c7f1d06363be84.1728288535.git.matthias.schiffer@ew.tq-group.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add a flag to the device class structure that leaves the chip in a
running state with rx interrupt enabled, so that an m_can device driver
can configure and use the interrupt as a wakeup source.
Signed-off-by: Martin Hundebøll <martin@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This reverts commit 0e8ffdf3b8.
Commit 0e8ffdf3b8 ("can: m_can: pci: use custom bit timings for
Elkhart Lake") broke the test case using bitrate switching.
| ip link set can0 up type can bitrate 500000 dbitrate 4000000 fd on
| ip link set can1 up type can bitrate 500000 dbitrate 4000000 fd on
| candump can0 &
| cangen can1 -I 0x800 -L 64 -e -fb \
| -D 11223344deadbeef55667788feedf00daabbccdd44332211 -n 1 -v -v
Above commit does everything correctly according to the datasheet.
However datasheet wasn't correct.
I got confirmation from hardware engineers that the actual CAN
hardware on Intel Elkhart Lake is based on M_CAN version v3.2.0.
Datasheet was mirroring values from an another specification which was
based on earlier M_CAN version leading to wrong bit timings.
Therefore revert the commit and switch back to common bit timings.
Fixes: ea4c178768 ("can: m_can: pci: use custom bit timings for Elkhart Lake")
Link: https://lore.kernel.org/all/20220512124144.536850-1-jarkko.nikula@linux.intel.com
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Chee Hou Ong <chee.houx.ong@intel.com>
Reported-by: Aman Kumar <aman.kumar@intel.com>
Reported-by: Pallavi Kumari <kumari.pallavi@intel.com>
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If FIFO reads or writes fail due to the underlying regmap (e.g., SPI)
I/O, propagate that up to the m_can driver, log an error, and disable
interrupts, similar to the mcp251xfd driver.
While reworking the FIFO functions to add this error handling,
add support for bulk reads and writes of multiple registers.
Link: https://lore.kernel.org/r/20210817050853.14875-2-matt@bitbashing.io
Signed-off-by: Matt Kline <matt@bitbashing.io>
[mkl: re-wrap long lines, remove WARN_ON, convert to netdev block comments]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The m_can driver's suspend and resume functions (m_can_class_suspend() and
m_can_class_resume()) make use of dev_get_drvdata() and assume that the drvdata
is a pointer to the struct net_device.
With upcoming conversion of the tcan4x5x driver to pm_runtime this assumption
is no longer valid. As the suspend and resume functions actually need a struct
m_can_classdev pointer, change the m_can_platform and the m_can_pci driver to
hold a pointer to struct m_can_classdev instead, as the tcan4x5x driver already
does.
Link: https://lore.kernel.org/r/20201212175518.139651-8-mkl@pengutronix.de
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add support for M_CAN controller on Intel Elkhart Lake attached to the PCI bus.
It integrates the Bosch M_CAN controller with Message RAM and the wrapper IP
block with additional registers which all of them are within the same MMIO
range.
Currently only interrupt control register from wrapper IP is used and the MRAM
configuration is expected to come from the firmware via "bosch,mram-cfg" device
property and parsed by m_can.c core.
Initial implementation is done by Felipe Balbi while he was working at Intel
with later changes from Raymond Tan and me.
Co-developed-by: Felipe Balbi (Intel) <balbi@kernel.org>
Co-developed-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Felipe Balbi (Intel) <balbi@kernel.org>
Signed-off-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20201117160827.3636264-1-jarkko.nikula@linux.intel.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>