When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages, and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same SPI
interface. And there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one TX
package while still expecting a response in
mcp251xfd_handle_tefif_one().
Resolve the issue by starting a workqueue to write the tx obj
synchronously if err = -EBUSY. In case of another error, decrement
tx_ring->head, remove skb from the echo stack, and drop the message.
Fixes: 55e5b97f00 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable@vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares@toradex.com>
Link: https://lore.kernel.org/all/20240517134355.770777-1-ivitro@gmail.com
[mkl: use more imperative wording in patch description]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at
compile time. We thus need to add HAS_IOPORT as dependency for
those drivers requiring them. For the DEFXX driver the use of I/O
ports is optional and we only need to fence specific code paths. It also
turns out that with HAS_IOPORT handled explicitly HAMRADIO does not need
the !S390 dependency and successfully builds the bpqether driver.
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2024-02-20
this is a pull request of 9 patches for net-next/master.
The first patch is by Francesco Dolcini and removes a redundant check
for pm_clock_support from the m_can driver.
Martin Hundebøll contributes 3 patches to the m_can/tcan4x5x driver to
allow resume upon RX of a CAN frame.
3 patches by Srinivas Goud add support for ECC statistics to the
xilinx_can driver.
The last 2 patches are by Oliver Hartkopp and me, target the CAN RAW
protocol and fix an error in the getsockopt() for CAN-XL introduced in
the previous pull request to net-next (linux-can-next-for-6.9-20240213).
linux-can-next-for-6.9-20240220
* tag 'linux-can-next-for-6.9-20240220' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: raw: raw_getsockopt(): reduce scope of err
can: raw: fix getsockopt() for new CAN_RAW_XL_VCID_OPTS
can: xilinx_can: Add ethtool stats interface for ECC errors
can: xilinx_can: Add ECC support
dt-bindings: can: xilinx_can: Add 'xlnx,has-ecc' optional property
can: tcan4x5x: support resuming from rx interrupt signal
can: m_can: allow keeping the transceiver running in suspend
dt-bindings: can: tcan4x5x: Document the wakeup-source flag
can: m_can: remove redundant check for pm_clock_support
====================
Link: https://lore.kernel.org/r/20240220085130.2936533-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Implement the "wakeup-source" device tree property, so the chip is left
running when suspending, and its rx interrupt is used as a wakeup source
to resume operation.
Signed-off-by: Martin Hundebøll <martin@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add a flag to the device class structure that leaves the chip in a
running state with rx interrupt enabled, so that an m_can device driver
can configure and use the interrupt as a wakeup source.
Signed-off-by: Martin Hundebøll <martin@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
m_can_clk_start() already skip starting the clock when
clock support is disabled, remove the redundant check in
m_can_class_register().
This also solves the imbalance with m_can_clk_stop() that is called
afterward in the same function before the return.
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240104235723.46931-1-francesco@dolcini.it
[mkl: rebased to net-next/main]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
m_can supports submitting multiple transmits with one register write.
This is an interesting option to reduce the number of SPI transfers for
peripheral chips.
The m_can_tx_op is extended with a bool that signals if it is the last
transmission and the submit should be executed immediately.
The worker then writes the skb to the FIFO and submits it only if the
submit bool is set. If it isn't set, the worker will write the next skb
which is waiting in the workqueue to the FIFO, etc.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-15-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The network queue is currently always stopped in start_xmit and
continued in the interrupt handler. This is not possible anymore if we
want to keep multiple transmits in flight in parallel.
Use the previously introduced tx_fifo_in_flight counter to control the
network queue instead. This has the benefit of not needing to ask the
hardware about fifo status.
This patch stops the network queue in start_xmit if the number of
transmits in flight reaches the size of the fifo and wakes up the queue
from the interrupt handler once the transmits in flight drops below the
fifo size. This means any skbs over the limit will be rejected
immediately in start_xmit (it shouldn't be possible at all to reach that
state anyways).
The maximum number of transmits in flight is the size of the fifo.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-13-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Keep track of the number of transmits in flight.
This patch prepares the driver to control the network interface queue
based on this counter. By itself this counter be
implemented with an atomic, but as we need to do other things in the
critical sections later I am using a spinlock instead.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-12-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The current implementation uses the workqueue for peripheral chips to
submit work. Only a single work item is queued and used at any time.
To be able to keep more than one transmit in flight at a time, prepare
the workqueue to support multiple transmits at the same time.
Each work item now has a separate storage for a skb and a pointer to
cdev. This assures that each workitem can be processed individually.
The workqueue is replaced by an ordered workqueue which makes sure that
only a single worker processes the items queued on the workqueue. Also
items are ordered by the order they were enqueued. This removes most of
the concurrency the workqueue normally offers. It is not necessary for
this driver.
The cleanup functions have to be adopted a bit to handle this new
mechanism.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-11-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add TX support to get/set functions for ethtool coalescing.
tx-frames-irq and tx-usecs-irq can only be set/unset together.
tx-frames-irq needs to be less than TXE and TXB.
As rx and tx share the same timer, rx-usecs-irq and tx-usecs-irq can be
enabled/disabled individually but they need to have the same value if
enabled.
Polling is excluded from TX irq coalescing.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-8-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add the possibility to set coalescing parameters with ethtool.
rx-frames-irq and rx-usecs-irq can only be set and unset together as the
implemented mechanism would not work otherwise. rx-frames-irq can't be
greater than the RX FIFO size.
Also all values can only be changed if the chip is not active.
Polling is excluded from irq coalescing support.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20240207093220.2681425-7-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Extend the coalescing implementation for transmits.
In normal mode the chip raises an interrupt for every finished transmit.
This implementation switches to coalescing mode as soon as an interrupt
handled a transmit. For coalescing the watermark level interrupt is used
to interrupt exactly after x frames were sent. It switches back into
normal mode once there was an interrupt with no finished transmit and
the timer being inactive.
The timer is shared with receive coalescing. The time for receive and
transmit coalescing timers have to be the same for that to work. The
benefit is to have only a single running timer.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20240207093220.2681425-6-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
m_can offers the possibility to set an interrupt on reaching a watermark
level in the receive FIFO. This can be used to implement coalescing.
Unfortunately there is no hardware timeout available to trigger an
interrupt if only a few messages were received within a given time. To
solve this I am using a hrtimer to wake up the irq thread after x
microseconds.
The timer is always started if receive coalescing is enabled and new
received frames were available during an interrupt. The timer is stopped
if during a interrupt handling no new data was available.
If the timer is started the new item interrupt is disabled and the
watermark interrupt takes over. If the timer is not started again, the
new item interrupt is enabled again, notifying the handler about every
new item received.
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://lore.kernel.org/all/20240207093220.2681425-5-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.
Error checking for matching and match data was not necessary as matching
is always successful if we're already in probe and the match tables always
have data pointers.
Signed-off-by: Rob Herring <robh@kernel.org>
There is usbdrv_wrap in struct usb_driver and usb_device_driver, it
contains device_driver and for_devices. for_devices is used to
distinguish between device drivers and interface drivers.
Like the is_usb_device(), it tests the type of the device. We can test
that if the probe of device_driver is equal to usb_probe_device in
is_usb_device_driver(), and then the struct usbdrv_wrap is no longer
needed.
Clean up struct usbdrv_wrap, use device_driver directly in struct
usb_driver and usb_device_driver. This makes the code cleaner.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20240104032822.1896596-1-yajun.deng@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
kernel/bpf/verifier.c
829955981c ("bpf: Fix verifier log for async callback return values")
a923819fb2 ("bpf: Treat first argument as return value for bpf_throw")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IMX93 A0 chip involve the internal q-channel handshake in LPCG and
CCM to automatically handle the Flex-CAN IPG STOP signal. Only after
FLEX-CAN enter stop mode then can support the self-wakeup feature.
But meet issue when do the continue system PM stress test. When config
the CAN as wakeup source, the first time after system suspend, any data
on CAN bus can wakeup the system, this is as expect. But the second time
when system suspend, data on CAN bus can't wakeup the system. If continue
this test, we find in odd time system enter suspend, CAN can wakeup the
system, but in even number system enter suspend, CAN can't wakeup the
system. IC find a bug in the auto stop mode logic, and can't fix it easily.
So for the new imx93 A1, IC drop the auto stop mode and involve the
GPR to support stop mode (used before). IC define a bit in GPR which can
trigger the IPG STOP signal to Flex-CAN, let it go into stop mode.
And NXP claim to drop IMX93 A0, and only support IMX93 A1. So this patch
remove the auto stop mode, and add flag FLEXCAN_QUIRK_SETUP_STOP_MODE_GPR
to imx93.
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/all/20230726112458.3524165-2-haibo.chen@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Upstream commit 717c6ec241 ("can: sja1000: Prevent overrun stalls with
a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
CAN controller reception: the Rx buffer is only 5 messages long, so when
the bus loaded (eg. a message every 50us), overrun may easily
happen. Upon an overrun situation, due to a possible internal crosstalk
situation, the controller enters a frozen state which only can be
unlocked with a soft reset (experimentally). The solution was to offload
a call to sja1000_start() in a threaded handler. This needs to happen in
process context as this operation requires to sleep. sja1000_start()
basically enters "reset mode", performs a proper software reset and
returns back into "normal mode".
Since this fix was introduced, we no longer observe any stalls in
reception. However it was sporadically observed that the transmit path
would now freeze. Further investigation blamed the fix mentioned above,
and especially the reset operation. Reproducing the reset in a loop
helped identifying what could possibly go wrong. The sja1000 is a single
Tx queue device, which leverages the netdev helpers to process one Tx
message at a time. The logic is: the queue is stopped, the message sent
to the transceiver, once properly transmitted the controller sets a
status bit which triggers an interrupt, in the interrupt handler the
transmission status is checked and the queue woken up. Unfortunately, if
an overrun happens, we might perform the soft reset precisely between
the transmission of the buffer to the transceiver and the advent of the
transmission status bit. We would then stop the transmission operation
without re-enabling the queue, leading to all further transmissions to
be ignored.
The reset interrupt can only happen while the device is "open", and
after a reset we anyway want to resume normal operations, no matter if a
packet to transmit got dropped in the process, so we shall wake up the
queue. Restarting the device and waking-up the queue is exactly what
sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
the queue state, we must acquire a lock both in the reset handler and in
the transmit path to ensure serialization of both operations. It turns
out, a lock is already held when entering the transmit path, so we can
just acquire/release it as well with the regular net helpers inside the
threaded interrupt handler and this way we should be safe. As the
reset handler might still be called after the transmission of a frame to
the transceiver but before it actually gets transmitted, we must ensure
we don't leak the skb, so we free it (the behavior is consistent, no
matter if there was an skb on the stack or not).
Fixes: 717c6ec241 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
Cc: stable@vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/all/20231002160206.190953-1-miquel.raynal@bootlin.com
[mkl: fixed call to can_free_echo_skb()]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The current at91_can driver uses NAPI to handle RX'ed CAN frames, the
RX IRQ is disabled and a NAPI poll is scheduled. Then in
at91_poll_rx() the RX'ed CAN frames are tried to read in order from
the device.
This approach has 2 drawbacks:
- Under high system load it might take too long from the initial RX
IRQ to the NAPI poll function to run. This causes RX buffer
overflows.
- The algorithm to read the CAN frames in order is not bullet proof
and may fail under certain use cases/system loads.
The rx-offload helper fixes these problems by reading the RX'ed CAN
frames in the interrupt handler and adding it to a list sorted by RX
timestamp. This list of RX'ed SKBs is then passed to the networking
stack via NAPI.
Convert the RX path to rx-offload, pass all CAN error frames with
can_rx_offload_queue_timestamp().
Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-27-9987d53600e0@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This is a preparation patch to convert the driver to make use of the
rx-offload helper. With rx-offload the received CAN frames are sorted
by their timestamp. Regular CAN RX'ed and TX'ed CAN frames are
timestamped by the hardware. Error events are not.
Introduce a new function at91_alloc_can_err_skb() the allocates an
error SKB and reads the current timestamp from the controller.
Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-26-9987d53600e0@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The driver implements a hand crafted CAN state handling. Update the
driver to make use of can_change_state(), introduced in ("can: dev:
Consolidate and unify state change handling")
Also switch from hand crafted CAN bus off handling to can_bus_off():
In case of a bus off, abort all pending TX requests, switch off the
device and let can_bus_off() handle the device restart.
Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-24-9987d53600e0@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The at91 CAN controller automatically recovers from bus-off after 128
occurrences of 11 consecutive recessive bits.
After an auto-recovered bus-off, the error counters no longer reflect
this fact. On the sam9263 the state bits in the SR register show the
current state (based on the current error counters), while on sam9x5
and newer SoCs these bits are latched.
Take any latched bus-off information from the SR register into account
when calculating the CAN new state, to start the standard CAN bus off
handling.
Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-23-9987d53600e0@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
On the sam9263 the SR bits for bus off, error passive, warning limit,
and error active are not latched and reflect the current status of the
controller. On the sam9x5 and newer SoCs these bits are latched.
To simplify the code, use can_state_get_by_berr_counter() to get the
state of the controller regardless of the SoC version.
Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-22-9987d53600e0@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>