When the action of an xdp program was XDP_TX, lan966x was creating
a xdp_frame and use this one to send the frame back. But it is also
possible to send back the frame without needing a xdp_frame, because
it is possible to send it back using the page.
And then once the frame is transmitted is possible to use directly
page_pool_recycle_direct as lan966x is using page pools.
This would save some CPU usage on this path, which results in higher
number of transmitted frames. Bellow are the statistics:
Frame size: Improvement:
64 ~8%
256 ~11%
512 ~8%
1000 ~0%
1500 ~0%
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230422142344.3630602-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
From time to time, it was observed that the nanosecond part of the
received timestamp, which is extracted from the IFH, it was actually
bigger than 1 second. So then when actually calculating the full
received timestamp, based on the nanosecond part from IFH and the second
part which is read from HW, it was actually wrong.
The issue seems to be inside the function lan966x_ifh_get, which
extracts information from an IFH(which is an byte array) and returns the
value in a u64. When extracting the timestamp value from the IFH, which
starts at bit 192 and have the size of 32 bits, then if the most
significant bit was set in the timestamp, then this bit was extended
then the return value became 0xffffffff... . And the reason of this is
because constants without any postfix are treated as signed longs and
that is the reason why '1 << 31' becomes 0xffffffff80000000.
This is fixed by adding the postfix 'ULL' to 1.
Fixes: fd7627833d ("net: lan966x: Stop using packing library")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a frame is injected from CPU, it is required to create an IFH(Inter
frame header) which sits in front of the frame that is transmitted.
This IFH, contains different fields like destination port, to bypass the
analyzer, priotity, etc. Lan966x it is using packing library to set and
get the fields of this IFH. But this seems to be an expensive
operations.
If this is changed with a simpler implementation, the RX will be
improved with ~5Mbit while on the TX is a much bigger improvement as it
is required to set more fields. Below are the numbers for TX.
Before:
[ 5] 0.00-10.02 sec 439 MBytes 367 Mbits/sec 0 sender
After:
[ 5] 0.00-10.00 sec 578 MBytes 485 Mbits/sec 0 sender
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever a frame was received to the CPU, the HW is timestamping the
frame. In the IFH(Inter Frame Header) it is found the nanosecond part
of the timestamps the SW is required to read from HW the second part.
But reading the second part it seems to be a expensive operations, so
so change this such to read the second part only when rx filter is
enabled.
Doing this change gives the RX a performance boost of ~70mbit.
before:
[ 5] 0.00-10.01 sec 546 MBytes 457 Mbits/sec 0 sender
now:
[ 5] 0.00-10.01 sec 652 MBytes 530 Mbits/sec 0 sender
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IS1 VCAP has it's own list of supported ethernet protocol types which is
different than the IS2 VCAP. Therefore separate the list of known
protocol types based on the VCAP type.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow rules to be chained between IS1 VCAP and IS2 VCAP. Chaining
between IS1 lookups or between IS2 lookups are not supported by the
hardware.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add IS1 VCAP port keyset configuration for lan966x and also update debug
fs support to show the keyset configuration.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Provide IS1 (ingress stage 1) VCAP model for lan966x.
This provides classification actions for lan966x.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the police was removed from the port, then it was trying to
remove the police from the police id and not from the actual
police index.
The police id represents the id of the police and police index
represents the position in HW where the police is situated.
The port police id can be any number while the port police index
is a number based on the port chip port.
Fix this by deleting the police from HW that is situated at the
police index and not police id.
Fixes: 5390334b59 ("net: lan966x: Add port police support using tc-matchall")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 81e164c4ae ("net: microchip: sparx5: Add automatic
selection of VCAP rule actionset") the VCAP API has the capability to
select automatically the actionset based on the actions that are attached
to the rule. So it is not needed anymore to hardcode the actionset in the
driver, therefore it is OK to remove this.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following TC flower filter keys to lan966x for IS2:
- ipv4_addr (sip and dip)
- ipv6_addr (sip and dip)
- control (IPv4 fragments)
- portnum (tcp and udp port numbers)
- basic (L3 and L4 protocol)
- vlan (outer vlan tag info)
- tcp (tcp flags)
- ip (tos field)
As the parsing of these keys is similar between lan966x and sparx5, move
the code in a separate file to be shared by these 2 chips. And put the
specific parsing outside of the common functions.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flower filter packet statistics. This will just read the TCAM
counter of the rule, which mention how many packages were hit by this
rule.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable debugfs for vcap for lan966x. This will allow to print all the
entries in the VCAP and also the port information regarding which keys
are configured.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows the check of the goto action to be specific to the ingress and
egress VCAP instances.
The debugfs support is also updated to show this information.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that xdp_do_flush() is always executed before
napi_complete_done(). This is important for two reasons. First, a
redirect to an XSKMAP assumes that a call to xdp_do_redirect() from
napi context X on CPU Y will be followed by a xdp_do_flush() from the
same napi context and CPU. This is not guaranteed if the
napi_complete_done() is executed before xdp_do_flush(), as it tells
the napi logic that it is fine to schedule napi context X on another
CPU. Details from a production system triggering this bug using the
veth driver can be found following the first link below.
The second reason is that the XDP_REDIRECT logic in itself relies on
being inside a single NAPI instance through to the xdp_do_flush() call
for RCU protection of all in-kernel data structures. Details can be
found in the second link below.
Fixes: a825b611c7 ("net: lan966x: Add support for XDP_REDIRECT")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This changes the way the chain information verified when adding a new tc
flower filter.
When adding a flower filter it is now checked that the filter contains a
goto action to one of the IS2 VCAP lookups, except for the last lookup
which may omit this goto action.
It is also checked if you attempt to add multiple matchall filters to
enable the same VCAP lookup. This will be rejected.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds both the source and destination chain id to the information kept
for enabled port lookups.
This allows enabling and disabling a chain of lookups by walking the chain
information for a port.
This changes the way that VCAP lookups are enabled from userspace: instead
of one matchall rule that enables all the 4 Sparx5 IS2 lookups, you need a
matchall rule per lookup.
In practice that is done by adding one matchall rule in chain 0 to goto IS2
Lookup 0, and then for each lookup you add a rule per lookup (low priority)
that does a goto to the next lookup chain.
Examples:
If you want IS2 Lookup 0 to be enabled you add the same matchall filter as
before:
tc filter add dev eth12 ingress chain 0 prio 1000 handle 1000 matchall \
skip_sw action goto chain 8000000
If you also want to enable lookup 1 to 3 in IS2 and chain them you need
to add the following matchall filters:
tc filter add dev eth12 ingress chain 8000000 prio 1000 handle 1000 \
matchall skip_sw action goto chain 8100000
tc filter add dev eth12 ingress chain 8100000 prio 1000 handle 1000 \
matchall skip_sw action goto chain 8200000
tc filter add dev eth12 ingress chain 8200000 prio 1000 handle 1000 \
matchall skip_sw action goto chain 8300000
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes the VCAP lookups state to always be enabled so that it is
possible to add "internal" VCAP rules that must be available even though
the user has not yet enabled the VCAP chains via a TC matchall filter.
The API callback to enable and disable VCAP lookups is therefore removed.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The blamed commit implemented the vcap_operations to allow to add an
entry in the TCAM. One of the callbacks is to validate the supported
keysets. If the TCAM lookup was not enabled, then this will return
failure so no entries could be added.
This doesn't make much sense, as you can enable at a later point the
TCAM. Therefore change it such to allow entries in TCAM even it is not
enabled.
Fixes: 4426b78c62 ("net: lan966x: Add port keyset config and callback interface")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the PCS was taken out of reset, we were changing by mistake also
the speed to 100 Mbit. But in case the link was going down, the link
up routine was setting correctly the link speed. If the link was not
getting down then the speed was forced to run at 100 even if the
speed was something else.
On lan966x, to set the speed link to 1G or 2.5G a value of 1 needs to be
written in DEV_CLOCK_CFG_LINK_SPEED. This is similar to the procedure in
lan966x_port_init.
The issue was reproduced using 1000base-x sfp module using the commands:
ip link set dev eth2 up
ip link addr add 10.97.10.2/24 dev eth2
ethtool -s eth2 speed 1000 autoneg off
Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Link: https://lore.kernel.org/r/20221221093315.939133-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently lan966x, doesn't allow to run PTP over interfaces that are
part of the bridge. The reason is when the lan966x was receiving a
PTP frame (regardless if L2/IPv4/IPv6) the HW it would flood this
frame.
Now that it is possible to add VCAP rules to the HW, such to trap these
frames to the CPU, it is possible to run PTP also over interfaces that
are part of the bridge.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Implement vcap_operations and enable default port keyset configuration
for each port. Now it is possible actually write/read/move entries in
the VCAP.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Extend matchall with action goto. This is needed to enable the lookup in
the VCAP. It is needed to connect chain 0 to a chain that is recognized
by the HW.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently the only supported action is ACTION_TRAP and the only
dissector is ETH_ADDRS. Others will be added in future patches.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This provides the lan966x is2 model and adds it to the vcap control
instance that will be provided to the vcap API.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When lan966x driver is initialized, initialize also the VCAP module for
lan966x.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Extend lan966x XDP support with the action XDP_REDIRECT. This is similar
with the XDP_TX, so a lot of functionality can be reused.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend lan966x XDP support with the action XDP_TX. In this case when the
received buffer needs to execute XDP_TX, the buffer will be moved to the
TX buffers. So a new RX buffer will be allocated.
When the TX finish with the frame, it would give back the buffer to the
page pool.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To add support for XDP_TX it is required to be able to write to the DMA
area therefore it is required that the pages will be mapped using
DMA_BIDIRECTIONAL flag.
Therefore check if there are any xdp programs on the interfaces and in
that case set DMA_BIDRECTIONAL otherwise use DMA_FROM_DEVICE.
Therefore when a new XDP program is added it is required to redo the
page_pool.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default the rxq memory model is MEM_TYPE_PAGE_SHARED but to be able
to reuse pages on the TX side, when the XDP action XDP_TX it is required
to update the memory model to PAGE_POOL.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when a frame was transmitted, it is required to unamp the
frame that was transmitted. The length of the frame was taken from the
transmitted skb. In the future we might not have an skb, therefore store
the length skb directly in the lan966x_tx_dcb_buf and use this one to
unamp the frame.
While at this, also arrange the members in lan966x_tx_dcb_buf not to
have any holes.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce lan966x_fdma_tx_setup_dcb and lan966x_fdma_tx_start functions
and use of them inside lan966x_fdma_xmit. There is no functional change
in here.
They are introduced to be used when XDP_TX/REDIRECT actions are
introduced.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the page_pool params to allocate XDP_PACKET_HEADROOM space as
headroom for all received frames.
This is needed for when the XDP_TX and XDP_REDIRECT are implemented.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lan966x_stats_init() calls create_singlethread_workqueue() and not
checked the ret value, which may return NULL. And a null-ptr-deref may
happen:
lan966x_stats_init()
create_singlethread_workqueue() # failed, lan966x->stats_queue is NULL
queue_delayed_work()
queue_delayed_work_on()
__queue_delayed_work() # warning here, but continue
__queue_work() # access wq->flags, null-ptr-deref
Check the ret value and return -ENOMEM if it is NULL.
Fixes: 12c2d0a5b8 ("net: lan966x: add ethtool configuration and statistics")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the page_pool API for allocation, freeing and DMA handling instead
of dev_alloc_pages, __free_pages and dma_map_page.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>