This patch registers a callback for get_fec_stats such that
FEC stats can be queried from the below command
"ethtool -I --show-fec eth0"
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In otx2_init_tc(), if rhashtable_init() failed, it does not free
tc->tc_entries_bitmap which is allocated in otx2_tc_alloc_ent_bitmap().
Fixes: 2e2a8126ff ("octeontx2-pf: Unify flow management variables")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink_ops::info_get() is now optional and devlink will continue to
report information even if that callback gets removed.
Remove all the empty devlink_ops::info_get() callbacks from the
drivers.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.
In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.
Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1. Added support to filter packets based on IP fragment.
For IPv4 packets check for ip_flag == 0x20 (more fragment bit set).
For IPv6 packets check for next_header == 0x2c (next_header set to
'fragment header for IPv6')
2. Added configuration support from both "ethtool ntuple" and "tc flower".
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses pfc_alloc_status array overflow occurring for
send queue index value greater than PFC priority. Queue index can be
greater than supported PFC priority for multiple scenarios (e.g. QoS,
during non zero SMQ allocation for a PF/VF).
In those scenarios the API should return default tx scheduler '0'.
This is causing mbox errors as otx2_get_smq_idx returing invalid smq value.
Fixes: 99c969a83d ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. If a profile does not support DMAC extraction then avoid installing NPC
flow rules for unicast. Similarly, if LXMB(L2 and L3) extraction is not
supported by the profile then avoid installing broadcast and multicast
rules.
2. Allow MCAM entry insertion for promiscuous mode.
3. For the profiles where DMAC is not extracted in MKEX key default
unicast entry installed by AF is not valid. Hence do not use action
from the AF installed default unicast entry for such cases.
4. Adjacent packet header fields in a packet like IP header source
and destination addresses or UDP/TCP header source port and destination
can be extracted together in MKEX profile. Therefore MKEX profile can be
configured to in two ways:
a. Total of 4 bytes from start of UDP header(src port
+ destination port)
or
b. Two bytes from start and two bytes from offset 2
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20221118053329.2288486-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
skb->vlan_present seems redundant.
We can instead derive it from this boolean expression:
vlan_present = skb->vlan_proto != 0 || skb->vlan_tci != 0
Add a new union, to access both fields in a single load/store
when possible.
union {
u32 vlan_all;
struct {
__be16 vlan_proto;
__u16 vlan_tci;
};
};
This allows following patch to remove a conditional test in GRO stack.
Note:
We move remcsum_offload to keep TC_AT_INGRESS_MASK
and SKB_MONO_DELIVERY_TIME_MASK unchanged.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current way of checking available SQE count which is based on
HW updated SQB count could result in driver submitting an SQE
even before CQE for the previously transmitted SQE at the same
index is processed in NAPI resulting losing SKB pointers,
hence a leak. Fix this by checking a consumer index which
is updated once CQE is processed.
Fixes: 3ca6c4c882 ("octeontx2-pf: Add packet transmission support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20221107033505.2491464-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In scenarios where multiple errors have occurred
for a SQ before SW starts handling error interrupt,
SQ_CTX[OP_INT] may get overwritten leading to
NIX_LF_SQ_OP_INT returning incorrect value.
To workaround this read LMT, MNQ and SQ individual
error status registers to determine the cause of error.
Fixes: 4ff7d1488a ("octeontx2-pf: Error handling support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix rxsc and txsc not getting freed before going out of scope
Fixes: c54ffc7360 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Manank Patel <pmanank200502@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error path after calling cn10k_mcs_init(), cn10k_mcs_free() need
be called to avoid memory leak.
Fixes: c54ffc7360 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the missing unlock in some error paths.
Fixes: c54ffc7360 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the macsec offload feature to cn10k
PF netdev driver. The macsec offload ops like adding, deleting
and updating SecYs, SCs, SAs and stats are supported. XPN support
will be added in later patches. Some stats use same counter in hardware
which means based on the SecY mode the same counter represents different
stat. Hence when SecY mode/policy is changed then snapshot of current
stats are captured. Also there is no provision to specify the unique
flow-id/SCI per packet to hardware hence different mac address needs to
be set for macsec interfaces.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If CONFIG_DCB is not set,
make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu-,
will be failed, like this:
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c: In function ‘otx2_select_queue’:
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c:1886:19: error: unused variable ‘pf’ [-Werror=unused-variable]
struct otx2_nic *pf = netdev_priv(netdev);
^~
cc1: all warnings being treated as errors
To fix this build error, put the definition of *pf under the CONFIG_DCB.
Fixes: 99c969a83d ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Link: https://lore.kernel.org/r/20220919025840.256411-1-renzhijie2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Errata:
The ptp_clock_hi rollsover to zero one clock cycle before it
reaches one second boundary. As a result, the pps threshold
comparison fails after one second and the pps output signal
won't toggle further.
This patch workarounds the issue by programming the pps_lo_incr
register to 500msec minus one clock cycle period, ensuring that
the pps threshold comparison succeeds at one second rollover
boundary and pps edge toggles. After that point, the driver will
have enough time (~500msec) to reset the pps threshold value.
After each one second boundary, hrtimer is invoked which resets
the pps threshold value.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ptp 1-step mode using timecounter. The seconds and
nanoseconds to be updated in PTP header are calculated by adding the
timecounter offset to the free running PTP clock counter time. The PF
driver periodically gets the PTP clock time using AF mbox. The 1-step
support uses HW feature to update correction field rather than
OriginTimestamp field in PTP header.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of now all transmit queues transmit packets out of same scheduler
queue hierarchy. Due to this PFC frames sent by peer are not handled
properly, either all transmit queues are backpressured or none.
To fix this when user enables PFC for a given priority map relavant
transmit queue to a different scheduler queue hierarcy, so that
backpressure is applied only to the traffic egressing out of that TXQ.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220830120304.158060-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For packets scheduled to RPM and LBK, NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL]
selects the TL3 or TL2 scheduling level as the one used for link/channel
selection and backpressure. For each scheduling queue at the selected
level: Setting NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG[ENA] = 1 allows
the TL3/TL2 queue to schedule packets to a specified RPM or LBK link
and channel.
There is an issue in the code where NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL]
is set to TL3 where as the NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG is
configured for TL2 queue in some cases. As a result packets will not
transmit on that link/channel. This patch fixes the issue by configuring
the NIX_AF_TL3_TL2(0..255)_LINK(0..12)_CFG register depending on the
NIX_AF_PSE_CHANNEL_LEVEL[BP_LEVEL] value.
Fixes: caa2da34fd ("octeontx2-pf: Initialize and config queues")
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20220802142813.25031-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PTP messages like SYNC, FOLLOW_UP, DELAY_REQ are of size 58 bytes.
Using a minimum packet length as 64 makes NIX to pad 6 bytes of
zeroes while transmission. This is causing latest ptp4l application to
emit errors since length in PTP header and received packet are not same.
Padding upto 3 bytes is fine but more than that makes ptp4l to assume
the pad bytes as a TLV. Hence reduce the size to 60 from 64.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20220729092457.3850-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 35d099da41, reversing
changes made to 58d8bcd47e.
I wrongly applied that to the net-next tree instead of the intended
target tree (net). Reverting it on net-next.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.
Fixes: 1d4d9e42c2 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2
to CN10K. CN10K supports larger burst size. Fix burst exponent
and burst mantissa configuration for CN10K.
Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps'
passed by stack is also u64.
Fixes: e638a83f16 ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.
Fixes: 1d4d9e42c2 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2
to CN10K. CN10K supports larger burst size. Fix burst exponent
and burst mantissa configuration for CN10K.
Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps'
passed by stack is also u64.
Fixes: e638a83f16 ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
NPC exact match table can support more entries than RPM
dmac filters. This requires field size of DMAC filter count
and index to be increased.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Exact match table modification requires wider fields as it has
more number of slots to fill in. Modifying an entry in exact match
table may cause hash collision and may be required to delete entry
from 4-way 2K table and add to fully associative 32 entry CAM table.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 2ef8e39f58, reversing
changes made to e7ce9fc9ad.
There are build warnings here which break the normal
build due to -Werror. Ratheesh was nice enough to quickly
follow up with fixes but didn't hit all the warnings I
see on GCC 12 so to unlock net-next from taking patches
let get this series out for now.
Link: https://lore.kernel.org/r/20220707013201.1372433-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NPC exact match table can support more entries than RPM
dmac filters. This requires field size of DMAC filter count
and index to be increased.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Exact match table modification requires wider fields as it has
more number of slots to fill in. Modifying an entry in exact match
table may cause hash collision and may be required to delete entry
from 4-way 2K table and add to fully associative 32 entry CAM table.
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 6e144b47f5 (octeontx2-pf: Add support for adaptive interrupt coalescing)
Added support for VF interfaces as well.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull bitmap updates from Yury Norov:
- bitmap: optimize bitmap_weight() usage, from me
- lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
Carvalho Chehab
- include/linux/find: Fix documentation, from Anna-Maria Behnsen
- bitmap: fix conversion from/to fix-sized arrays, from me
- bitmap: Fix return values to be unsigned, from Kees Cook
It has been in linux-next for at least a week with no problems.
* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
nodemask: Fix return values to be unsigned
bitmap: Fix return values to be unsigned
KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
KVM: x86: hyper-v: fix type of valid_bank_mask
ia64: cleanup remove_siblinginfo()
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
lib/bitmap: add test for bitmap_{from,to}_arr64
lib: add bitmap_{from,to}_arr64
lib/bitmap: extend comment for bitmap_(from,to)_arr32()
include/linux/find: Fix documentation
lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
MAINTAINERS: add cpumask and nodemask files to BITMAP_API
arch/x86: replace nodes_weight with nodes_empty where appropriate
mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
irq: mips: replace cpumask_weight with cpumask_empty where appropriate
drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
arch/x86: replace cpumask_weight with cpumask_empty where appropriate
...
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should call the TSO setting helper, GSO is controllable
by user space.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some places, octeontx2 code calls bitmap_weight() to check if any bit of
a given bitmap is set. It's better to use bitmap_empty() in that case
because bitmap_empty() stops traversing the bitmap as soon as it finds
first set bit, while bitmap_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
As more police parameters are passed to flow_offload, driver can check
them to make sure hardware handles packets in the way indicated by tc.
The conform-exceed control should be drop/pipe or drop/ok. Besides,
for drop/ok, the police should be the last action. As hardware can't
configure peakrate/avrate/overhead, offload should not be supported if
any of them is configured.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Completion Queue Entry(CQE) is a descriptor written
by hardware to notify software about the send and
receive completion status. The CQE can be of size
128 or 512 bytes. A 512 bytes CQE can hold more receive
fragments pointers compared to 128 bytes CQE. This
patch enables to modify CQE size using:
<ethtool -G cqe-size N>.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cn10k hardware ptp timestamp format has been modified primarily
to support 1-step ptp clock. The 64-bit timestamp used by hardware is
split into two 32-bit fields, the upper one holds seconds, the lower
one nanoseconds. A new register (PTP_CLOCK_SEC) has been added that
returns the current seconds value. The nanoseconds register PTP_CLOCK_HI
resets after every second. The cn10k RPM block provides Rx/Tx timestamps
to the NIX block using the new timestamp format. The software can read
the current timestamp in nanoseconds by reading both PTP_CLOCK_SEC &
PTP_CLOCK_HI registers.
This patch provides support for new timestamp format.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds TC feature for VFs also. When MCAM
rules are allocated for a VF then either TC or ntuple
filters can be used. Below are the commands to use
TC feature for a VF(say lbk0):
devlink dev param set pci/0002:01:00.1 name mcam_count value 16 \
cmode runtime
ethtool -K lbk0 hw-tc-offload on
ifconfig lbk0 up
tc qdisc add dev lbk0 ingress
tc filter add dev lbk0 parent ffff: protocol ip flower skip_sw \
dst_mac 98:03:9b:83:aa:12 action police rate 100Mbit burst 5000
Also to modify any fields of the hardware context with
NIX_AQ_INSTOP_WRITE command then corresponding masks of those
fields must be set as per hardware. This was missing in
ingress ratelimiting context. This patch sets those masks also.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>