In case the eXtended mezzanine is present on the system, use it for IPv4
router offload.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a table that assigns L-value per M-index. The L is always the
biggest from the currently inserted prefixes. Setup a hashtable to track
the M-index information and the prefixes that are related to it. Ensure
the L-value is always correctly set.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the router init flow, call into XM code and initialize couple of
items needed for XM functionality:
1) Query the capabilities and sizes. Check the XM device id.
2) Initialize the M-value. Note that currently the M-value is set fixed
to 16 for IPv4. In future this may change to better cover the actual
inserted routes.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to offload entries to XM, implement a set of low-level
functions to work with LPM trees in XM and also to pack and write
FIB entries into XM.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Turned out that mlxsw_sp_ipip_fib_entry_op_gre4() does not need to
figure out the IP address and virtual router id. Those are exactly
the same as in the fib_entry it is called for. So just use that and
reduce mlxsw_sp_ipip_fib_entry_op_gre4() function to only call
mlxsw_sp_ipip_fib_entry_op_gre4_rtdp() make the ipip decap op
code similar to nve.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a router interface (RIF) is configured for a LAG, make sure its
configuration is applied on the new LAG member.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After adding support for QinQ, a.k.a 802.1ad protocol, there are a few
scenarios that should be vetoed.
The vetoes are motivated by various ASIC limitations.
For example, a port that is member in a 802.1ad bridge cannot have 802.1q
uppers as the port needs to be configured to treat 802.1q packets as
untagged packets.
Veto all those unsupported scenarios and return suitable messages.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The device supports an operation that allows the driver to issue one
request to update the adjacency index for all the routes in a given
virtual router (VR) from old index and size to new ones. This is useful
in case the configuration of a certain nexthop group is updated and its
adjacency index changes.
Currently, the driver does not use this operation in an efficient
manner. It iterates over all the routes using the nexthop group and
issues an update request for the VR if it is not the same as the
previous VR.
Instead, use the VR tracking added in the previous patch to update the
adjacency index once for each VR currently using the nexthop group.
Example:
8k IPv6 routes were added in an alternating manner to two VRFs. All the
routes are using the same nexthop object ('nhid 1').
Before:
# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3
Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':
16,385 devlink:devlink_hwmsg
4.255933213 seconds time elapsed
0.000000000 seconds user
0.666923000 seconds sys
Number of EMAD transactions corresponds to number of routes using the
nexthop group.
After:
# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3
Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':
3 devlink:devlink_hwmsg
0.077655094 seconds time elapsed
0.000000000 seconds user
0.076698000 seconds sys
Number of EMAD transactions corresponds to number of VRFs / VRs.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For each nexthop group, track in which virtual routers (VRs) the group is
used. This is going to be used by the next patch to perform a more
efficient adjacency index update whenever the group's adjacency index
changes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the rare case where the adjacency pointer cannot be updated for a
given virtual router, rollback the operation so that virtual routers
that are already using the new index will use the old one again.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlxsw_sp_adj_index_mass_update_vr() only needs the virtual router's
identifier and protocol, so pass them directly. In a subsequent patch
the caller will not have access to the pointer.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for blackhole nexthops by programming them to the adjacency
table with a discard action.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two are the same, but for blackhole nexthops we will not have an
associated neighbour struct, so resolve the RIF from the nexthop struct
itself instead.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the driver creates a loopback RIF during its initialization, it
can be used to program the adjacency entries for unresolved nexthops
instead of other RIFs. The loopback RIF is guaranteed to exist for the
entire life time of the driver, unlike other RIFs that come and go.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unresolved nexthops are currently written to the adjacency table with a
discard action. Packets hitting such entries are trapped to the CPU via
the 'DISCARD_ROUTER3' trap which can be enabled or disabled on demand,
but is always enabled in order to ensure the kernel can resolve the
unresolved neighbours.
This trap will be needed for blackhole nexthops support. Therefore, move
unresolved nexthops to explicitly program the adjacency entries with a
trap action and a different trap identifier, 'RTR_EGRESS0'.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Up until now RIFs (router interfaces) were created on demand (e.g.,
when an IP address was added to a netdev). However, sometimes the device
needs to be provided with a RIF when one might not be available.
For example, adjacency entries that drop packets need to be programmed
with an egress RIF despite the RIF not being used to forward packets.
Create such a RIF during initialization so that it could be used later
on to support blackhole nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that the driver supports nexthop objects, the check is no longer
necessary. Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the FIB info (i.e, 'struct fib_info', 'struct fib6_info') uses a
nexthop object, then use the object's identifier to resolve the nexthop
group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Register a listener to the nexthop notification chain and parse notified
nexthop objects into the existing mlxsw nexthop data structures.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function is responsible for allocating the adjacency entries used by
the nexthop group and populating them with the adjacency information
such as egress RIF and MAC address.
Allow the function to return an error when it encounters a problem and
have the relevant call sites check it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, a nexthop group is destroyed when the last FIB entry is
detached from it.
When nexthop objects are supported, this can no longer be the case, as
the group is a separate object whose lifetime is managed by user space.
Add an indication if a nexthop group can be destroyed and always set it
to true for the existing IPv4 and IPv6 nexthop groups.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the IPv6 FIB info has a nexthop object, the nexthop offload
indication is set on the nexthop object and not on the FIB info itself.
Therefore, do not try to clear the offload indication from the FIB info
when it has a nexthop object.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Attach the FIB entry to the nexthop group after setting the offload flag
on the IPv6 FIB info (i.e., 'struct fib6_info'). The second operation is
not needed when the nexthop group is a nexthop object. This will allow
us to have a common exit path from the function, regardless of the
nexthop group's type.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous patch associated a nexthop group with the FIB entry before
the entry's type is determined.
Make use of the nexthop group when determining the entry's type instead
of relying on helpers that assume that the nexthop info is not a nexthop
object (i.e., 'struct nexthop').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Each FIB entry has a type (e.g., remote, local) that determines how the
entry is programmed to the device. In order to determine if the entry is
local (directly connected) or remote (has a gateway) the relevant FIB
info structures (e.g., 'struct fib_info') are checked.
When entries that use nexthop objects are supported, these checks will
need to be changed to take into account 'struct nexthop'.
Instead, first associate the entry with a nexthop group so that the next
patch could determine the entry's type based on the associated nexthop
group's type.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sole caller of the function will soon only have the ifindex
available, instead of the pointer itself.
Therefore, change the function to take the ifindex as input and have it
get the pointer.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ifindex of the nexthop device was never set for IPv4 nexthops,
unlike IPv6 nexthops. This went unnoticed since only IPv6 nexthops use
it.
Set the ifindex for IPv4 nexthops in order to be consistent with IPv6
and also because it will be used by a later patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function allocates 'nhgi', not 'nh_grp', so it needs to free the
former in its error path.
Fixes: 7f7a417e6a ("mlxsw: spectrum_router: Split nexthop group configuration to a different struct")
Addresses-Coverity: ("Memory - corruptions (USE_AFTER_FREE)")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 21151f64a4 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two functions are identical, so consolidate them to
mlxsw_sp_nexthop_type_fini().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two functions are now identical, so consolidate them to
mlxsw_sp_nexthop_type_init().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing the nexthop and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of passing the route and resolving the nexthop netdev from it,
pass the nexthop netdev directly.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The overlay protocol (i.e., IPv4/IPv6) that is being encapsulated has
no impact on whether a certain IP tunnel can be offloaded or not. Only
the underlay protocol matters.
Therefore, remove the unused overlay protocol parameter from the
callback.
This will later allow us to consolidate code paths between IPv4 and IPv6
code.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the individual nexthops member in the group and attributes of
the group (e.g., its type) are stored in the same struct (i.e., 'struct
mlxsw_sp_nexthop_group'). This is fine since the individual nexthops
cannot change during the lifetime of the group.
With nexthop objects this is no longer the case. An existing nexthop
group can be replaced to use a new set of nexthops. Creating a new
struct whenever a group is replaced entails replacing the group pointer
of all the routes (i.e., 'struct mlxsw_sp_fib_entry') using the group.
Avoid this inefficient step by splitting the nexthop group configuration
to a different struct (i.e., 'struct mlxsw_sp_nexthop_group_info').
When a nexthop group is replaced a new group info struct is created and
the individual rotues do not need to be touched.
Illustration after the change:
mlxsw_sp_fib_entry mlxsw_sp_nexthop_group mlxsw_sp_nexthop_group_info
+-------------------+ +----------------------+ +---------------------------+
| nh_group; +--> nhgi; +--> |
| | | | | |
+-------------------+ +----------------------+ +---------------------------+
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of storing the FIB info as 'priv' when the nexthop group
represents an IPv4 nexthop group, simply store it as a FIB info with a
proper comment.
When nexthop objects are supported, this field will become a union with
the nexthop object's identifier.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When needed, IPv4 routes fetch the FIB info (i.e., 'struct fib_info')
from their associated nexthop group. This will not work when the nexthop
group represents a nexthop object (i.e., 'struct nexthop'), as it will
only have access to the nexthop's identifier.
Instead, store the FIB info in the route itself.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained in the previous patch, nexthop objects can have both IPv4
and IPv6 nexthops in the same group. Therefore, move the neighbour table
to be a property of the nexthop instead of the nexthop group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both IPv4 and IPv6 nexthop groups are hashed in the same table. The
protocol field is used to indicate how the hash should be computed for
each group.
When nexthop group objects are supported, the hash will be computed for
them based on the nexthop identifier.
To differentiate between all the nexthop group types, encode the type of
the group in the key instead of the protocol.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the type (i.e., IPv4/IPv6) of the nexthop group is derived
from the neighbour table associated with the group.
This is problematic when nexthop objects are taken into account, as a
nexthop group object can contain both IPv4 and IPv6 nexthops.
Instead, add a new field that indicates the type of the group and
initialize it during the group's creation. Currently, the types are IPv4
('struct fib_info') and IPv6 ('struct fib6_info'). In the future another
type will be added for nexthop objects ('struct nexthop').
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When comparing a key with a nexthop group in rhastable's obj_cmpfn()
callback, make sure that the key and nexthop group are of the same type
(i.e., IPv4 / IPv6).
The bug is not currently visible because IPv6 nexthop groups do not
populate the FIB info pointer and IPv4 nexthop groups do not set the
ifindex for the individual nexthops.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case bulking is used, the entry that was previously added may not
be yet committed to the HW as it waits in the queue for bulk send. For
such entries, skip the deletion.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare for the low-level ops that need to store some data alongside
the fib_entry and introduce a per-fib_entry priv for ll ops.
The priv is reference counted as in the follow-up patch it is going
to be saved in pack() function and used later on in commit() even in
case the related fib_entry gets freed in the middle.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Get the max size needed for FIB entry op context and allocate it once
for the instance. Use it repeatedly from the scheduled work.
By this, allow to extend the context to hold more data than it is wise
to do when it was on the stack. Make sure to signalize that the context
needs to be initialized in case families of subsequent FIB entries differ.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For XMDR register it is possible to carry multiple FIB entry
operations in a single write. However the FW does not restrict mixing
the types of operations, make the code easier and indicate the bulking
is ok only in case the bulk contains FIB operations of the same family
and event.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With follow-up introduction of XM implementation, XMDR register is
going to be optionally used instead of RALUE register. Push the RALUE
packing helpers and write call into low-level router ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>