Instead of (struct rt6_info *)dst casts, we can use:
#define dst_rt6_info(_ptr) \
container_of_const(_ptr, struct rt6_info, dst)
Some places needed missing const qualifiers:
ip6_confirm_neigh(), ipv6_anycast_destination(),
ipv6_unicast_destination(), has_gateway()
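For illustration, the replacement looks like this (a hedged before/after;
the surrounding variable names are examples, not lifted from the patch):

struct rt6_info *rt6;

/* Before: open-coded cast, which also silently drops constness. */
rt6 = (struct rt6_info *)dst;

/* After: container_of_const() based helper. */
rt6 = dst_rt6_info(dst);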
v2: added missing parts (David Ahern)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASICs only support a single interrupt, which means that all the
events are handled by one IRQ (interrupt request) handler. Once an
interrupt is received, we schedule a tasklet to handle events from the EQ
and then schedule tasklets to handle completions from the CQs. A tasklet
runs in softIRQ (software IRQ) context, on the same CPU that scheduled it.
That means that today we use only one CPU to handle all the packets (both
network packets and EMADs) from the hardware.
This can be improved using NAPI. The idea is to use a NAPI instance per
CQ, which is mapped 1:1 to a DQ (RDQ or SDQ). The NAPI poll method can run
in a kernel thread, so the driver will be able to handle WQEs on several
CPUs. Convert the existing code to use the NAPI APIs.
Add a NAPI instance as part of 'struct mlxsw_pci_queue' and initialize it
as part of CQ initialization. Set the appropriate poll method and dummy
net device according to the queue number, similar to the tasklet setup.
For CQs which are used for completions of an RDQ, use the Rx poll method
and 'napi_dev_rx', which is set as 'threaded'. This means that the Rx poll
method will run in kernel thread context, so several RDQs will be handled
in parallel. For CQs which are used for completions of an SDQ, use the Tx
poll method and 'napi_dev_tx'; this method will run in softIRQ context, as
recommended in the NAPI documentation, since Tx packet processing is a
short task.
Convert mlxsw_pci_cq_{rx,tx}_tasklet() to poll methods. Handle the
'budget' argument - ignore it in the Tx poll method, as it is recommended
not to limit Tx processing. For Rx processing, handle up to 'budget'
completions. Return 'work_done', which is the number of completions that
were handled.
Handle the following cases:
1. After processing 'budget' completions, the driver still has work to do:
Return work-done = budget. In that case, the NAPI instance will be
polled again (without the need to be rescheduled). Do not re-arm the
queue, as NAPI will handle the reschedule, so we do not have to involve
the hardware to send an additional interrupt for the completions that
should be processed.
2. Event processing has been completed:
Call napi_complete_done() to mark NAPI processing as completed, which
means that the poll method will not be rescheduled. Re-arm the queue,
as all completions were handled.
In case the poll method handled exactly 'budget' completions, return
work-done = budget - 1 to distinguish this from the case where the driver
still has completions to handle. Otherwise, return the number of
completions that were handled (see the sketch below).
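A minimal sketch of the Rx poll flow described above, assuming
hypothetical helpers mlxsw_pci_cq_rx_process(), mlxsw_pci_cq_has_work()
and mlxsw_pci_cq_rx_arm(), and an assumed 'u.cq.napi' field path (these
are not the exact driver symbols):

static int mlxsw_pci_napi_poll_cq_rx(struct napi_struct *napi, int budget)
{
	struct mlxsw_pci_queue *q = container_of(napi, struct mlxsw_pci_queue,
						 u.cq.napi);
	int work_done;

	/* Handle up to 'budget' Rx completions. */
	work_done = mlxsw_pci_cq_rx_process(q, budget);

	/* Case 1: the whole budget was used and completions may still be
	 * pending. Return 'budget' so NAPI polls again; do not re-arm the
	 * queue.
	 */
	if (work_done == budget && mlxsw_pci_cq_has_work(q))
		return budget;

	/* Case 2: event processing has been completed. Cap the return value
	 * at budget - 1 to distinguish "handled exactly 'budget'" from
	 * case 1, then complete NAPI and re-arm the queue.
	 */
	work_done = min(work_done, budget - 1);
	if (napi_complete_done(napi, work_done))
		mlxsw_pci_cq_rx_arm(q);

	return work_done;
}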
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The next patch will set the driver to use NAPI for event processing. Then
the tasklet mechanism will be used only for the EQ. Reorganize
'mlxsw_pci_queue' to hold the EQ and CQ attributes in a union. For now,
add a tasklet for both the EQ and the CQs. This will be changed in the
next patch, as the CQ's 'tasklet_struct' will be replaced with a NAPI
instance.
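For orientation, the reorganized structure looks roughly like this (field
names are illustrative, not the exact driver definitions):

#include <linux/interrupt.h>

struct mlxsw_pci_queue {
	/* ... common queue fields ... */
	union {
		struct {
			struct tasklet_struct tasklet;
		} eq;
		struct {
			struct tasklet_struct tasklet;	/* replaced by a NAPI
							 * instance in the next
							 * patch
							 */
		} cq;
	} u;
};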
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw will use NAPI for event processing in an upcoming patch. As
preparation, add two dummy net devices and initialize them.
A NAPI instance should be attached to a net device. Usually each queue is
used by a single net device in network drivers, so the mapping between net
device and NAPI instance is intuitive. In our case, Rx queues are not per
port; they are per trap group. Tx queues are mapped to net devices, but we
do not have a separate queue for each local port; several ports share the
same queue.
Use init_dummy_netdev() to initialize the dummy net devices for NAPI.
To run the NAPI poll method in a kernel thread, the net device which the
NAPI instance is attached to should be marked as 'threaded'. It is
recommended to handle Tx packets in softIRQ context, as usually this is
a short task - just free the Tx packet which has been transmitted.
Rx packet handling is a more complicated task, so drivers can use a
dedicated kernel thread to process them. This allows processing packets
from different Rx queues in parallel. We would like to handle only Rx
packets in kernel threads, which means that we will use two dummy net
devices (one for Rx and one for Tx). Set only one of them as 'threaded',
as it will be used for Rx processing. Do not fail in case setting
'threaded' fails, as it is better to use regular softIRQ NAPI than to
prevent the driver from loading.
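A minimal sketch of this setup, assuming the two dummy net devices are
passed in from a driver-private structure (names here are illustrative):

#include <linux/netdevice.h>

static void mlxsw_pci_napi_devs_init(struct net_device *napi_dev_tx,
				     struct net_device *napi_dev_rx)
{
	init_dummy_netdev(napi_dev_tx);		/* Tx NAPI stays in softIRQ */
	init_dummy_netdev(napi_dev_rx);		/* Rx NAPI should be threaded */

	/* Best effort: if threading cannot be enabled, keep regular softIRQ
	 * NAPI instead of failing driver load.
	 */
	if (dev_set_threaded(napi_dev_rx, true))
		pr_info("mlxsw_pci: threaded NAPI not enabled, using softIRQ\n");
}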
Note that the net devices are initialized with init_dummy_netdev(), so
they are not registered, which means that they will not be visible to the
user. It will not be possible to change the 'threaded' configuration from
user space, but this is reasonable in our case, as no other configuration
makes sense, considering that the user has no influence on the usage of
each queue.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, for each CQE in a CQ, we ring the CQ doorbell, then handle the
RDQ and ring the RDQ doorbell. Finally we ring the CQ arm doorbell - once
per CQ tasklet.
The idea of ringing the CQ doorbell before the RDQ doorbell is to make
sure that when we post a new WQE (after the RDQ is handled), there is an
available CQE. This was done because of a hardware bug, as part of
commit c9ebea04cb ("mlxsw: pci: Ring CQ's doorbell before RDQ's").
There is no real reason to ring the RDQ and CQ doorbells for each
completion; it is better to handle several completions and reduce the
number of doorbell rings, as access to the hardware is expensive (time
wise) and might take time because of memory barriers.
A previous patch changed the CQ tasklet to handle up to 64 Rx packets.
With this limitation, we can ring the CQ and RDQ doorbells once per CQ
tasklet. The doorbell counters are increased by the number of packets that
we handled, so the device will know for which completion to send an
additional event.
To avoid reordering the CQ and RDQ doorbell rings, let the tasklet also
ring the RDQ doorbell; mlxsw_pci_cqe_rdq_handle() handles the counter but
does not ring the doorbell.
Note that with this change there is no need to copy the CQE, as we ring
the CQ doorbell only after Rx packet processing (which uses the CQE) is
done.
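Schematically, the tail of the CQ tasklet becomes something like this (a
hedged sketch; the counter fields and doorbell helper are named after
their roles and are not necessarily the exact driver symbols):

/* After handling 'work_done' completions in the tasklet, advance both
 * doorbell counters by 'work_done' and ring each doorbell once, CQ's
 * before RDQ's, preserving the ordering required by commit c9ebea04cb.
 */
cq->doorbell_counter += work_done;	/* illustrative field names */
rdq->doorbell_counter += work_done;
mlxsw_pci_queue_doorbell_ring(mlxsw_pci, cq);
mlxsw_pci_queue_doorbell_ring(mlxsw_pci, rdq);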
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can get many completions in one interrupt. Currently, the CQ tasklet
handles up to half the queue size of completions, and then arms the
hardware to generate additional events, which means that if there were
additional completions that we did not handle, we will immediately get an
additional interrupt to handle the rest.
The decision to handle up to half of the queue size is arbitrary and was
made in 2015, when the mlxsw driver was added to the kernel. An additional
fact that should be taken into account is that while WQEs from the RDQ are
handled, the CPU that handles the tasklet is dedicated to this task, which
means that we might hold the CPU for a long time.
Handle WQEs in smaller chunks, then arm the CQ doorbell to notify the
hardware to send additional notifications. Set the chunk size to 64, as
this number is recommended when using NAPI and the driver will use NAPI
in an upcoming patch.
Note that for now we use the arm doorbell to retrigger the CQ tasklet, but
with NAPI it will be more efficient, as software will reschedule the poll
method and we will not involve the hardware for that.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling qede_parse_actions(), the return code was only used for a
non-zero check, and then -EINVAL was returned.
qede_parse_actions() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit broke the implicit assumption
that only -EINVAL would ever be returned.
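The shape of the change is roughly the following (a hedged sketch with the
argument list abbreviated, not the exact hunk):

rc = qede_parse_actions(edev, flow_action, extack);
if (rc)
	return rc;	/* was: return -EINVAL; */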
Only compile tested.
Fixes: 319a1d1947 ("flow_offload: check for basic action hw stats type")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qede_flow_spec_to_rule(), when calling qede_parse_flow_attr(), the
return code was only used for a non-zero check, and then -EINVAL was
returned.
qede_parse_flow_attr() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
* -EPROTONOSUPPORT
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit introduced qede_flow_spec_to_rule() and this call to
qede_parse_flow_attr(); it looks like it just duplicated how it was
already used.
Only compile tested.
Fixes: 37c5d3efd7 ("qede: use ethtool_rx_flow_rule() to remove duplicated parser code")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In qede_add_tc_flower_fltr(), when calling qede_parse_flow_attr(), the
return code was only used for a non-zero check, and then -EINVAL was
returned.
qede_parse_flow_attr() can currently fail with:
* -EINVAL
* -EOPNOTSUPP
* -EPROTONOSUPPORT
This patch changes the code to use the actual
return code, not just return -EINVAL.
The blamed commit introduced these functions.
Only compile tested.
Fixes: 2ce9c93eac ("qede: Ingress tc flower offload (drop action) support.")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Explicitly set 'rc' (return code) before jumping to the unlock and return
path.
With no code depending on 'rc' remaining at its initial value of -EINVAL,
we can reuse 'rc' for the return code of function calls in subsequent
patches.
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UMAC_CMD register is written from different execution
contexts and has insufficient synchronization protections to
prevent possible corruption. Of particular concern are the
accesses from the phy_device delayed work context used by the
adjust_link call and the BH context that may be used by the
ndo_set_rx_mode call.
A spinlock is added to the driver to protect contended register
accesses (i.e. reg_lock) and it is used to synchronize accesses
to UMAC_CMD.
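The resulting pattern is roughly the following (a hedged sketch; which
lock variant - plain, _bh or _irqsave - is used at a given call site
depends on its context, and the bit modification shown is just an
example):

spin_lock_bh(&priv->reg_lock);
reg = bcmgenet_umac_readl(priv, UMAC_CMD);
reg |= CMD_TX_EN | CMD_RX_EN;		/* example modification */
bcmgenet_umac_writel(priv, reg, UMAC_CMD);
spin_unlock_bh(&priv->reg_lock);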
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_set_rx_mode function is synchronized with the
netif_addr_lock spinlock and BHs disabled. Since this
function is also invoked directly from the driver, the
same synchronization should be applied.
Fixes: 72f9634762 ("net: bcmgenet: set Rx mode before starting netif")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EXT_RGMII_OOB_CTRL register can be written from different
contexts. It is predominantly written from the adjust_link
handler, which is synchronized by phydev->lock, but it can
also be written from a different context when configuring the
MII in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable
that adjust_link could occur during resume when WoL is enabled,
so take phydev->lock in bcmgenet_mii_config() to be sure.
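Illustratively, the guarded write looks like this (a hedged sketch with
the surrounding code abbreviated):

mutex_lock(&phydev->lock);
reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
/* ... adjust the OOB control bits for the selected MII mode ... */
bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
mutex_unlock(&phydev->lock);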
Fixes: afe3f907d2 ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable@vger.kernel.org
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the base-time for taprio is in the past, start the schedule at a time
of the form "base_time + N*cycle_time", where N is the smallest possible
integer such that the resulting time is in the future.
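A minimal sketch of that computation, assuming a helper of the following
shape (the function and variable names are illustrative, not the
driver's):

#include <linux/math64.h>

static u64 taprio_future_base_time(u64 base_time, u64 cycle_time, u64 now)
{
	u64 n;

	if (base_time > now)
		return base_time;

	/* Smallest N such that base_time + N * cycle_time is in the future. */
	n = div64_u64(now - base_time, cycle_time) + 1;

	return base_time + n * cycle_time;
}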
Signed-off-by: Tanmay Patil <t-patil@ti.com>
Signed-off-by: Chintan Vankar <c-vankar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for per-packet Tx hardware timestamp requests on
AF_XDP zero-copy packets via the XDP Tx metadata framework. Please note
that the user needs to enable the Tx HW timestamp capability via
igc_ioctl() with the SIOCSHWTSTAMP cmd before sending an xsk Tx hardware
timestamp request.
As in the Rx timestamp XDP hints kfunc metadata implementation, Timer 0
(the adjustable clock) is used for the xsk Tx hardware timestamp.
i225/i226 have four sets of timestamping registers. The *skb and
*xsk_tx_buffer pointers are used to indicate whether a timestamping
register is already occupied.
Furthermore, a boolean variable named xsk_pending_ts is used to hold the
transmit completion until the Tx hardware timestamp is ready. This is
because, for i225/i226, the timestamp notification event comes some time
after the transmit completion event. The driver will retrigger the
hardware IRQ to clean the packet after retrieving the Tx hardware
timestamp.
In addition, xsk_meta is added to struct igc_tx_timestamp_request as a
hook to the metadata location of the transmitted packet. When the Tx
timestamp interrupt fires, the interrupt handler copies the value of the
Tx hardware timestamp into the metadata location via
xsk_tx_metadata_complete().
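A rough sketch of that completion-side call (the driver-side field and
ops names are assumptions for illustration; xsk_tx_metadata_complete() is
the framework entry point):

/* In the Tx timestamp interrupt handler, once the hardware timestamp for
 * this request has been retrieved:
 */
if (tstamp->xsk_tx_buffer)
	xsk_tx_metadata_complete(&tstamp->xsk_meta,
				 &igc_xsk_tx_metadata_ops,
				 tstamp->xsk_tx_buffer);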
This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel ADL-S platform. Below are the test steps and results.
Test Step 1: Run xdp_hw_metadata app
./xdp_hw_metadata <iface> > /dev/shm/result.log
Test Step 2: Enable Tx hardware timestamp
hwstamp_ctl -i <iface> -t 1 -r 1
Test Step 3: Run ptp4l and phc2sys for time synchronization
Test Step 4: Generate UDP packets with 1ms interval for 10s
trafgen --dev <iface> '{eth(da=<addr>), udp(dp=9091)}' -t 1ms -n 10000
Test Step 5: Rerun Step 1-3 with 10s iperf3 as background traffic
Test Step 6: Rerun Step 1-4 with 10s iperf3 as background traffic
Based on the iperf3 results below, the impact of holding Tx completions
on throughput is not observable.
Result of last UDP packet (no. 10000) in Step 4:
poll: 1 (0) skip=99 fail=0 redir=10000
xsk_ring_cons__peek: 1
0x5640a37972d0: rx_desc[9999]->addr=f2110 addr=f2110 comp_addr=f2110 EoP
rx_hash: 0x2049BE1D with RSS type:0x1
HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (14.990 usec)
XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User RX-time sec:0.0000 (4.271 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x5640a37972d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6
0x5640a37972d0: complete tx idx=9999 addr=f010
HW TX-complete-time: 1679819246793036971 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (77.656 usec)
XDP RX-time: 1679819246792981987 (sec:1679819246.7930) delta to User TX-complete-time sec:0.0001 (132.640 usec)
HW RX-time: 1679819246792971268 (sec:1679819246.7930) delta to HW TX-complete-time sec:0.0001 (65.703 usec)
0x5640a37972d0: complete rx idx=10127 addr=f2110
Result of iperf3 without tx hwts request in step 5:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender
[ 5] 0.00-10.05 sec 2.74 GBytes 2.34 Gbits/sec receiver
Result of iperf3 running parallel with trafgen command in step 6:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 0 sender
[ 5] 0.00-10.04 sec 2.74 GBytes 2.34 Gbits/sec receiver
Co-developed-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240424210256.3440903-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
net: intel: start The Great Code Dedup + Page Pool for iavf
Alexander Lobakin says:
Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and
libie) and switch iavf to Page Pool. Details are in the commit messages;
here's a summary:
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers. The first name that came to my mind was
"libie" -- "Intel Ethernet common library". Also this sounds like
"lovelie" (-> one word, no "lib I E" pls) and can be expanded as
"lib Internet Explorer" :P
The "generic", pure-software part is placed separately, so that it can be
easily reused in any driver by any vendor without linking to the Intel
pre-200G guts. In a few words, it's something any modern driver does the
same way, but nobody moved it level up (yet).
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks:
"can I share this?". The final destination is very ambitious: have only
one unified driver for at least i40e, ice, iavf, and idpf with a struct
ops for each generation. That's never gonna happen, right? But you still
can at least try.
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so that a driver can't
use much of the lib until it's converted. iavf is only the example, the
rest will eventually be converted soon on a per-driver basis. That is
when it gets really interesting. Stay tech.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
MAINTAINERS: add entry for libeth and libie
iavf: switch to Page Pool
iavf: pack iavf_ring more efficiently
libeth: add Rx buffer management
page_pool: add DMA-sync-for-CPU inline helper
page_pool: constify some read-only function arguments
slab: introduce kvmalloc_array_node() and kvcalloc_node()
iavf: drop page splitting and recycling
iavf: kill "legacy-rx" for good
net: intel: introduce {, Intel} Ethernet common library
====================
Link: https://lore.kernel.org/r/20240424203559.3420468-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fragment lookup should only be performed when
at least one of the fragment flags is set.
This change was deliberately not included in commit
68aba00483 ("net: sparx5: flower: fix fragment flags handling"),
as it is only needed to future-proof the code, since
"mask" is currently only set in conjunction with the
fragment flags.
(The third flag, FLOW_DIS_ENCAPSULATION, is only used with "key".)
Only compile tested.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Tested-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://lore.kernel.org/r/20240424121632.459022-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We try to access count + 1 bytes from userspace with memdup_user(buffer,
count + 1). However, userspace only provides a buffer of count bytes, and
only these count bytes are verified to be okay to access. To ensure the
copied buffer is NUL terminated, use memdup_user_nul() instead.
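For reference, the resulting pattern looks like this (a generic sketch of
the fix, not the exact hunk):

char *cmd_buf;

/* Copies exactly 'count' bytes from userspace and appends a NUL. */
cmd_buf = memdup_user_nul(buffer, count);
if (IS_ERR(cmd_buf))
	return PTR_ERR(cmd_buf);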
Fixes: 3a2eb515d1 ("octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-6-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, we allocate an nbytes-sized kernel buffer and copy nbytes from
userspace to that buffer. Later, we use sscanf on this buffer but we do
not ensure that the string is terminated inside the buffer, which can lead
to an OOB read when using sscanf. Fix this issue by using
memdup_user_nul() instead of memdup_user().
Fixes: 7afc5dbde0 ("bna: Add debugfs interface.")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-2-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, we allocate a count-sized kernel buffer and copy count bytes
from userspace to that buffer. Later, we use sscanf on this buffer but we
do not ensure that the string is terminated inside the buffer, which can
lead to an OOB read when using sscanf. Fix this issue by using
memdup_user_nul() instead of memdup_user().
Fixes: 96a9a9341c ("ice: configure FW logging")
Fixes: 73671c3162 ("ice: enable FW logging")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-1-f1f1b53a10f4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.)
field from the second nibble of the PTP header, as defined in the
PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification,
the first two bytes of the PTP header are defined as the versionType,
which is always 0x0001. This means that any PTPv1 packets that are
tagged for Tx timestamping by the CPTS will have their messageType set
to 0x0, which corresponds to the Sync message type. This causes issues
when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1)
timestamp that never appears.
Fix this by checking whether the ptp_class of the timestamped Tx packet
is PTP_CLASS_V1 and then matching the PTP sequence ID against the stored
sequence ID in the skb->cb data structure. If the sequence IDs match
and the packet is of type PTPv1, there is a chance that the messageType
has been incorrectly stored by the CPTS, so overwrite the messageType
stored by the CPTS with the messageType from the skb->cb data structure.
This allows the PTPv1 stack to receive Tx timestamps for Delay_Req
packets, which are necessary to lock onto a PTP Leader.
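A hedged sketch of that check (the skb->cb field names are illustrative;
PTP_CLASS_V1 and PTP_CLASS_VMASK come from <linux/ptp_classify.h>):

/* On a Tx timestamp event: for PTPv1 the CPTS-captured messageType is
 * unreliable, so if the sequence IDs match, use the messageType saved in
 * skb->cb at transmit time instead.
 */
if ((skb_cb->ptp_class & PTP_CLASS_VMASK) == PTP_CLASS_V1 &&
    event_seqid == skb_cb->seqid)
	event_mtype = skb_cb->mtype;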
Signed-off-by: Jason Reeder <jreeder@ti.com>
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Tested-by: Ed Trexel <ed.trexel@hp.com>
Fixes: f6bd59526c ("net: ethernet: ti: introduce am654 common platform time sync driver")
Link: https://lore.kernel.org/r/20240424071626.32558-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9f74a3dfcf ("ice: Fix VF Reset paths when interface in a failed over
aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf().
The commit placed this lock acquisition just prior to the acquisition of
the VF configuration lock.
If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK
flag, this could deadlock with ice_vc_cfg_qs_msg() because it always
acquires the locks in the order of the VF configuration lock and then the
LAG mutex.
Lockdep reports this violation almost immediately on creating and then
removing 2 VFs:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc6 #54 Tainted: G W O
------------------------------------------------------
kworker/60:3/6771 is trying to acquire lock:
ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice]
but task is already holding lock:
ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&pf->lag_mutex){+.+.}-{3:3}:
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_vc_cfg_qs_msg+0x45/0x690 [ice]
ice_vc_process_vf_msg+0x4f5/0x870 [ice]
__ice_clean_ctrlq+0x2b5/0x600 [ice]
ice_service_task+0x2c9/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
-> #0 (&vf->cfg_lock){+.+.}-{3:3}:
check_prev_add+0xe2/0xc50
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
lock_acquire+0xd4/0x2d0
__mutex_lock+0x9b/0xbf0
ice_reset_vf+0x22f/0x4d0 [ice]
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
kthread+0x104/0x140
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1b/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
lock(&pf->lag_mutex);
lock(&vf->cfg_lock);
*** DEADLOCK ***
4 locks held by kworker/60:3/6771:
#0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0
#2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice]
#3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice]
stack backtrace:
CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54
Hardware name:
Workqueue: ice ice_service_task [ice]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x80
check_noncircular+0x12d/0x150
check_prev_add+0xe2/0xc50
? save_trace+0x59/0x230
? add_chain_cache+0x109/0x450
validate_chain+0x558/0x800
__lock_acquire+0x4f8/0xb40
? lockdep_hardirqs_on+0x7d/0x100
lock_acquire+0xd4/0x2d0
? ice_reset_vf+0x22f/0x4d0 [ice]
? lock_is_held_type+0xc7/0x120
__mutex_lock+0x9b/0xbf0
? ice_reset_vf+0x22f/0x4d0 [ice]
? ice_reset_vf+0x22f/0x4d0 [ice]
? rcu_is_watching+0x11/0x50
? ice_reset_vf+0x22f/0x4d0 [ice]
ice_reset_vf+0x22f/0x4d0 [ice]
? process_one_work+0x176/0x4d0
ice_process_vflr_event+0x98/0xd0 [ice]
ice_service_task+0x1cc/0x480 [ice]
process_one_work+0x1e9/0x4d0
worker_thread+0x1e1/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x104/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
To avoid the deadlock, we must acquire the LAG mutex only after acquiring
the VF configuration lock. Fix ice_reset_vf() to acquire the LAG mutex
only after we either acquire or check that the VF configuration lock is
held.
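A simplified sketch of the corrected ordering (the real function also
handles the trylock/retry details around the VF configuration lock):

/* VF configuration lock first ... */
if (flags & ICE_VF_RESET_LOCK)
	mutex_lock(&vf->cfg_lock);
else
	lockdep_assert_held(&vf->cfg_lock);

/* ... and only then the LAG mutex, matching ice_vc_cfg_qs_msg(). */
mutex_lock(&pf->lag_mutex);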
Fixes: 9f74a3dfcf ("ice: Fix VF Reset paths when interface in a failed over aggregate")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the MFS is set below the default (0x2600), a warning message is
reported like the following:
MFS for port 1 has been set below the default: 600
This message is a bit confusing as the number shown here (600) is in
fact a hexadecimal number: 0x600 = 1536.
Without an explicit "0x" prefix, this message reads as if the MFS were
set to 600 bytes.
MFS, like MTU, is usually expressed in decimal.
This commit reports both the current and default MFS values in decimal
so it is less confusing for end users.
A typical warning message now looks like the following:
MFS for port 1 (1536) has been set below the default (9728)
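Illustratively, the report becomes something of this shape (a hedged
sketch; the exact print helper and variable names in the driver may
differ, and 'current_mfs'/'default_mfs' are placeholder names):

dev_warn(&pf->pdev->dev,
	 "MFS for port %u (%u) has been set below the default (%u)\n",
	 pf->hw.port, current_mfs, default_mfs);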
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixes: 3a2c6ced90 ("i40e: Add a check to see if MFS is set")
Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the hardcoded values used in the calculations for
vnic descriptors and rings with defines. Minor code cleanup.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>