linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-06 01:49:46 +00:00

Author	SHA1	Message	Date
Paolo Abeni	1bba3f219c	mptcp: do not fallback when OoO is present In case of DSS corruption, the MPTCP protocol tries to avoid the subflow reset if fallback is possible. Such corruptions happen in the receive path; to ensure fallback is possible the stack additionally needs to check for OoO data, otherwise the fallback will break the data stream. Fixes: `e32d262c89` ("mptcp: handle consistently DSS corruption") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:07:14 -08:00
Paolo Abeni	17393fa7b7	mptcp: fix premature close in case of fallback I'm observing very frequent self-tests failures in case of fallback when running on a CONFIG_PREEMPT kernel. The root cause is that subflow_sched_work_if_closed() closes any subflow as soon as it is half-closed and has no incoming data pending. That works well for regular subflows - MPTCP needs bi-directional connectivity to operate on a given subflow - but for fallback socket is race prone. When TCP peer closes the connection before the MPTCP one, subflow_sched_work_if_closed() will schedule the MPTCP worker to gracefully close the subflow, and shortly after will do another schedule to inject and process a dummy incoming DATA_FIN. On CONFIG_PREEMPT kernel, the MPTCP worker can kick-in and close the fallback subflow before subflow_sched_work_if_closed() is able to create the dummy DATA_FIN, unexpectedly interrupting the transfer. Address the issue explicitly avoiding closing fallback subflows on when the peer is only half-closed. Note that, when the subflow is able to create the DATA_FIN before the worker invocation, the worker will change the msk state before trying to close the subflow and will skip the latter operation as the msk will not match anymore the precondition in __mptcp_close_subflow(). Fixes: `f09b0ad55a` ("mptcp: close subflow when receiving TCP+FIN") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-3-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:07:14 -08:00
Paolo Abeni	4f102d747c	mptcp: avoid unneeded subflow-level drops The rcv window is shared among all the subflows. Currently, MPTCP sync the TCP-level rcv window with the MPTCP one at tcp_transmit_skb() time. The above means that incoming data may sporadically observe outdated TCP-level rcv window and being wrongly dropped by TCP. Address the issue checking for the edge condition before queuing the data at TCP level, and eventually syncing the rcv window as needed. Note that the issue is actually present from the very first MPTCP implementation, but backports older than the blamed commit below will range from impossible to useless. Before: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 14 0.0 After: $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow TcpExtBeyondWindow 0 0.0 Fixes: `fa3fe2b150` ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-2-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:07:14 -08:00
Paolo Abeni	5e15395f6d	mptcp: fix ack generation for fallback msk mptcp_cleanup_rbuf() needs to know the last most recent, mptcp-level rcv_wnd sent, and such information is tracked into the msk->old_wspace field, updated at ack transmission time by mptcp_write_options(). Fallback socket do not add any mptcp options, such helper is never invoked, and msk->old_wspace value remain stale. That in turn makes ack generation at recvmsg() time quite random. Address the issue ensuring mptcp_write_options() is invoked even for fallback sockets, and just update the needed info in such a case. The issue went unnoticed for a long time, as mptcp currently overshots the fallback socket receive buffer autotune significantly. It is going to change in the near future. Fixes: `e3859603ba` ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-1-806d3781c95f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 20:07:14 -08:00
Wei Fang	e31a11be41	net: phylink: add missing supported link modes for the fixed-link Pause, Asym_Pause and Autoneg bits are not set when pl->supported is initialized, so these link modes will not work for the fixed-link. This leads to a TCP performance degradation issue observed on the i.MX943 platform. The switch CPU port of i.MX943 is connected to an ENETC MAC, this link is a fixed link and the link speed is 2.5Gbps. And one of the switch user ports is the RGMII interface, and its link speed is 1Gbps. If the flow-control of the fixed link is not enabled, we can easily observe the iperf performance of TCP packets is very low. Because the inbound rate on the CPU port is greater than the outbound rate on the user port, the switch is prone to congestion, leading to the loss of some TCP packets and requiring multiple retransmissions. Solving this problem should be as simple as setting the Asym_Pause and Pause bits. The reason why the Autoneg bit needs to be set, Russell has gave a very good explanation in the thread [1], see below. "As the advertising and lp_advertising bitmasks have to be non-empty, and the swphy reports aneg capable, aneg complete, and AN enabled, then for consistency with that state, Autoneg should be set. This is how it was prior to the blamed commit." Fixes: `de7d3f87be` ("net: phylink: Use phy_caps_lookup for fixed-link configuration") Link: https://lore.kernel.org/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk # [1] Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251117102943.1862680-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-19 08:31:11 -08:00
Jakub Kicinski	106a67494c	Merge branch 'af_unix-fix-so_peek_off-bug-in-unix_stream_read_generic' Kuniyuki Iwashima says: ==================== af_unix: Fix SO_PEEK_OFF bug in unix_stream_read_generic(). Miao Wang reported a bug of SO_PEEK_OFF on AF_UNIX SOCK_STREAM socket. Patch 1 fixes the bug and Patch 2 adds a new selftest to cover the case. ==================== Link: https://patch.msgid.link/20251117174740.3684604-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 19:19:29 -08:00
Kuniyuki Iwashima	e1bb28bf13	selftest: af_unix: Add test for SO_PEEK_OFF. The test covers various cases to verify SO_PEEK_OFF behaviour for all AF_UNIX socket types. two_chunks_blocking and two_chunks_overlap_blocking reproduce the issue mentioned in the previous patch. Without the patch, the two tests fail: # RUN so_peek_off.stream.two_chunks_blocking ... # so_peek_off.c:121:two_chunks_blocking:Expected 'bbbb' == 'aaaabbbb'. # two_chunks_blocking: Test terminated by assertion # FAIL so_peek_off.stream.two_chunks_blocking not ok 3 so_peek_off.stream.two_chunks_blocking # RUN so_peek_off.stream.two_chunks_overlap_blocking ... # so_peek_off.c:159:two_chunks_overlap_blocking:Expected 'bbbb' == 'aaaabbbb'. # two_chunks_overlap_blocking: Test terminated by assertion # FAIL so_peek_off.stream.two_chunks_overlap_blocking not ok 5 so_peek_off.stream.two_chunks_overlap_blocking With the patch, all tests pass: # PASSED: 15 / 15 tests passed. # Totals: pass:15 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251117174740.3684604-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 19:19:09 -08:00
Kuniyuki Iwashima	7bf3a476ce	af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic(). Miao Wang reported a bug of SO_PEEK_OFF on AF_UNIX SOCK_STREAM socket. The unexpected behaviour is triggered when the peek offset is larger than the recv queue and the thread is unblocked by new data. Let's assume a socket which has "aaaa" in the recv queue and the peek offset is 4. First, unix_stream_read_generic() reads the offset 4 and skips the skb(s) of "aaaa" with the code below: skip = max(sk_peek_offset(sk, flags), 0); /* @skip is 4. / do { ... while (skip >= unix_skb_len(skb)) { skip -= unix_skb_len(skb); ... skb = skb_peek_next(skb, &sk->sk_receive_queue); if (!skb) goto again; / @skip is 0. / } The thread jumps to the 'again' label and goes to sleep since new data has not arrived yet. Later, new data "bbbb" unblocks the thread, and the thread jumps to the 'redo:' label to restart the entire process from the first skb in the recv queue. do { ... redo: ... last = skb = skb_peek(&sk->sk_receive_queue); ... again: if (skb == NULL) { ... timeo = unix_stream_data_wait(sk, timeo, last, last_len, freezable); ... goto redo; / @skip is 0 !! */ However, the peek offset is not reset in the path. If the buffer size is 8, recv() will return "aaaabbbb" without skipping any data, and the final offset will be 12 (the original offset 4 + peeked skbs' length 8). After sleeping in unix_stream_read_generic(), we have to fetch the peek offset again. Let's move the redo label before mutex_lock(&u->iolock). Fixes: `9f389e3567` ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Reported-by: Miao Wang <shankerwangmiao@gmail.com> Closes: https://lore.kernel.org/netdev/3B969F90-F51F-4B9D-AB1A-994D9A54D460@gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251117174740.3684604-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 19:19:09 -08:00
Pradyumn Rahar	d47515af6c	net/mlx5: Clean up only new IRQ glue on request_irq() failure The mlx5_irq_alloc() function can inadvertently free the entire rmap and end up in a crash[1] when the other threads tries to access this, when request_irq() fails due to exhausted IRQ vectors. This commit modifies the cleanup to remove only the specific IRQ mapping that was just added. This prevents removal of other valid mappings and ensures precise cleanup of the failed IRQ allocation's associated glue object. Note: This error is observed when both fwctl and rds configs are enabled. [1] mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 general protection fault, probably for non-canonical address 0xe277a58fde16f291: 0000 [#1] SMP NOPTI RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Call Trace: <TASK> ? show_trace_log_lvl+0x1d6/0x2f9 ? show_trace_log_lvl+0x1d6/0x2f9 ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] ? __die_body.cold+0x8/0xa ? die_addr+0x39/0x53 ? exc_general_protection+0x1c4/0x3e9 ? dev_vprintk_emit+0x5f/0x90 ? asm_exc_general_protection+0x22/0x27 ? free_irq_cpu_rmap+0x23/0x7d mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] irq_pool_request_vector+0x7d/0x90 [mlx5_core] mlx5_irq_request+0x2e/0xe0 [mlx5_core] mlx5_irq_request_vector+0xad/0xf7 [mlx5_core] comp_irq_request_pci+0x64/0xf0 [mlx5_core] create_comp_eq+0x71/0x385 [mlx5_core] ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core] mlx5_comp_eqn_get+0x72/0x90 [mlx5_core] ? xas_load+0x8/0x91 mlx5_comp_irqn_get+0x40/0x90 [mlx5_core] mlx5e_open_channel+0x7d/0x3c7 [mlx5_core] mlx5e_open_channels+0xad/0x250 [mlx5_core] mlx5e_open_locked+0x3e/0x110 [mlx5_core] mlx5e_open+0x23/0x70 [mlx5_core] __dev_open+0xf1/0x1a5 __dev_change_flags+0x1e1/0x249 dev_change_flags+0x21/0x5c do_setlink+0x28b/0xcc4 ? __nla_parse+0x22/0x3d ? inet6_validate_link_af+0x6b/0x108 ? cpumask_next+0x1f/0x35 ? __snmp6_fill_stats64.constprop.0+0x66/0x107 ? __nla_validate_parse+0x48/0x1e6 __rtnl_newlink+0x5ff/0xa57 ? kmem_cache_alloc_trace+0x164/0x2ce rtnl_newlink+0x44/0x6e rtnetlink_rcv_msg+0x2bb/0x362 ? __netlink_sendskb+0x4c/0x6c ? netlink_unicast+0x28f/0x2ce ? rtnl_calcit.isra.0+0x150/0x146 netlink_rcv_skb+0x5f/0x112 netlink_unicast+0x213/0x2ce netlink_sendmsg+0x24f/0x4d9 __sock_sendmsg+0x65/0x6a ____sys_sendmsg+0x28f/0x2c9 ? import_iovec+0x17/0x2b ___sys_sendmsg+0x97/0xe0 __sys_sendmsg+0x81/0xd8 do_syscall_64+0x35/0x87 entry_SYSCALL_64_after_hwframe+0x6e/0x0 RIP: 0033:0x7fc328603727 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727 RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc </TASK> ---[ end trace f43ce73c3c2b13a2 ]--- RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31 f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff RSP: 0018:ff384881640eaca0 EFLAGS: 00010282 RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400 RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88 R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8 FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) kvm-guest: disable async PF for cpu 0 Fixes: `3354822cde` ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Tested-by: Mohith Kumar Thummaluru<mohith.k.kumar.thummaluru@oracle.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763381768-1234998-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 18:47:09 -08:00
Eric Dumazet	426358d9be	mptcp: fix a race in mptcp_pm_del_add_timer() mptcp_pm_del_add_timer() can call sk_stop_timer_sync(sk, &entry->add_timer) while another might have free entry already, as reported by syzbot. Add RCU protection to fix this issue. Also change confusing add_timer variable with stop_timer boolean. syzbot report: BUG: KASAN: slab-use-after-free in __timer_delete_sync+0x372/0x3f0 kernel/time/timer.c:1616 Read of size 4 at addr ffff8880311e4150 by task kworker/1:1/44 CPU: 1 UID: 0 PID: 44 Comm: kworker/1:1 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 Workqueue: events mptcp_worker Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x240 mm/kasan/report.c:482 kasan_report+0x118/0x150 mm/kasan/report.c:595 __timer_delete_sync+0x372/0x3f0 kernel/time/timer.c:1616 sk_stop_timer_sync+0x1b/0x90 net/core/sock.c:3631 mptcp_pm_del_add_timer+0x283/0x310 net/mptcp/pm.c:362 mptcp_incoming_options+0x1357/0x1f60 net/mptcp/options.c:1174 tcp_data_queue+0xca/0x6450 net/ipv4/tcp_input.c:5361 tcp_rcv_established+0x1335/0x2670 net/ipv4/tcp_input.c:6441 tcp_v4_do_rcv+0x98b/0xbf0 net/ipv4/tcp_ipv4.c:1931 tcp_v4_rcv+0x252a/0x2dc0 net/ipv4/tcp_ipv4.c:2374 ip_protocol_deliver_rcu+0x221/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:239 NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318 NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318 __netif_receive_skb_one_core net/core/dev.c:6079 [inline] __netif_receive_skb+0x143/0x380 net/core/dev.c:6192 process_backlog+0x31e/0x900 net/core/dev.c:6544 __napi_poll+0xb6/0x540 net/core/dev.c:7594 napi_poll net/core/dev.c:7657 [inline] net_rx_action+0x5f7/0xda0 net/core/dev.c:7784 handle_softirqs+0x22f/0x710 kernel/softirq.c:622 __do_softirq kernel/softirq.c:656 [inline] __local_bh_enable_ip+0x1a0/0x2e0 kernel/softirq.c:302 mptcp_pm_send_ack net/mptcp/pm.c:210 [inline] mptcp_pm_addr_send_ack+0x41f/0x500 net/mptcp/pm.c:-1 mptcp_pm_worker+0x174/0x320 net/mptcp/pm.c:1002 mptcp_worker+0xd5/0x1170 net/mptcp/protocol.c:2762 process_one_work kernel/workqueue.c:3263 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3346 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3427 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Allocated by task 44: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 poison_kmalloc_redzone mm/kasan/common.c:400 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:417 kasan_kmalloc include/linux/kasan.h:262 [inline] __kmalloc_cache_noprof+0x1ef/0x6c0 mm/slub.c:5748 kmalloc_noprof include/linux/slab.h:957 [inline] mptcp_pm_alloc_anno_list+0x104/0x460 net/mptcp/pm.c:385 mptcp_pm_create_subflow_or_signal_addr+0xf9d/0x1360 net/mptcp/pm_kernel.c:355 mptcp_pm_nl_fully_established net/mptcp/pm_kernel.c:409 [inline] __mptcp_pm_kernel_worker+0x417/0x1ef0 net/mptcp/pm_kernel.c:1529 mptcp_pm_worker+0x1ee/0x320 net/mptcp/pm.c:1008 mptcp_worker+0xd5/0x1170 net/mptcp/protocol.c:2762 process_one_work kernel/workqueue.c:3263 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3346 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3427 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 Freed by task 6630: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 __kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587 kasan_save_free_info mm/kasan/kasan.h:406 [inline] poison_slab_object mm/kasan/common.c:252 [inline] __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284 kasan_slab_free include/linux/kasan.h:234 [inline] slab_free_hook mm/slub.c:2523 [inline] slab_free mm/slub.c:6611 [inline] kfree+0x197/0x950 mm/slub.c:6818 mptcp_remove_anno_list_by_saddr+0x2d/0x40 net/mptcp/pm.c:158 mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_kernel.c:1209 [inline] mptcp_nl_flush_addrs_list net/mptcp/pm_kernel.c:1240 [inline] mptcp_pm_nl_flush_addrs_doit+0x593/0xbb0 net/mptcp/pm_kernel.c:1281 genl_family_rcv_msg_doit+0x215/0x300 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x60e/0x790 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x846/0xa10 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0x21c/0x270 net/socket.c:742 ____sys_sendmsg+0x508/0x820 net/socket.c:2630 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2684 __sys_sendmsg net/socket.c:2716 [inline] __do_sys_sendmsg net/socket.c:2721 [inline] __se_sys_sendmsg net/socket.c:2719 [inline] __x64_sys_sendmsg+0x1a1/0x260 net/socket.c:2719 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Cc: stable@vger.kernel.org Fixes: `00cfd77b90` ("mptcp: retransmit ADD_ADDR when timeout") Reported-by: syzbot+2a6fbf0f0530375968df@syzkaller.appspotmail.com Closes: https://lore.kernel.org/691ad3c3.a70a0220.f6df1.0004.GAE@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251117100745.1913963-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 18:33:01 -08:00
Jakub Kicinski	c3995fc1a8	Merge tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-11-18 1) Misc fixes for xfrm_state creation/modification/deletion. Patchset from Sabrina Dubroca. 2) Fix inner packet family determination for xfrm offloads. From Jianbo Liu. 3) Don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path. From Jianbo Liu. 4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy security contexts. From Zilin Guan. * tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix memory leak in xfrm_add_acquire() xfrm: Prevent locally generated packets from direct output in tunnel mode xfrm: Determine inner GSO type from packet inner protocol xfrm: Check inner packet family directly from skb_dst xfrm: check all hash buckets for leftover states during netns deletion xfrm: set err and extack on failure to create pcpu SA xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state xfrm: make state as DEAD before final put when migrate fails xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added xfrm: drop SA reference in xfrm_state_update if dir doesn't match ==================== Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 17:58:44 -08:00
Shay Drory	f94c1a114a	devlink: rate: Unset parent pointer in devl_rate_nodes_destroy The function devl_rate_nodes_destroy is documented to "Unset parent for all rate objects". However, it was only calling the driver-specific `rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing the parent's refcount, without actually setting the `devlink_rate->parent` pointer to NULL. This leaves a dangling pointer in the `devlink_rate` struct, which cause refcount error in netdevsim[1] and mlx5[2]. In addition, this is inconsistent with the behavior of `devlink_nl_rate_parent_node_set`, where the parent pointer is correctly cleared. This patch fixes the issue by explicitly setting `devlink_rate->parent` to NULL after notifying the driver, thus fulfilling the function's documented behavior for all rate objects. [1] repro steps: echo 1 > /sys/bus/netdevsim/new_device devlink dev eswitch set netdevsim/netdevsim1 mode switchdev echo 1 > /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs devlink port function rate add netdevsim/netdevsim1/test_node devlink port function rate set netdevsim/netdevsim1/128 parent test_node echo 1 > /sys/bus/netdevsim/del_device dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 __nsim_dev_port_del+0x6c/0x70 [netdevsim] nsim_dev_reload_destroy+0x11c/0x140 [netdevsim] nsim_drv_remove+0x2b/0xb0 [netdevsim] device_release_driver_internal+0x194/0x1f0 bus_remove_device+0xc6/0x130 device_del+0x159/0x3c0 device_unregister+0x1a/0x60 del_device_store+0x111/0x170 [netdevsim] kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x55/0x10f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000 devlink port function rate add pci/0000:08:00.0/group1 devlink port function rate set pci/0000:08:00.0/32768 parent group1 modprobe -r mlx5_ib mlx5_fwctl mlx5_core dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core] mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core] mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core] mlx5_sf_esw_event+0xc4/0x120 [mlx5_core] notifier_call_chain+0x33/0xa0 blocking_notifier_call_chain+0x3b/0x50 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core] mlx5_eswitch_disable+0x63/0x90 [mlx5_core] mlx5_unload+0x1d/0x170 [mlx5_core] mlx5_uninit_one+0xa2/0x130 [mlx5_core] remove_one+0x78/0xd0 [mlx5_core] pci_device_remove+0x39/0xa0 device_release_driver_internal+0x194/0x1f0 unbind_store+0x99/0xa0 kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x53/0x1f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: `d755598450` ("devlink: Allow setting parent node of rate objects") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-18 17:12:21 -08:00
Florian Fuchs	0f08f0b0fb	net: ps3_gelic_net: handle skb allocation failures Handle skb allocation failures in RX path, to avoid NULL pointer dereference and RX stalls under memory pressure. If the refill fails with -ENOMEM, complete napi polling and wake up later to retry via timer. Also explicitly re-enable RX DMA after oom, so the dmac doesn't remain stopped in this situation. Previously, memory pressure could lead to skb allocation failures and subsequent Oops like: Oops: Kernel access of bad area, sig: 11 [#2] Hardware name: SonyPS3 Cell Broadband Engine 0x701000 PS3 NIP [c0003d0000065900] gelic_net_poll+0x6c/0x2d0 [ps3_gelic] (unreliable) LR [c0003d00000659c4] gelic_net_poll+0x130/0x2d0 [ps3_gelic] Call Trace: gelic_net_poll+0x130/0x2d0 [ps3_gelic] (unreliable) __napi_poll+0x44/0x168 net_rx_action+0x178/0x290 Steps to reproduce the issue: 1. Start a continuous network traffic, like scp of a 20GB file 2. Inject failslab errors using the kernel fault injection: echo -1 > /sys/kernel/debug/failslab/times echo 30 > /sys/kernel/debug/failslab/interval echo 100 > /sys/kernel/debug/failslab/probability 3. After some time, traces start to appear, kernel Oopses and the system stops Step 2 is not always necessary, as it is usually already triggered by the transfer of a big enough file. Fixes: `02c1889166` ("ps3: gigabit ethernet driver for PS3, take3") Signed-off-by: Florian Fuchs <fuchsfl@gmail.com> Link: https://patch.msgid.link/20251113181000.3914980-1-fuchsfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-18 12:31:09 +01:00
Pavel Zhigulin	896f1a2493	net: qlogic/qede: fix potential out-of-bounds read in qede_tpa_cont() and qede_tpa_end() The loops in 'qede_tpa_cont()' and 'qede_tpa_end()', iterate over 'cqe->len_list[]' using only a zero-length terminator as the stopping condition. If the terminator was missing or malformed, the loop could run past the end of the fixed-size array. Add an explicit bound check using ARRAY_SIZE() in both loops to prevent a potential out-of-bounds access. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `55482edc25` ("qede: Add slowpath/fastpath support and enable hardware GRO") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Link: https://patch.msgid.link/20251113112757.4166625-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-18 11:09:58 +01:00
Lorenzo Bianconi	8e0a754b08	net: airoha: Do not loopback traffic to GDM2 if it is available on the device Airoha_eth driver forwards offloaded uplink traffic (packets received on GDM1 and forwarded to GDM{3,4}) to GDM2 in order to apply hw QoS. This is correct if the device does not support a dedicated GDM2 port. In this case, in order to enable hw offloading for uplink traffic, the packets should be sent to GDM{3,4} directly. Fixes: `9cd451d414` ("net: airoha: Add loopback support for GDM2") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251113-airoha-hw-offload-gdm2-fix-v1-1-7e4ca300872f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 20:00:07 -08:00
Ido Schimmel	bed22c7b90	selftests: net: lib: Do not overwrite error messages ret_set_ksft_status() calls ksft_status_merge() with the current return status and the last one. It treats a non-zero return code from ksft_status_merge() as an indication that the return status was overwritten by the last one and therefore overwrites the return message with the last one. Currently, ksft_status_merge() returns a non-zero return code even if the current return status and the last one are equal. This results in return messages being overwritten which is counter-productive since we are more interested in the first failure message and not the last one. Fix by changing ksft_status_merge() to only return a non-zero return code if the current return status was actually changed. Add a test case which checks that the first error message is not overwritten. Before: # ./lib_sh_test.sh [...] TEST: RET tfail2 tfail -> fail [FAIL] retmsg=tfail expected tfail2 [...] # echo $? 1 After: # ./lib_sh_test.sh [...] TEST: RET tfail2 tfail -> fail [ OK ] [...] # echo $? 0 Fixes: `596c8819cb` ("selftests: forwarding: Have RET track kselftest framework constants") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251116081029.69112-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 19:32:12 -08:00
Aleksei Nikiforov	da02a18248	s390/ctcm: Fix double-kfree The function 'mpc_rcvd_sweep_req(mpcginfo)' is called conditionally from function 'ctcmpc_unpack_skb'. It frees passed mpcginfo. After that a call to function 'kfree' in function 'ctcmpc_unpack_skb' frees it again. Remove 'kfree' call in function 'mpc_rcvd_sweep_req(mpcginfo)'. Bug detected by the clang static analyzer. Fixes: `0c0b20587b` ("s390/ctcm: fix potential memory leak") Reviewed-by: Aswin Karuvally <aswin@linux.ibm.com> Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251112182724.1109474-1-aswin@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-17 16:58:25 -08:00
Jesper Dangaard Brouer	5442a9da69	veth: more robust handing of race to avoid txq getting stuck Commit `dc82a33297` ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") introduced a race condition that can lead to a permanently stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra Max). The race occurs in veth_xmit(). The producer observes a full ptr_ring and stops the queue (netif_tx_stop_queue()). The subsequent conditional logic, intended to re-wake the queue if the consumer had just emptied it (if (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and traffic halts. This failure is caused by an incorrect use of the __ptr_ring_empty() API from the producer side. As noted in kernel comments, this check is not guaranteed to be correct if a consumer is operating on another CPU. The empty test is based on ptr_ring->consumer_head, making it reliable only for the consumer. Using this check from the producer side is fundamentally racy. This patch fixes the race by adopting the more robust logic from an earlier version V4 of the patchset, which always flushed the peer: (1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier are removed. Instead, after stopping the queue, we unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled, making it solely responsible for re-waking the TXQ. This handles the race where veth_poll() consumes all packets and completes NAPI before veth_xmit() on the producer side has called netif_tx_stop_queue. The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule NAPI. (2) On the consumer side, the logic for waking the peer TXQ is moved out of veth_xdp_rcv() and placed at the end of the veth_poll() function. This placement is part of fixing the race, as the netif_tx_queue_stopped() check must occur after rx_notify_masked is potentially set to false during NAPI completion. This handles the race where veth_poll() consumes all packets, but haven't finished (rx_notify_masked is still true). The producer veth_xmit() stops the TXQ and __veth_xdp_flush(rq) will observe rx_notify_masked is true, meaning not starting NAPI. Then veth_poll() change rx_notify_masked to false and stops NAPI. Before exiting veth_poll() will observe TXQ is stopped and wake it up. Fixes: `dc82a33297` ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/176295323282.307447.14790015927673763094.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:16:53 -08:00
Ilya Maximets	dfe28c4167	net: openvswitch: remove never-working support for setting nsh fields The validation of the set(nsh(...)) action is completely wrong. It runs through the nsh_key_put_from_nlattr() function that is the same function that validates NSH keys for the flow match and the push_nsh() action. However, the set(nsh(...)) has a very different memory layout. Nested attributes in there are doubled in size in case of the masked set(). That makes proper validation impossible. There is also confusion in the code between the 'masked' flag, that says that the nested attributes are doubled in size containing both the value and the mask, and the 'is_mask' that says that the value we're parsing is the mask. This is causing kernel crash on trying to write into mask part of the match with SW_FLOW_KEY_PUT() during validation, while validate_nsh() doesn't allocate any memory for it: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ #107 PREEMPT(voluntary) RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch] Call Trace: <TASK> validate_nsh+0x60/0x90 [openvswitch] validate_set.constprop.0+0x270/0x3c0 [openvswitch] __ovs_nla_copy_actions+0x477/0x860 [openvswitch] ovs_nla_copy_actions+0x8d/0x100 [openvswitch] ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch] genl_family_rcv_msg_doit+0xdb/0x130 genl_family_rcv_msg+0x14b/0x220 genl_rcv_msg+0x47/0xa0 netlink_rcv_skb+0x53/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x280/0x3b0 netlink_sendmsg+0x1f7/0x430 ____sys_sendmsg+0x36b/0x3a0 ___sys_sendmsg+0x87/0xd0 __sys_sendmsg+0x6d/0xd0 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The third issue with this process is that while trying to convert the non-masked set into masked one, validate_set() copies and doubles the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested attributes. It should be copying each nested attribute and doubling them in size independently. And the process must be properly reversed during the conversion back from masked to a non-masked variant during the flow dump. In the end, the only two outcomes of trying to use this action are either validation failure or a kernel crash. And if somehow someone manages to install a flow with such an action, it will most definitely not do what it is supposed to, since all the keys and the masks are mixed up. Fixing all the issues is a complex task as it requires re-writing most of the validation code. Given that and the fact that this functionality never worked since introduction, let's just remove it altogether. It's better to re-introduce it later with a proper implementation instead of trying to fix it in stable releases. Fixes: `b2d0f5d5dc` ("openvswitch: enable NSH support") Reported-by: Junvy Yang <zhuque@tencent.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20251112112246.95064-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:13:24 -08:00
Eric Dumazet	035bca3f01	mptcp: fix race condition in mptcp_schedule_work() syzbot reported use-after-free in mptcp_schedule_work() [1] Issue here is that mptcp_schedule_work() schedules a work, then gets a refcount on sk->sk_refcnt if the work was scheduled. This refcount will be released by mptcp_worker(). [A] if (schedule_work(...)) { [B] sock_hold(sk); return true; } Problem is that mptcp_worker() can run immediately and complete before [B] We need instead : sock_hold(sk); if (schedule_work(...)) return true; sock_put(sk); [1] refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 29 at lib/refcount.c:25 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:25 Call Trace: <TASK> __refcount_add include/linux/refcount.h:-1 [inline] __refcount_inc include/linux/refcount.h:366 [inline] refcount_inc include/linux/refcount.h:383 [inline] sock_hold include/net/sock.h:816 [inline] mptcp_schedule_work+0x164/0x1a0 net/mptcp/protocol.c:943 mptcp_tout_timer+0x21/0xa0 net/mptcp/protocol.c:2316 call_timer_fn+0x17e/0x5f0 kernel/time/timer.c:1747 expire_timers kernel/time/timer.c:1798 [inline] __run_timers kernel/time/timer.c:2372 [inline] __run_timer_base+0x648/0x970 kernel/time/timer.c:2384 run_timer_base kernel/time/timer.c:2393 [inline] run_timer_softirq+0xb7/0x180 kernel/time/timer.c:2403 handle_softirqs+0x22f/0x710 kernel/softirq.c:622 __do_softirq kernel/softirq.c:656 [inline] run_ktimerd+0xcf/0x190 kernel/softirq.c:1138 smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 Cc: stable@vger.kernel.org Fixes: `3b1d6210a9` ("mptcp: implement and use MPTCP-level retransmission") Reported-by: syzbot+355158e7e301548a1424@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6915b46f.050a0220.3565dc.0028.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251113103924.3737425-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:12:35 -08:00
Pavel Zhigulin	b0c959fec1	net: mlxsw: linecards: fix missing error check in mlxsw_linecard_devlink_info_get() The call to devlink_info_version_fixed_put() in mlxsw_linecard_devlink_info_get() did not check for errors, although it is checked everywhere in the code. Add missed 'err' check to the mlxsw_linecard_devlink_info_get() Fixes: `3fc0c51905` ("mlxsw: core_linecards: Expose device PSID over device info") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251113161922.813828-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 18:01:32 -08:00
Pavel Zhigulin	e6751b0b19	net: dsa: hellcreek: fix missing error handling in LED registration The LED setup routine registered both led_sync_good and led_is_gm devices without checking the return values of led_classdev_register(). If either registration failed, the function continued silently, leaving the driver in a partially-initialized state and leaking a registered LED classdev. Add proper error handling Fixes: `7d9ee2e8ff` ("net: dsa: hellcreek: Add PTP status LEDs") Signed-off-by: Pavel Zhigulin <Pavel.Zhigulin@kaspersky.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://patch.msgid.link/20251113135745.92375-1-Pavel.Zhigulin@kaspersky.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-14 17:46:32 -08:00
Zilin Guan	a55ef3bff8	xfrm: fix memory leak in xfrm_add_acquire() The xfrm_add_acquire() function constructs an xfrm policy by calling xfrm_policy_construct(). This allocates the policy structure and potentially associates a security context and a device policy with it. However, at the end of the function, the policy object is freed using only kfree() . This skips the necessary cleanup for the security context and device policy, leading to a memory leak. To fix this, invoke the proper cleanup functions xfrm_dev_policy_delete(), xfrm_dev_policy_free(), and security_xfrm_policy_free() before freeing the policy object. This approach mirrors the error handling path in xfrm_add_policy(), ensuring that all associated resources are correctly released. Fixes: `980ebd2579` ("[IPSEC]: Sync series - acquire insert") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2025-11-14 10:12:36 +01:00
Zilin Guan	407a06507c	mlxsw: spectrum: Fix memory leak in mlxsw_sp_flower_stats() The function mlxsw_sp_flower_stats() calls mlxsw_sp_acl_ruleset_get() to obtain a ruleset reference. If the subsequent call to mlxsw_sp_acl_rule_lookup() fails to find a rule, the function returns an error without releasing the ruleset reference, causing a memory leak. Fix this by using a goto to the existing error handling label, which calls mlxsw_sp_acl_ruleset_put() to properly release the reference. Fixes: `7c1b8eb175` ("mlxsw: spectrum: Add support for TC flower offload statistics") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20251112052114.1591695-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 17:25:33 -08:00
Jiaming Zhang	f796a8dec9	net: core: prevent NULL deref in generic_hwtstamp_ioctl_lower() The ethtool tsconfig Netlink path can trigger a null pointer dereference. A call chain such as: tsconfig_prepare_data() -> dev_get_hwtstamp_phylib() -> vlan_hwtstamp_get() -> generic_hwtstamp_get_lower() -> generic_hwtstamp_ioctl_lower() results in generic_hwtstamp_ioctl_lower() being called with kernel_cfg->ifr as NULL. The generic_hwtstamp_ioctl_lower() function does not expect a NULL ifr and dereferences it, leading to a system crash. Fix this by adding a NULL check for kernel_cfg->ifr in generic_hwtstamp_ioctl_lower(). If ifr is NULL, return -EINVAL. Fixes: `6e9e2eed4f` ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Closes: https://lore.kernel.org/cd6a7056-fa6d-43f8-b78a-f5e811247ba8@linux.dev Signed-off-by: Jiaming Zhang <r772577952@gmail.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20251111173652.749159-2-r772577952@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 17:23:21 -08:00
Baruch Siach	7410c86fc0	MAINTAINERS: Remove eth bridge website Ethernet bridge website URL shows "This page isn’t available". Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/0a32aaf7fa4473e7574f7327480e8fbc4fef2741.1762946223.git.baruch@tkos.co.il Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 16:24:14 -08:00
Linus Torvalds	d0309c0543	Merge tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and Wireless. No known outstanding regressions. Current release - regressions: - eth: - bonding: fix mii_status when slave is down - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state() Previous releases - regressions: - sched: limit try_bulk_dequeue_skb() batches - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe - af_unix: initialise scc_index in unix_add_edge() - netpoll: fix incorrect refcount handling causing incorrect cleanup - bluetooth: don't hold spin lock over sleeping functions - hsr: Fix supervision frame sending on HSRv0 - sctp: prevent possible shift out-of-bounds - tipc: fix use-after-free in tipc_mon_reinit_self(). - dsa: tag_brcm: do not mark link local traffic as offloaded - eth: virtio-net: fix incorrect flags recording in big mode Previous releases - always broken: - sched: initialize struct tc_ife to fix kernel-infoleak - wifi: - mac80211: reject address change while connecting - iwlwifi: avoid toggling links due to wrong element use - bluetooth: cancel mesh send timer when hdev removed - strparser: fix signed/unsigned mismatch bug - handshake: fix memory leak in tls_handshake_accept() Misc: - selftests: mptcp: fix some flaky tests" * tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) hsr: Follow standard for HSRv0 supervision frames hsr: Fix supervision frame sending on HSRv0 virtio-net: fix incorrect flags recording in big mode ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage net_sched: limit try_bulk_dequeue_skb() batches selftests: mptcp: join: properly kill background tasks selftests: mptcp: connect: trunc: read all recv data selftests: mptcp: join: userspace: longer transfer selftests: mptcp: join: endpoints: longer transfer selftests: mptcp: join: rm: set backup flag selftests: mptcp: connect: fix fallback note due to OoO ethtool: fix incorrect kernel-doc style comment in ethtool.h mlx5: Fix default values in create CQ Bluetooth: btrtl: Avoid loading the config file on security chips net/mlx5e: Fix potentially misleading debug message net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps net/mlx5e: Fix maxrate wraparound in threshold between units ...	2025-11-13 11:20:25 -08:00
Paolo Abeni	94909c53e4	Merge branch 'hsr-send-correct-hsrv0-supervision-frames' Felix Maurer says: ==================== hsr: Send correct HSRv0 supervision frames Hangbin recently reported that the hsr selftests were failing and noted that the entries in the node table were not merged, i.e., had 00:00:00:00:00:00 as MacAddressB forever [1]. This failure only occured with HSRv0 because it was not sending supervision frames anymore. While debugging this I found that we were not really following the HSRv0 standard for the supervision frames we sent, so I additionally made a few changes to get closer to the standard and restore a more correct behavior we had a while ago. The selftests can still fail because they take a while and run into the timeout. I did not include a change of the timeout because I have more improvements to the selftests mostly ready that change the test duration but are net-next material. [1]: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/ ==================== Link: https://patch.msgid.link/cover.1762876095.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 15:55:06 +01:00
Felix Maurer	b2c26c82f7	hsr: Follow standard for HSRv0 supervision frames For HSRv0, the path_id has the following meaning: - 0000: PRP supervision frame - 0001-1001: HSR ring identifier - 1010-1011: Frames from PRP network (A/B, with RedBoxes) - 1111: HSR supervision frame Follow the IEC 62439-3:2010 standard more closely by setting the right path_id for HSRv0 supervision frames (actually, it is correctly set when the frame is constructed, but hsr_set_path_id() overwrites it) and set a fixed HSR ring identifier of 1. The ring identifier seems to be generally unused and we ignore it anyways on reception, but some fixed identifier is definitely better than using one identifier in one direction and a wrong identifier in the other. This was also the behavior before commit `f266a683a4` ("net/hsr: Better frame dispatch") which introduced the alternating path_id. This was later moved to hsr_set_path_id() in commit `451d8123f8` ("net: prp: add packet handling support"). The IEC 62439-3:2010 also contains 6 unused bytes after the MacAddressA in the HSRv0 supervision frames. Adjust a TODO comment accordingly. Fixes: `f266a683a4` ("net/hsr: Better frame dispatch") Fixes: `451d8123f8` ("net: prp: add packet handling support") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/ea0d5133cd593856b2fa673d6e2067bf1d4d1794.1762876095.git.fmaurer@redhat.com Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 15:55:04 +01:00
Felix Maurer	96a3a03abf	hsr: Fix supervision frame sending on HSRv0 On HSRv0, no supervision frames were sent. The supervison frames were generated successfully, but failed the check for a sufficiently long mac header, i.e., at least sizeof(struct hsr_ethhdr), in hsr_fill_frame_info() because the mac header only contained the ethernet header. Fix this by including the HSR header in the mac header when generating HSR supervision frames. Note that the mac header now also includes the TLV fields. This matches how we set the headers on rx and also the size of struct hsrv0_ethhdr_sp. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Closes: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/ Fixes: `9cfb5e7f0d` ("net: hsr: fix hsr_init_sk() vs network/transport headers.") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/4354114fea9a642fe71f49aeeb6c6159d1d61840.1762876095.git.fmaurer@redhat.com Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 15:55:04 +01:00
Linus Torvalds	2ccec59446	Merge tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Add Chunhai Guo as a EROFS reviewer to get more eyes from interested industry vendors - Fix infinite loop caused by incomplete crafted zstd-compressed data (thanks to Robert again!) * tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: avoid infinite loop due to incomplete zstd-compressed data MAINTAINERS: erofs: add myself as reviewer	2025-11-13 05:02:59 -08:00
Linus Torvalds	967a72fa7f	Merge tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - Fix smbdirect (RDMA) disconnect hang bug - Fix potential Denial of Service when connection limit exceeded - Fix smbdirect (RDMA) connection (potentially accessing freed memory) bug * tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd: smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED ksmbd: close accepted socket when per-IP limit rejects connection smb: server: rdma: avoid unmapping posted recv on accept failure	2025-11-13 04:57:38 -08:00
Xuan Zhuo	0eff2eaa53	virtio-net: fix incorrect flags recording in big mode The purpose of commit `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") is to record the flags in advance, as their value may be overwritten in the XDP case. However, the flags recorded under big mode are incorrect, because in big mode, the passed buf does not point to the rx buffer, but rather to the page of the submitted buffer. This commit fixes this issue. For the small mode, the commit `c11a49d58a` ("virtio_net: Fix mismatched buf address when unmapping for small packets") fixed it. Tested-by: Alyssa Ross <hi@alyssa.is> Fixes: `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 13:16:30 +01:00
Linus Torvalds	6fa9041b71	Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Address recently reported issues or issues found at the recent NFS bake-a-thon held in Raleigh, NC. Issues reported with v6.18-rc: - Address a kernel build issue - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED Issues that need expedient stable backports: - Close a refcount leak exposure - Report support for NFSv4.2 CLONE correctly - Fix oops during COPY_NOTIFY processing - Prevent rare crash after XDR encoding failure - Prevent crash due to confused or malicious NFSv4.1 client" * tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it" nfsd: ensure SEQUENCE replay sends a valid reply. NFSD: Never cache a COMPOUND when the SEQUENCE operation fails NFSD: Skip close replay processing if XDR encoding fails NFSD: free copynotify stateid in nfs4_free_ol_stateid() nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes nfsd: fix refcount leak in nfsd_set_fh_dentry()	2025-11-12 18:41:01 -08:00
Linus Torvalds	92385a075a	Merge tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - two minor fixes for DMA API infrastructure: restoring proper structure padding used in benchmark tests (Qinxin Xia) and global DMA_BIT_MASK macro rework to make it a bit more clang friendly (James Clark) * tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope dma-mapping: benchmark: Restore padding to ensure uABI remained consistent	2025-11-12 18:31:22 -08:00
Linus Torvalds	e927c520e1	Merge tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: - Fix a Rust build error - Fix exception/interrupt, memory management, perf event, hardware breakpoint, kexec and KVM bugs * tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix max supported vCPUs set with EIOINTC LoongArch: KVM: Skip PMU checking on vCPU context switch LoongArch: KVM: Restore guest PMU if it is enabled LoongArch: KVM: Add delay until timer interrupt injected LoongArch: KVM: Set page with write attribute if dirty track disabled LoongArch: kexec: Print out debugging message if required LoongArch: kexec: Initialize the kexec_buf structure LoongArch: Use correct accessor to read FWPC/MWPC LoongArch: Refine the init_hw_perf_events() function LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one() LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY LoongArch: Consolidate max_pfn & max_low_pfn calculation LoongArch: Consolidate early_ioremap()/ioremap_prot() LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY LoongArch: Clarify 3 MSG interrupt features rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags	2025-11-12 18:21:30 -08:00
Linus Torvalds	89ee862a4d	Merge tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha fix from Matt Turner: "Add Magnus as a maintainer of the alpha port" * tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port	2025-11-12 18:18:12 -08:00
Jakub Kicinski	fe82c4f8a2	Merge tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple more fixes: - mwl8k: work around FW expecting a DSSS element in beacons - ath11k: report correct TX status - iwlwifi: avoid toggling links due to wrong element use - iwlwifi: fix beacon template rate on older devices - iwlwifi: fix loop iterator being used after loop - mac80211: disallow address changes while using the address - mac80211: avoid bad rate warning in monitor/sniffer mode - hwsim: fix potential NULL deref (on monitor injection) * tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing wifi: mac80211_hwsim: Fix possible NULL dereference wifi: mac80211: skip rate verification for not captured PSDUs wifi: mac80211: reject address change while connecting wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp() ==================== Link: https://patch.msgid.link/20251112114621.15716-5-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-12 09:33:09 -08:00
Chuang Wang	ac1499fcd4	ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe The sit driver's packet transmission path calls: sit_tunnel_xmit() -> update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called to delete entries exceeding FNHE_RECLAIM_DEPTH+random. The race window is between fnhe_remove_oldest() selecting fnheX for deletion and the subsequent kfree_rcu(). During this time, the concurrent path's __mkroute_output() -> find_exception() can fetch the soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a new dst using a dst_hold(). When the original fnheX is freed via RCU, the dst reference remains permanently leaked. CPU 0 CPU 1 __mkroute_output() find_exception() [fnheX] update_or_create_fnhe() fnhe_remove_oldest() [fnheX] rt_bind_exception() [bind dst] RCU callback [fnheX freed, dst leak] This issue manifests as a device reference count leak and a warning in dmesg when unregistering the net device: unregister_netdevice: waiting for sitX to become free. Usage count = N Ido Schimmel provided the simple test validation method [1]. The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes(). Since rt_bind_exception() checks this field, setting it to zero prevents the stale fnhe from being reused and bound to a new dst just before it is freed. [1] ip netns add ns1 ip -n ns1 link set dev lo up ip -n ns1 address add 192.0.2.1/32 dev lo ip -n ns1 link add name dummy1 up type dummy ip -n ns1 route add 192.0.2.2/32 dev dummy1 ip -n ns1 link add name gretap1 up arp off type gretap \ local 192.0.2.1 remote 192.0.2.2 ip -n ns1 route add 198.51.0.0/16 dev gretap1 taskset -c 0 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & taskset -c 2 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & sleep 10 ip netns pids ns1 \| xargs kill ip netns del ns1 Cc: stable@vger.kernel.org Fixes: `67d6d681e1` ("ipv4: make exception cache less predictible") Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-12 06:46:36 -08:00
Johannes Berg	a35f64a216	Merge tag 'iwlwifi-fixes-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi fixes: - avoid link toggling - fix beacon template rate - don't use iterator outside the loop ==================== Link: https://patch.msgid.link/DM3PPF63A6024A9E52FF4A7B23F283B7FC7A3CCA@DM3PPF63A6024A9.namprd11.prod.outlook.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-11-12 09:51:05 +01:00
Miri Korenblit	1a222625b4	wifi: iwlwifi: mld: always take beacon ies in link grading One of the factors of a link's grade is the channel load, which is calculated from the AP's bss load element. The current code takes this element from the beacon for an active link, and from bss->ies for an inactive link. bss->ies is set to either the beacon's ies or to the probe response ones, with preference to the probe response (meaning that if there was even one probe response, the ies of it will be stored in bss->ies and won't be overiden by the beacon ies). The probe response can be very old, i.e. from the connection time, where a beacon is updated before each link selection (which is triggered only after a passive scan). In such case, the bss load element in the probe response will not include the channel load caused by the STA, where the beacon will. This will cause the inactive link to always have a lower channel load, and therefore an higher grade than the active link's one. This causes repeated link switches, causing the throughput to drop. Fix this by always taking the ies from the beacon, as those are for sure new. Fixes: `d1e879ec60` ("wifi: iwlwifi: add iwlmld sub-driver") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110145652.b493dbb1853a.I058ba7309c84159f640cc9682d1bda56dd56a536@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Johannes Berg	3592c0083f	wifi: iwlwifi: mvm: fix beacon template/fixed rate During the development of the rate changes, I evidently made some changes that shouldn't have been there; beacon templates with rate_n_flags are only in old versions, so no changes to them should have been necessary, and evidently broke on some devices. This also would have broken fixed (injection) rates, it would seem. Restore the old handling of this. Fixes: `dabc88cb3b` ("wifi: iwlwifi: handle v3 rates") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220558 Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20251008112044.3bb8ea849d8d.I90f4d2b2c1f62eaedaf304a61d2ab9e50c491c2d@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Junjie Cao	f4c737d449	wifi: iwlwifi: fix aux ROC time event iterator usage The list_for_each_entry() iterator must not be used outside the loop. Even though we break and check for NULL, doing so still violates kernel iteration rules and triggers Coccinelle's use_after_iter.cocci warning. Cache the matched entry in aux_roc_te and use it consistently after the loop. This follows iterator best practices, resolves the warning, and makes the code more maintainable. Signed-off-by: Junjie Cao <junjie.cao@intel.com> Link: https://patch.msgid.link/20251016014919.383565-1-junjie.cao@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>	2025-11-12 09:54:46 +02:00
Eric Dumazet	0345552a65	net_sched: limit try_bulk_dequeue_skb() batches After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption") I started seeing many qdisc requeues on IDPF under high TX workload. $ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1: qdisc mq 1: root Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114) backlog 1056Kb 6675p requeues 3532840114 qdisc mq 1: root Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653) backlog 781164b 4822p requeues 3537737653 This is caused by try_bulk_dequeue_skb() being only limited by BQL budget. perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script ... netperf 75332 [146] 2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200 netperf 75332 [146] 2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500 netperf 75330 [144] 2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100 netperf 75333 [147] 2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00 netperf 75337 [151] 2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300 netperf 75337 [151] 2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00 netperf 75330 [144] 2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000 ... This is bad because : 1) Large batches hold one victim cpu for a very long time. 2) Driver often hit their own TX ring limit (all slots are used). 3) We call dev_requeue_skb() 4) Requeues are using a FIFO (q->gso_skb), breaking qdisc ability to implement FQ or priority scheduling. 5) dequeue_skb() gets packets from q->gso_skb one skb at a time with no xmit_more support. This is causing many spinlock games between the qdisc and the device driver. Requeues were supposed to be very rare, lets keep them this way. Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as __qdisc_run() was designed to use. Fixes: `5772e9a346` ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20251109161215.2574081-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:56:50 -08:00
Magnus Lindholm	d58041d2c6	MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>	2025-11-11 20:52:04 -05:00
Jakub Kicinski	7a6fa4f89e	Merge branch 'selftests-mptcp-join-fix-some-flaky-tests' Matthieu Baerts says: ==================== selftests: mptcp: join: fix some flaky tests When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP Join tests are marked as unstable. Here are some fixes for that. - Patch 1: a small fix for mptcp_connect.sh, printing a note as initially intended. For >=v5.13. - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13. - Patches 3-4: longer transfer when not waiting for the end. For >=5.18. - Patch 5: read all received data when expecting a reset. For >= v6.1. - Patch 6: a fix to properly kill background tasks. For >= v6.5. ==================== Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-0-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:49:53 -08:00
Matthieu Baerts (NGI0)	852b644acb	selftests: mptcp: join: properly kill background tasks The 'run_tests' function is executed in the background, but killing its associated PID would not kill the children tasks running in the background. To properly kill all background tasks, 'kill -- -PID' could be used, but this requires kill from procps-ng. Instead, all children tasks are listed using 'ps', and 'kill' is called with all PIDs of this group. Fixes: `31ee4ad86a` ("selftests: mptcp: join: stop transfer when check is done (part 1)") Cc: stable@vger.kernel.org Fixes: `04b57c9e09` ("selftests: mptcp: join: stop transfer when check is done (part 2)") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)	ee79980f7a	selftests: mptcp: connect: trunc: read all recv data MPTCP Join "fastclose server" selftest is sometimes failing because the client output file doesn't have the expected size, e.g. 296B instead of 1024B. When looking at a packet trace when this happens, the server sent the expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE. It is then strange to see the client only receiving 296B, which would mean it only got a part of the second packet. The problem is then not on the networking side, but rather on the data reception side. When mptcp_connect is launched with '-f -1', it means the connection might stop before having sent everything, because a reset has been received. When this happens, the program was directly stopped. But it is also possible there are still some data to read, simply because the previous 'read' step was done with a buffer smaller than the pending data, see do_rnd_read(). In this case, it is important to read what's left in the kernel buffers before stopping without error like before. SIGPIPE is now ignored, not to quit the app before having read everything. Fixes: `6bf41020b7` ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)	290493078b	selftests: mptcp: join: userspace: longer transfer In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from avoid it to end before the end of the verifications To play it safe, all userspace tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: `4369c198e5` ("selftests: mptcp: test userspace pm out of transfer") Cc: stable@vger.kernel.org Fixes: `b2e2248f36` ("selftests: mptcp: userspace pm create id 0 subflow") Fixes: `e3b47e460b` ("selftests: mptcp: userspace pm remove initial subflow") Fixes: `b9fb176081` ("selftests: mptcp: userspace pm send RM_ADDR for ID 0") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:49:49 -08:00
Matthieu Baerts (NGI0)	6457595db9	selftests: mptcp: join: endpoints: longer transfer In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from avoid it to end before the end of the verifications To play it safe, all endpoints tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: `69c6ce7b6e` ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Fixes: `e274f71540` ("selftests: mptcp: add subflow limits test-cases") Fixes: `b5e2fb832f` ("selftests: mptcp: add explicit test case for remove/readd") Fixes: `e06959e9ee` ("selftests: mptcp: join: test for flush/re-add endpoints") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-11 17:49:48 -08:00

1 2 3 4 5 ...

1398107 Commits