commit 826035498e upstream.
This new helper function validates that unknown family and chain type
coming from userspace do not trigger an out-of-bound array access. Bail
out in case __nft_chain_type_get() returns NULL from
nft_chain_parse_hook().
Fixes: 9370761c56 ("netfilter: nf_tables: convert built-in tables/chains to chain types")
Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b067fa009c ]
If this flag is set, timeout and state are irrelevant to userspace.
Fixes: 90964016e5 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 270a8a297f ]
This is fixing flow offload for UDP traffic where packets only follow
one single direction.
The flow_offload_fixup_tcp() mechanism works fine in case that the
offloaded entry remains in SYN_RECV state, given sequence tracking is
reset and that conntrack handles syn+ack packets as a retransmission, ie.
sES + synack => sIG
for reply traffic.
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a01cbae57e ]
Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.
Fixes: 6c03ae210c ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a3fb3698ca ]
When interface is down, offload cleanup function(nf_flow_table_do_cleanup)
is called and that checks whether interface index of offload and
index of link down interface is same. but only interface index checking
is not enough because flowtable is not pernet list.
So that, if other netns's interface that has index is same with offload
is down, that offload will be removed.
This patch adds netns checking code to the offload cleanup routine.
Fixes: 59c466dd68 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a3e71b7b7 ]
The nft_osf extension, like xt_osf, is not supported from the output
path.
Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 335178d542 upstream.
syzbot reported following crash:
list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
[..]
Call Trace:
__list_del_entry include/linux/list.h:131 [inline]
list_del_rcu include/linux/rculist.h:148 [inline]
nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
[..]
The commit transaction list has:
NFT_MSG_NEWTABLE
NFT_MSG_NEWFLOWTABLE
NFT_MSG_DELFLOWTABLE
NFT_MSG_DELTABLE
A missing generation check during DELTABLE processing causes it to queue
the DELFLOWTABLE operation a second time, so we corrupt the list here:
case NFT_MSG_DELFLOWTABLE:
list_del_rcu(&nft_trans_flowtable(trans)->list);
nf_tables_flowtable_notify(&trans->ctx,
because we have two different DELFLOWTABLE transactions for the same
flowtable. We then call list_del_rcu() twice for the same flowtable->list.
The object handling seems to suffer from the same bug so add a generation
check too and only queue delete transactions for flowtables/objects that
are still active in the next generation.
Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22dad713b8 upstream.
The set uadt functions assume lineno is never NULL, but it is in
case of ip_set_utest().
syzkaller managed to generate a netlink message that calls this with
LINENO attr present:
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104
Call Trace:
ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867
nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563
pass a dummy lineno storage, its easier than patching all set
implementations.
This seems to be a day-0 bug.
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d9a7acd3d upstream.
The timeout pointer can be NULL which means we should modify the
per-nets timeout instead.
All do this, except sctp and dccp which instead give:
general protection fault: 0000 [#1] PREEMPT SMP KASAN
net/netfilter/nf_conntrack_proto_dccp.c:682
ctnl_timeout_parse_policy+0x150/0x1d0 net/netfilter/nfnetlink_cttimeout.c:67
cttimeout_default_set+0x150/0x1c0 net/netfilter/nfnetlink_cttimeout.c:368
nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
Reported-by: syzbot+46a4ad33f345d1dd346e@syzkaller.appspotmail.com
Fixes: c779e84960 ("netfilter: conntrack: remove get_timeout() indirection")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d2c96af79 ]
Userspace might bogusly sent NFT_DATA_VERDICT in several netlink
attributes that assume NFT_DATA_VALUE. Moreover, make sure that error
path invokes nft_data_release() to decrement the reference count on the
chain object.
Fixes: 96518518cc ("netfilter: add nftables")
Fixes: 0f3cd9b369 ("netfilter: nf_tables: add range expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bffc124b6f ]
Only NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_FLAGS make sense for elements
whose NFT_SET_ELEM_INTERVAL_END flag is set on.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db3b665dd7 ]
The existing rbtree implementation might store consecutive elements
where the closing element and the opening element might overlap, eg.
[ a, a+1) [ a+1, a+2)
This patch removes the optimization for non-anonymous sets in the exact
matching case, where it is assumed to stop searching in case that the
closing element is found. Instead, invalidate candidate interval and
keep looking further in the tree.
The lookup/get operation might return false, while there is an element
in the rbtree. Moreover, the get operation returns true as if a+2 would
be in the tree. This happens with named sets after several set updates.
The existing lookup optimization (that only works for the anonymous
sets) might not reach the opening [ a+1,... element if the closing
...,a+1) is found in first place when walking over the rbtree. Hence,
walking the full tree in that case is needed.
This patch fixes the lookup and get operations.
Fixes: e701001e7c ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Fixes: ba0e4d9917 ("netfilter: nf_tables: get set elements via netlink")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18a110b022 ]
Curtis Taylor and Jon Maxwell reported and debugged a crash on 3.10
based kernel.
Crash occurs in ctnetlink_conntrack_events because net->nfnl socket is
NULL. The nfnl socket was set to NULL by netns destruction running on
another cpu.
The exiting network namespace calls the relevant destructors in the
following order:
1. ctnetlink_net_exit_batch
This nulls out the event callback pointer in struct netns.
2. nfnetlink_net_exit_batch
This nulls net->nfnl socket and frees it.
3. nf_conntrack_cleanup_net_list
This removes all remaining conntrack entries.
This is order is correct. The only explanation for the crash so ar is:
cpu1: conntrack is dying, eviction occurs:
-> nf_ct_delete()
-> nf_conntrack_event_report \
-> nf_conntrack_eventmask_report
-> notify->fcn() (== ctnetlink_conntrack_events).
cpu1: a. fetches rcu protected pointer to obtain ctnetlink event callback.
b. gets interrupted.
cpu2: runs netns exit handlers:
a runs ctnetlink destructor, event cb pointer set to NULL.
b runs nfnetlink destructor, nfnl socket is closed and set to NULL.
cpu1: c. resumes and trips over NULL net->nfnl.
Problem appears to be that ctnetlink_net_exit_batch only prevents future
callers of nf_conntrack_eventmask_report() from obtaining the callback.
It doesn't wait of other cpus that might have already obtained the
callbacks address.
I don't see anything in upstream kernels that would prevent similar
crash: We need to wait for all cpus to have exited the event callback.
Fixes: 9592a5c01e ("netfilter: ctnetlink: netns support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8cb4ec44de ]
On Big Endian architectures, u16 port value was extracted from the wrong
parts of u32 sreg_port, just like commit 10596608c4 ("netfilter:
nf_tables: fix mismatch in big-endian system") describes.
Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bd085ef678 ]
The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.
But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.
So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.
On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.
To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b9173f468 ]
Bridge packets that are forwarded have skb->dst == NULL and get
dropped by the check introduced by
b60a77386b (net: make skb_dst_force
return true when dst is refcounted).
To fix this we check skb_dst() before skb_dst_force(), so we don't
drop skb packet with dst == NULL. This holds also for skb at the
PRE_ROUTING hook so we remove the second check.
Fixes: b60a77386b ("net: make skb_dst_force return true when dst is refcounted")
Signed-off-by: Marco Oliverio <marco.oliverio@tanaza.com>
Signed-off-by: Rocco Folino <rocco.folino@tanaza.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 447750f281 ]
Its possible to set both HANDLE and POSITION when replacing a rule.
In this case, the rule at POSITION gets replaced using the
userspace-provided handle. Rule handles are supposed to be generated
by the kernel only.
Duplicate handles should be harmless, however better disable this "feature"
by only checking for the POSITION attribute on insert operations.
Fixes: 5e94846686 ("netfilter: nf_tables: add insert operation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit eb8950861c ]
If nla_nest_start() may fail. The fix checks its return value and goes
to nla_put_failure if it fails.
Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8294059931 ]
Each media stream negotiation between 2 SIP peers will trigger creation
of 4 different expectations (2 RTP and 2 RTCP):
- INVITE will create expectations for the media packets sent by the
called peer
- reply to the INVITE will create expectations for media packets sent
by the caller
The dport used by these expectations usually match the ones selected
by the SIP peers, but they might get translated due to conflicts with
another expectation. When such event occur, it is important to do
this translation in both directions, dport translation on the receiving
path and sport translation on the sending path.
This commit fixes the sport translation when the peer requiring it is
also the one that starts the media stream. In this scenario, first media
stream packet is forwarded from LAN to WAN and will rely on
nf_nat_sip_expected() to do the necessary sport translation. However, the
expectation matched by this packet does not contain the necessary information
for doing SNAT, this data being stored in the paired expectation created by
the sender's SIP message (INVITE or reply to it).
Signed-off-by: Alin Nastac <alin.nastac@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d701d81172 ]
Zero pad private area, otherwise we expose private kernel pointer to
userspace. This patch also zeroes the tail area after the ->matchsize
and ->targetsize that results from XT_ALIGN().
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa5950e498 ]
None of these spots really needs to crash the kernel.
In one two cases we can jsut report error to userspace, in the other
cases we can just use WARN_ON (and leak memory instead).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 97664bc2c7 ]
Same as commit 1b4a75108d ("netfilter: ipset: Copy the right MAC
address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
went wrong in commit 8cc4ccf583 ("netfilter: ipset: Allow matching on
destination MAC address for mac and ipmac sets").
When I fixed this for IPv4 in 1b4a75108d, I didn't realise that
hash:ip,mac sets also support IPv6 as family, and this is covered by a
separate function, hash_ipmac6_kadt().
In hash:ip,mac sets, the first dimension is the IP address, and the
second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
in flags while deciding which MAC address to copy, destination or
source.
This way, mixing source and destination matches for the two dimensions
of ip,mac hash type works as expected, also for IPv6. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 2001:db8::1/64 dev veth1
ip -net A addr add 2001:db8::2/64 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_hash hash:ip,mac family inet6
ip netns exec A ipset add test_hash 2001:db8::1,${dst}
ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset now correctly matches a test packet:
# ping -c1 2001:db8::2 >/dev/null
# echo $?
0
Reported-by: Chen, Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c24b75e0f9 ]
syzbot reported the following issue :
BUG: KCSAN: data-race in update_defense_level / update_defense_level
read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events defense_work_handler
Indeed, old_secure_tcp is currently a static variable, while it
needs to be a per netns variable.
Fixes: a0840e2e16 ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 30b7244d79 upstream.
The copy_to_user() function returns the number of bytes remaining to be
copied. In this code, that positive return is checked at the end of the
function and we return zero/success. What we should do instead is
return -EFAULT.
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 29edbc3ebd ]
Set types bitmap:ipmac and hash:ipmac check that MAC addresses
are not all zeroes.
Introduce one missing check, and make the remaining ones
consistent, using is_zero_ether_addr() instead of comparing
against an array containing zeroes.
This was already done for hash:mac sets in commit 26c97c5d8d
("netfilter: ipset: Use is_zero_ether_addr instead of static and
memcmp").
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acab713177 ]
This un-breaks lookups in sets that have the 'dynamic' flag set.
Given this active example configuration:
table filter {
set set1 {
type ipv4_addr
size 64
flags dynamic,timeout
timeout 1m
}
chain input {
type filter hook input priority 0; policy accept;
}
}
... this works:
nft add rule ip filter input add @set1 { ip saddr }
-> whenever rule is triggered, the source ip address is inserted
into the set (if it did not exist).
This won't work:
nft add rule ip filter input ip saddr @set1 counter
Error: Could not process rule: Operation not supported
In other words, we can add entries to the set, but then can't make
matching decision based on that set.
That is just wrong -- all set backends support lookups (else they would
not be very useful).
The failure comes from an explicit rejection in nft_lookup.c.
Looking at the history, it seems like NFT_SET_EVAL used to mean
'set contains expressions' (aka. "is a meter"), for instance something like
nft add rule ip filter input meter example { ip saddr limit rate 10/second }
or
nft add rule ip filter input meter example { ip saddr counter }
The actual meaning of NFT_SET_EVAL however, is
'set can be updated from the packet path'.
'meters' and packet-path insertions into sets, such as
'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
and thus require a set backend that provides the ->update() function.
The only set that provides this also is the only one that has the
NFT_SET_EVAL feature flag.
Removing the wrong check makes the above example work.
While at it, also fix the flag check during set instantiation to
allow supported combinations only.
Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3a069024d3 ]
The find_pattern() debug output was printing the 'skip' character.
This can be a NULL-byte and messes up further pr_debug() output.
Output without the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to `<7>nf_conntrack_ftp: find_pattern `PORT': dlen = 8
kernel: nf_conntrack_ftp: find_pattern `EPRT': dlen = 8
Output with the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to 0x0 delimiter!
kernel: nf_conntrack_ftp: Match succeeded!
kernel: nf_conntrack_ftp: conntrack_ftp: match `172,17,0,100,200,207' (20 bytes at 4150681645)
kernel: nf_conntrack_ftp: find_pattern `PORT': dlen = 8
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3cf2f450ff ]
Simplify the check in physdev_mt_check() to emit an error message
only when passed an invalid chain (ie, NF_INET_LOCAL_OUT).
This avoids cluttering up the log with errors against valid rules.
For large/heavily modified rulesets, current behavior can quickly
overwhelm the ring buffer, because this function gets called on
every change, regardless of the rule that was changed.
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 89a26cd4b5 ]
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel: sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info)) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 110e48725d upstream.
Set up the default timeout for this new entry otherwise the garbage
collector might quickly remove it right after the flowtable insertion.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dfe42be15f ]
TCP rst and fin packets do not qualify to place a flow into the
flowtable. Most likely there will be no more packets after connection
closure. Without this patch, this flow entry expires and connection
tracking picks up the entry in ESTABLISHED state using the fixup
timeout, which makes this look inconsistent to the user for a connection
that is actually already closed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6a0a8d10a3 ]
If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.
[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[ 24.657858] Mem abort info:
[ 24.660686] ESR = 0x96000004
[ 24.663769] Exception class = DABT (current EL), IL = 32 bits
[ 24.669725] SET = 0, FnV = 0
[ 24.672804] EA = 0, S1PTW = 0
[ 24.675975] Data abort info:
[ 24.678880] ISV = 0, ISS = 0x00000004
[ 24.682743] CM = 0, WnR = 0
[ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[ 24.692207] [0000000000040434] pgd=0000000000000000
[ 24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 24.889414] Call trace:
[ 24.891870] __nf_tables_abort+0x3f0/0x7a0
[ 24.895984] nf_tables_abort+0x20/0x40
[ 24.899750] nfnetlink_rcv_batch+0x17c/0x588
[ 24.904037] nfnetlink_rcv+0x13c/0x190
[ 24.907803] netlink_unicast+0x18c/0x208
[ 24.911742] netlink_sendmsg+0x1b0/0x350
[ 24.915682] sock_sendmsg+0x4c/0x68
[ 24.919185] ___sys_sendmsg+0x288/0x2c8
[ 24.923037] __sys_sendmsg+0x7c/0xd0
[ 24.926628] __arm64_sys_sendmsg+0x2c/0x38
[ 24.930744] el0_svc_common.constprop.0+0x94/0x158
[ 24.935556] el0_svc_handler+0x34/0x90
[ 24.939322] el0_svc+0x8/0xc
[ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---
Fixes: f6ac858589 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c1f7e2c1b ]
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b4a75108d ]
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.
This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.
Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
ip netns exec A ipset create test_hash hash:ip,mac
ip netns exec A ipset add test_hash 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset correctly matches a test packet:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b89d15480d ]
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 656c8e9cc1 upstream.
Change ct id hash calculation to only use invariants.
Currently the ct id hash calculation is based on some fields that can
change in the lifetime on a conntrack entry in some corner cases. The
current hash uses the whole tuple which contains an hlist pointer which
will change when the conntrack is placed on the dying list resulting in
a ct id change.
This patch also removes the reply-side tuple and extension pointer from
the hash calculation so that the ct id will will not change from
initialization until confirmation.
Fixes: 3c79107631 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Signed-off-by: Dirk Morris <dmorris@metaloft.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 28b1d6ef53 ]
The rule below doesn't work as the kernel raises -ERANGE.
nft add rule netdev nftlb lb01 ip daddr set \
symhash mod 1 map { 0 : 192.168.0.10 } fwd to "eth0"
This patch allows to use the symhash modulus with one
element, in the same way that the other types of hashes and
algorithms that uses the modulus parameter.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 959b69ef57 ]
Jakub Jankowski reported following oddity:
After 3 way handshake completes, timeout of new connection is set to
max_retrans (300s) instead of established (5 days).
shortened excerpt from pcap provided:
25.070622 IP (flags [DF], proto TCP (6), length 52)
10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
26.070462 IP (flags [DF], proto TCP (6), length 48)
10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
27.070449 IP (flags [DF], proto TCP (6), length 40)
10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
Turns out the last_win is of u16 type, but we store the scaled value:
512 << 8 (== 0x20000) becomes 0 window.
The Fixes tag is not correct, as the bug has existed forever, but
without that change all that this causes might cause is to mistake a
window update (to-nonzero-from-zero) for a retransmit.
Fixes: fbcd253d24 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
Reported-by: Jakub Jankowski <shasta@toxcorp.com>
Tested-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>