linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-17 07:14:25 +00:00

Author	SHA1	Message	Date
Florian Westphal	3d21d90d68	netfilter: nf_log: don't zap all loggers on unregister commit `205ee117d4` upstream. like nf_log_unset, nf_log_unregister must not reset the list of loggers. Otherwise, a call to nf_log_unregister() will render loggers of other nf protocols unusable: iptables -A INPUT -j LOG modprobe nf_log_arp ; rmmod nf_log_arp iptables -A INPUT -j LOG iptables: No chain/target/match by that name Fixes: `30e0c6a6be` ("netfilter: nf_log: prepare net namespace support for loggers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Pablo Neira Ayuso	9177f81fcb	netfilter: nft_compat: skip family comparison in case of NFPROTO_UNSPEC commit `ba378ca9c0` upstream. Fix lookup of existing match/target structures in the corresponding list by skipping the family check if NFPROTO_UNSPEC is used. This is resulting in the allocation and insertion of one match/target structure for each use of them. So this not only bloats memory consumption but also severely affects the time to reload the ruleset from the iptables-compat utility. After this patch, iptables-compat-restore and iptables-compat take almost the same time to reload large rulesets. Fixes: `0ca743a559` ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Pablo Neira Ayuso	e1f5251641	netfilter: nf_log: wait for rcu grace after logger unregistration commit `ad5001cc7c` upstream. The nf_log_unregister() function needs to call synchronize_rcu() to make sure that the objects are not dereferenced anymore on module removal. Fixes: `5962815a6a` ("netfilter: nf_log: use an array of loggers instead of list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Daniel Borkmann	14573919af	netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths commit `9cf94eab8b` upstream. Commit `0838aa7fcf` ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: `0838aa7fcf` ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Elad Raz	43fd0843d4	netfilter: ipset: Fixing unnamed union init commit `96be5f2806` upstream. In continue to proposed Vinson Lee's post [1], this patch fixes compilation issues founded at gcc 4.4.7. The initialization of .cidr field of unnamed unions causes compilation error in gcc 4.4.x. References Visible links [1] https://lkml.org/lkml/2015/7/5/74 Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Jozsef Kadlecsik	44016f5ec2	netfilter: ipset: Out of bound access in hash:net* types fixed commit `6fe7ccfd77` upstream. Dave Jones reported that KASan detected out of bounds access in hash:net* types: [ 23.139532] ================================================================== [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58 [ 23.152937] Write of size 4 by task ipset/457 [ 23.159742] ============================================================================= [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 23.173641] ----------------------------------------------------------------------------- [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456 [ 23.201836] __slab_alloc.constprop.66+0x554/0x620 [ 23.208994] __kmalloc+0x2f2/0x360 [ 23.216105] hash_net_create+0x16a/0x470 [ 23.223238] ip_set_create+0x3e6/0x740 [ 23.230343] nfnetlink_rcv_msg+0x599/0x640 [ 23.237454] netlink_rcv_skb+0x14f/0x190 [ 23.244533] nfnetlink_rcv+0x3f6/0x790 [ 23.251579] netlink_unicast+0x272/0x390 [ 23.258573] netlink_sendmsg+0x5a1/0xa50 [ 23.265485] SYSC_sendto+0x1da/0x2c0 [ 23.272364] SyS_sendto+0xe/0x10 [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f The bug is fixed in the patch and the testsuite is extended in ipset to check cidr handling more thoroughly. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Pablo Neira Ayuso	fa193d93ec	netfilter: nfnetlink: work around wrong endianess in res_id field commit `a9de9777d6` upstream. The convention in nfnetlink is to use network byte order in every header field as well as in the attribute payload. The initial version of the batching infrastructure assumes that res_id comes in host byte order though. The only client of the batching infrastructure is nf_tables, so let's add a workaround to address this inconsistency. We currently have 11 nfnetlink subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem 2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias of nf_tables from the nfnetlink batching path when interpreting the res_id field. Based on original patch from Florian Westphal. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-22 14:49:23 -07:00
Joe Stringer	f58e5aa7b8	netfilter: conntrack: Use flags in nf_ct_tmpl_alloc() The flags were ignored for this function when it was introduced. Also fix the style problem in kzalloc. Fixes: `0838aa7fc` (netfilter: fix netns dependencies with conntrack templates) Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-05 10:56:43 +02:00
Dan Carpenter	1a727c6361	netfilter: nf_conntrack: checking for IS_ERR() instead of NULL We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to changed to check for NULL instead of IS_ERR(). Fixes: `0838aa7fcf` ('netfilter: fix netns dependencies with conntrack templates') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-30 14:04:19 +02:00
Pablo Neira Ayuso	f0ad462189	netfilter: nf_conntrack: silence warning on falling back to vmalloc() Since `88eab472ec` ("netfilter: conntrack: adjust nf_conntrack_buckets default value"), the hashtable can easily hit this warning. We got reports from users that are getting this message in a quite spamming fashion, so better silence this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2015-07-30 12:32:55 +02:00
Joe Stringer	4b31814d20	netfilter: nf_conntrack: Support expectations in different zones When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked. Fixes: `5d0aa2ccd4` (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-22 17:00:47 +02:00
Pablo Neira Ayuso	b64f48dcda	Merge tag 'ipvs-fixes-for-v4.2' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== IPVS Fixes for v4.2 please consider this fix for v4.2. For reasons that are not clear to me it is a bumper crop. It seems to me that they are all relevant to stable. Please let me know if you need my help to get the fixes into stable. * ipvs: fix ipv6 route unreach panic This problem appears to be present since IPv6 support was added to IPVS in v2.6.28. * ipvs: skb_orphan in case of forwarding This appears to resolve a problem resulting from a side effect of `41063e9dd1` ("ipv4: Early TCP socket demux.") which was included in v3.6. * ipvs: do not use random local source address for tunnels This appears to resolve a problem introduced by `026ace060d` ("ipvs: optimize dst usage for real server") in v3.10. * ipvs: fix crash if scheduler is changed This appears to resolve a problem introduced by `ceec4c3816` ("ipvs: convert services to rcu") in v3.10. Julian has provided backports of the fix: * [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04008.html * [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04007.html Please let me know how you would like to handle guiding these backports into stable. * ipvs: fix crash with sync protocol v0 and FTP This appears to resolve a problem introduced by `749c42b620` ("ipvs: reduce sync rate with time thresholds") in v3.5 ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-20 15:01:19 +02:00
Pablo Neira Ayuso	0838aa7fcf	netfilter: fix netns dependencies with conntrack templates Quoting Daniel Borkmann: "When adding connection tracking template rules to a netns, f.e. to configure netfilter zones, the kernel will endlessly busy-loop as soon as we try to delete the given netns in case there's at least one template present, which is problematic i.e. if there is such bravery that the priviledged user inside the netns is assumed untrusted. Minimal example: ip netns add foo ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 ip netns del foo What happens is that when nf_ct_iterate_cleanup() is being called from nf_conntrack_cleanup_net_list() for a provided netns, we always end up with a net->ct.count > 0 and thus jump back to i_see_dead_people. We don't get a soft-lockup as we still have a schedule() point, but the serving CPU spins on 100% from that point onwards. Since templates are normally allocated with nf_conntrack_alloc(), we also bump net->ct.count. The issue why they are not yet nf_ct_put() is because the per netns .exit() handler from x_tables (which would eventually invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is called in the dependency chain at a later point in time than the per netns .exit() handler for the connection tracker. This is clearly a chicken'n'egg problem: after the connection tracker .exit() handler, we've teared down all the connection tracking infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be invoked at a later point in time during the netns cleanup, as that would lead to a use-after-free. At the same time, we cannot make x_tables depend on the connection tracker module, so that the xt_ct_tg_destroy() would be invoked earlier in the cleanup chain." Daniel confirms this has to do with the order in which modules are loaded or having compiled nf_conntrack as modules while x_tables built-in. So we have no guarantees regarding the order in which netns callbacks are executed. Fix this by allocating the templates through kmalloc() from the respective SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache. Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch is marked as unlikely since conntrack templates are rarely allocated and only from the configuration plane path. Note that templates are not kept in any list to avoid further dependencies with nf_conntrack anymore, thus, the tmpl larval list is removed. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Daniel Borkmann <daniel@iogearbox.net>	2015-07-20 14:58:19 +02:00
Julian Anastasov	e3895c0334	ipvs: call skb_sender_cpu_clear Reset XPS's sender_cpu on forwarding. Signed-off-by: Julian Anastasov <ja@ssi.bg> Fixes: `2bd82484bb` ("xps: fix xps for stacked devices") Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	56184858d1	ipvs: fix crash with sync protocol v0 and FTP Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: `749c42b620` ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Alex Gartrell	71563f3414	ipvs: skb_orphan in case of forwarding It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell <agartrell@fb.com> Fixes: `41063e9dd1` ("ipv4: Early TCP socket demux.") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	05f00505a8	ipvs: fix crash if scheduler is changed I overlooked the svc->sched_data usage from schedulers when the services were converted to RCU in 3.10. Now the rare ipvsadm -E command can change the scheduler but due to the reverse order of ip_vs_bind_scheduler and ip_vs_unbind_scheduler we provide new sched_data to the old scheduler resulting in a crash. To fix it without changing the scheduler methods we have to use synchronize_rcu() only for the editing case. It means all svc->scheduler readers should expect a NULL value. To avoid breakage for the service listing and ipvsadm -R we can use the "none" name to indicate that scheduler is not assigned, a state when we drop new connections. Reported-by: Alexander Vasiliev <a.vasylev@404-group.com> Fixes: `ceec4c3816` ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	4754957f04	ipvs: do not use random local source address for tunnels Michael Vallaly reports about wrong source address used in rare cases for tunneled traffic. Looks like __ip_vs_get_out_rt in 3.10+ is providing uninitialized dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses kmalloc. While we retry after seeing EINVAL from routing for data that does not look like valid local address, it still succeeded when this memory was previously used from other dests and with different local addresses. As result, we can use valid local address that is not suitable for our real server. Fix it by providing 0.0.0.0 every time our cache is refreshed. By this way we will get preferred source address from routing. Reported-by: Michael Vallaly <lvs@nolatency.com> Fixes: `026ace060d` ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Alex Gartrell	326bf17ea5	ipvs: fix ipv6 route unreach panic Previously there was a trivial panic unshare -n /bin/bash <<EOF ip addr add dev lo face::1/128 ipvsadm -A -t [face::1]:15213 ipvsadm -a -t [face::1]:15213 -r b00c::1 echo boom \| nc face::1 15213 EOF This patch allows us to replicate the net logic above and simply capture the skb_dst(skb)->dev and use that for the purpose of the invocation. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Dmitry Torokhov	484836ec2d	netfilter: IDLETIMER: fix lockdep warning Dynamically allocated sysfs attributes should be initialized with sysfs_attr_init() otherwise lockdep will be angry with us: [ 45.468653] BUG: key ffffffc030fad4e0 not in .data! [ 45.468655] ------------[ cut here ]------------ [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490() [ 45.468672] DEBUG_LOCKS_WARN_ON(1) [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43 [ 45.468674] Hardware name: XXX [ 45.468675] Call trace: [ 45.468680] [<ffffffc0002072b4>] dump_backtrace+0x0/0x10c [ 45.468683] [<ffffffc0002073d0>] show_stack+0x10/0x1c [ 45.468688] [<ffffffc000a86cd4>] dump_stack+0x74/0x94 [ 45.468692] [<ffffffc000217ae0>] warn_slowpath_common+0x84/0xb0 [ 45.468694] [<ffffffc000217b84>] warn_slowpath_fmt+0x4c/0x58 [ 45.468697] [<ffffffc0002530a4>] lockdep_init_map+0x128/0x490 [ 45.468701] [<ffffffc000367ef0>] __kernfs_create_file+0x80/0xe4 [ 45.468704] [<ffffffc00036862c>] sysfs_add_file_mode_ns+0x104/0x170 [ 45.468706] [<ffffffc00036870c>] sysfs_create_file_ns+0x58/0x64 [ 45.468711] [<ffffffc000930430>] idletimer_tg_checkentry+0x14c/0x324 [ 45.468714] [<ffffffc00092a728>] xt_check_target+0x170/0x198 [ 45.468717] [<ffffffc000993efc>] check_target+0x58/0x6c [ 45.468720] [<ffffffc000994c64>] translate_table+0x30c/0x424 [ 45.468723] [<ffffffc00099529c>] do_ipt_set_ctl+0x144/0x1d0 [ 45.468728] [<ffffffc0009079f0>] nf_setsockopt+0x50/0x60 [ 45.468732] [<ffffffc000946870>] ip_setsockopt+0x8c/0xb4 [ 45.468735] [<ffffffc0009661c0>] raw_setsockopt+0x10/0x50 [ 45.468739] [<ffffffc0008c1550>] sock_common_setsockopt+0x14/0x20 [ 45.468742] [<ffffffc0008bd190>] SyS_setsockopt+0x88/0xb8 [ 45.468744] ---[ end trace 41d156354d18c039 ]--- Signed-off-by: Dmitry Torokhov <dtor@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-13 17:23:25 +02:00
Pablo Neira Ayuso	95dd8653de	netfilter: ctnetlink: put back references to master ct and expect objects We have to put back the references to the master conntrack and the expectation that we just created, otherwise we'll leak them. Fixes: `0ef71ee1a5` ("netfilter: ctnetlink: refactor ctnetlink_create_expect") Reported-by: Tim Wiess <Tim.Wiess@watchguard.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-10 14:18:03 +02:00
Pablo Neira Ayuso	6742b9e310	netfilter: nfnetlink: keep going batch handling on missing modules After a fresh boot with no modules in place at all and a large rulesets, the existing nfnetlink_rcv_batch() funcion can take long time to commit the ruleset due to the many abort path. This is specifically a problem for the existing client of this code, ie. nf_tables, since it results in several synchronize_rcu() call in a row. This patch changes the policy to keep full batch processing on missing modules errors so we abort only once. Reported-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 17:59:33 +02:00
Eric W. Biederman	f307170d6e	netfilter: nf_queue: Don't recompute the hook_list head If someone sends packets from one of the netdevice ingress hooks to the a userspace queue, and then userspace later accepts the packet, the netfilter code can enter an infinite loop as the list head will never be found. Pass in the saved list_head to avoid this. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 15:03:13 +02:00
Eric W. Biederman	8405a8fff3	netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook Add code to nf_unregister_hook to flush the nf_queue when a hook is unregistered. This guarantees that the pointer that the nf_queue code retains into the nf_hook list will remain valid while a packet is queued. I tested what would happen if we do not flush queued packets and was trivially able to obtain the oops below. All that was required was to stop the nf_queue listening process, to delete all of the nf_tables, and to awaken the nf_queue listening process. > BUG: unable to handle kernel paging request at 0000000100000001 > IP: [<0000000100000001>] 0x100000001 > PGD b9c35067 PUD 0 > Oops: 0010 [#1] SMP > Modules linked in: > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000 > RIP: 0010:[<0000000100000001>] [<0000000100000001>] 0x100000001 > RSP: 0018:ffff8800ba9dba40 EFLAGS: 00010a16 > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90 > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00 > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28 > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900 > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000 > FS: 00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0 > Stack: > ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8 > ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128 > ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0 > Call Trace: > [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0 > [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190 > [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360 > [<ffffffff81386290>] ? nla_parse+0x80/0xf0 > [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240 > [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150 > [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20 > [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0 > [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0 > [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650 > [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50 > [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0 > [<ffffffff810e8f73>] ? __wake_up+0x43/0x70 > [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0 > [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80 > [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a > Code: Bad RIP value. > RIP [<0000000100000001>] 0x100000001 > RSP <ffff8800ba9dba40> > CR2: 0000000100000001 > ---[ end trace 08eb65d42362793f ]--- Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:23:23 -07:00
Eric W. Biederman	fdab6a4cbd	netfilter: nftables: Do not run chains in the wrong network namespace Currenlty nf_tables chains added in one network namespace are being run in all network namespace. The issues are myriad with the simplest being an unprivileged user can cause any network packets to be dropped. Address this by simply not running nf_tables chains in the wrong network namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:23:22 -07:00
Pablo Neira Ayuso	10c04a8e71	netfilter: use forward declaration instead of including linux/proc_fs.h We don't need to pull the full definitions in that file, a simple forward declaration is enough. Moreover, include linux/procfs.h from nf_synproxy_core, otherwise this hits a compilation error due to missing declarations, ie. net/netfilter/nf_synproxy_core.c: In function ‘synproxy_proc_init’: net/netfilter/nf_synproxy_core.c:326:2: error: implicit declaration of function ‘proc_create’ [-Werror=implicit-function-declaration] if (!proc_create("synproxy", S_IRUGO, net->proc_net_stat, ^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2015-06-18 21:14:30 +02:00
Eric W. Biederman	2fd1dc910b	netfilter: Kill unused copies of RCV_SKB_FAIL This appears to have been a dead macro in both nfnetlink_log.c and nfnetlink_queue_core.c since these pieces of code were added in 2005. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-18 21:14:27 +02:00
Harout Hedeshian	01555e74bd	netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag xt_socket is useful for matching sockets with IP_TRANSPARENT and taking some action on the matching packets. However, it lacks the ability to match only a small subset of transparent sockets. Suppose there are 2 applications, each with its own set of transparent sockets. The first application wants all matching packets dropped, while the second application wants them forwarded somewhere else. Add the ability to retore the skb->mark from the sk_mark. The mark is only restored if a matching socket is found and the transparent / nowildcard conditions are satisfied. Now the 2 hypothetical applications can differentiate their sockets based on a mark value set with SO_MARK. iptables -t mangle -I PREROUTING -m socket --transparent \ --restore-skmark -j action iptables -t mangle -A action -m mark --mark 10 -j action2 iptables -t mangle -A action -m mark --mark 11 -j action3 Signed-off-by: Harout Hedeshian <harouth@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-18 13:05:09 +02:00
Roman Kubiak	ef493bd930	netfilter: nfnetlink_queue: add security context information This patch adds an additional attribute when sending packet information via netlink in netfilter_queue module. It will send additional security context data, so that userspace applications can verify this context against their own security databases. Signed-off-by: Roman Kubiak <r.kubiak@samsung.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-18 13:02:24 +02:00
Pablo Neira Ayuso	835b803377	netfilter: nf_tables_netdev: unregister hooks on net_device removal In case the net_device is gone, we have to unregister the hooks and put back the reference on the net_device object. Once it comes back, register them again. This also covers the device rename case. This patch also adds a new flag to indicate that the basechain is disabled, so their hooks are not registered. This flag is used by the netdev family to handle the case where the net_device object is gone. Currently this flag is not exposed to userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:35 +02:00
Pablo Neira Ayuso	d8ee8f7c56	netfilter: nf_tables: add nft_register_basechain() and nft_unregister_basechain() This wrapper functions take care of hook registration for basechains. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:33 +02:00
Pablo Neira Ayuso	2cbce139fc	netfilter: nf_tables: attach net_device to basechain The device is part of the hook configuration, so instead of a global configuration per table, set it to each of the basechain that we create. This patch reworks `ebddf1a8d7` ("netfilter: nf_tables: allow to bind table to net_device"). Note that this adds a dev_name field in the nft_base_chain structure which is required the netdev notification subscription that follows up in a patch to handle gone net_devices. Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 23:02:31 +02:00
Eric Dumazet	711bdde6a8	netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. After Florian patches, there is no need for XT_TABLE_INFO_SZ anymore : Only one copy of table is kept, instead of one copy per cpu. We also can avoid a dereference if we put table data right after xt_table_info. It reduces register pressure and helps compiler. Then, we attempt a kmalloc() if total size is under order-3 allocation, to reduce TLB pressure, as in many cases, rules fit in 32 KB. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 20:19:20 +02:00
Pablo Neira Ayuso	53b8762727	Merge branch 'master' of git://blackhole.kfki.hu/nf-next Jozsef Kadlecsik says: ==================== ipset patches for nf-next Please consider to apply the next bunch of patches for ipset. First comes the small changes, then the bugfixes and at the end the RCU related patches. * Use MSEC_PER_SEC consistently instead of the number. * Use SET_WITH_() helpers to test set extensions from Sergey Popovich. Check extensions attributes before getting extensions from Sergey Popovich. * Permit CIDR equal to the host address CIDR in IPv6 from Sergey Popovich. * Make sure we always return line number on batch in the case of error from Sergey Popovich. * Check CIDR value only when attribute is given from Sergey Popovich. * Fix cidr handling for hash:net types, reported by Jonathan Johnson. * Fix parallel resizing and listing of the same set so that the original set is kept for the whole dumping. * Make sure listing doesn't grab a set which is just being destroyed. * Remove rbtree from ip_set_hash_netiface.c in order to introduce RCU. * Replace rwlock_t with spinlock_t in "struct ip_set", change the locking in the core and simplifications in the timeout routines. * Introduce RCU locking in bitmap:* types with a slight modification in the logic on how an element is added. * Introduce RCU locking in hash:* types. This is the most complex part of the changes. * Introduce RCU locking in list type where standard rculist is used. * Fix coding styles reported by checkpatch.pl. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 18:33:09 +02:00
Pablo Neira Ayuso	f09becc79f	netfilter: Kconfig: get rid of parens around depends on According to the reporter, they are not needed. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-15 17:26:37 +02:00
Jozsef Kadlecsik	ca0f6a5cd9	netfilter: ipset: Fix coding styles reported by checkpatch.pl Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:18 +02:00
Jozsef Kadlecsik	00590fdd5b	netfilter: ipset: Introduce RCU locking in list type Standard rculist is used. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	18f84d41d3	netfilter: ipset: Introduce RCU locking in hash:* types Three types of data need to be protected in the case of the hash types: a. The hash buckets: standard rcu pointer operations are used. b. The element blobs in the hash buckets are stored in an array and a bitmap is used for book-keeping to tell which elements in the array are used or free. c. Networks per cidr values and the cidr values themselves are stored in fix sized arrays and need no protection. The values are modified in such an order that in the worst case an element testing is repeated once with the same cidr value. The ipset hash approach uses arrays instead of lists and therefore is incompatible with rhashtable. Performance is tested by Jesper Dangaard Brouer: Simple drop in FORWARD ~~~~~~~~~~~~~~~~~~~~~~ Dropping via simple iptables net-mask match:: iptables -t raw -N simple \|\| iptables -t raw -F simple iptables -t raw -I simple -s 198.18.0.0/15 -j DROP iptables -t raw -D PREROUTING -j simple iptables -t raw -I PREROUTING -j simple Drop performance in "raw": 11.3Mpps Generator: sending 12.2Mpps (tx:12264083 pps) Drop via original ipset in RAW table ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a set with lots of elements:: sudo ./ipset destroy test echo "create test hash:ip hashsize 65536" > test.set for x in `seq 0 255`; do for y in `seq 0 255`; do echo "add test 198.18.$x.$y" >> test.set done done sudo ./ipset restore < test.set Dropping via ipset:: iptables -t raw -F iptables -t raw -N net198 \|\| iptables -t raw -F net198 iptables -t raw -I net198 -m set --match-set test src -j DROP iptables -t raw -I PREROUTING -j net198 Drop performance in "raw" with ipset: 8Mpps Perf report numbers ipset drop in "raw":: + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh - _raw_read_lock_bh + 99.88% ip_set_test - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh - _raw_read_unlock_bh + 99.72% ip_set_test + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3 + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock Drop via ipset in RAW table with RCU-locking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With RCU locking, the RW-lock is gone. Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:17 +02:00
Jozsef Kadlecsik	96f51428c4	netfilter: ipset: Introduce RCU locking in bitmap:* types There's nothing much required because the bitmap types use atomic bit operations. However the logic of adding elements slightly changed: first the MAC address updated (which is not atomic), then the element activated (added). The extensions may call kfree_rcu() therefore we call rcu_barrier() at module removal. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	b57b2d1fa5	netfilter: ipset: Prepare the ipset core to use RCU at set level Replace rwlock_t with spinlock_t in "struct ip_set" and change the locking accordingly. Convert the comment extension into an rcu-avare object. Also, simplify the timeout routines. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:16 +02:00
Jozsef Kadlecsik	bd55389cc3	netfilter:ipset Remove rbtree from hash:net,iface Remove rbtree in order to introduce RCU instead of rwlock in ipset Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	9c1ba5c809	netfilter: ipset: Make sure listing doesn't grab a set which is just being destroyed. There was a small window when all sets are destroyed and a concurrent listing of all sets could grab a set which is just being destroyed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	c4c997839c	netfilter: ipset: Fix parallel resizing and listing of the same set When elements added to a hash:* type of set and resizing triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by references and using the original hash table for listing. Therefore the destroying of the original hash table may happen from the resizing or listing functions. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:15 +02:00
Jozsef Kadlecsik	f690cbaed9	netfilter: ipset: Fix cidr handling for hash:net types Commit "Simplify cidr handling for hash:net types" broke the cidr handling for the hash:net types when the sets were used by the SET target: entries with invalid cidr values were added to the sets. Reported by Jonathan Johnson. Testsuite entry is added to verify the fix. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:14 +02:00
Sergey Popovich	aff227581e	netfilter: ipset: Check CIDR value only when attribute is given There is no reason to check CIDR value regardless attribute specifying CIDR is given. Initialize cidr array in element structure on element structure declaration to let more freedom to the compiler to optimize initialization right before element structure is used. Remove local variables cidr and cidr2 for netnet and netportnet hashes as we do not use packed cidr value for such set types and can store value directly in e.cidr[]. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:14 +02:00
Sergey Popovich	a212e08e8e	netfilter: ipset: Make sure we always return line number on batch Even if we return with generic IPSET_ERR_PROTOCOL it is good idea to return line number if we called in batch mode. Moreover we are not always exiting with IPSET_ERR_PROTOCOL. For example hash:ip,port,net may return IPSET_ERR_HASH_RANGE_UNSUPPORTED or IPSET_ERR_INVALID_CIDR. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	2c227f278a	netfilter: ipset: Permit CIDR equal to the host address CIDR in IPv6 Permit userspace to supply CIDR length equal to the host address CIDR length in netlink message. Prohibit any other CIDR length for IPv6 variant of the set. Also return -IPSET_ERR_HASH_RANGE_UNSUPPORTED instead of generic -IPSET_ERR_PROTOCOL in IPv6 variant of hash:ip,port,net when IPSET_ATTR_IP_TO attribute is given. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	7dd37bc8e6	netfilter: ipset: Check extensions attributes before getting extensions. Make all extensions attributes checks within ip_set_get_extensions() and reduce number of duplicated code. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:13 +02:00
Sergey Popovich	edda079174	netfilter: ipset: Use SET_WITH_*() helpers to test set extensions Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-06-14 10:40:12 +02:00
Florian Westphal	482cfc3185	netfilter: xtables: avoid percpu ruleset duplication We store the rule blob per (possible) cpu. Unfortunately this means we can waste lot of memory on big smp machines. ipt_entry structure ('rule head') is 112 byte, so e.g. with maxcpu=64 one single rule eats close to 8k RAM. Since previous patch made counters percpu it appears there is nothing left in the rule blob that needs to be percpu. On my test system (144 possible cpus, 400k dummy rules) this change saves close to 9 Gigabyte of RAM. Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-06-12 14:27:10 +02:00

1 2 3 4 5 ...

3044 Commits