commit 0fdea1e8a2 upstream.
Commit 718ba5b873, moved the responsibility for unlocking the socket to
xs_tcp_setup_socket, meaning that the socket will be unlocked before we
know that it has finished trying to connect. The following patch is based on
an initial patch by Russell King to ensure that we delay clearing the
XPRT_CONNECTING flag until we either know that we failed to initiate
a connection attempt, or the connection attempt itself failed.
Fixes: 718ba5b873 ("SUNRPC: Add helpers to prevent socket create from racing")
Reported-by: Russell King <linux@arm.linux.org.uk>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99b1a4c32a upstream.
It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f064f3485 upstream.
Commit c48a11c7ad ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():
if (page->pfmemalloc && !page->mapping)
skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted. However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.
So the assumption is invalid if the networking code can see such a page.
And it seems it can. We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf. There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.
The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead. We can reuse the index
again but define it to an impossible value (-1UL). This is the page
index so it should never see the value that large. Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.
The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3633ebebab upstream.
We already set a station to be associated when peering completes, both
in user space and in the kernel. Thus we should always have an
associated sta before sending data frames to that station.
Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized. This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.
Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a068acf2ee upstream.
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.
Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:
$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory
This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 999b8b88c6 upstream.
Some use of those functions were providing unitialized values to those
functions. Notably, when reading 0 bytes from an empty file on a 9P
filesystem, the return code of read() was not 0.
Tested with this simple program:
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, const char **argv)
{
assert(argc == 2);
char buffer[256];
int fd = open(argv[1], O_RDONLY|O_NOCTTY);
assert(fd >= 0);
assert(read(fd, buffer, 0) == 0);
return 0;
}
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5eeb5fa19 upstream.
At the last iteration of the loop, j may equal zero and thus
tp_list[j - 1] causes an invalid read.
Change the logic of the loop so that j - 1 is always >= 0.
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25ba265390 upstream.
The l2cap_conn->smp pointer may be NULL for various valid reasons where SMP has
failed to initialize properly. One such scenario is when crypto support is
missing, another when the adapter has been powered on through a legacy method.
The smp_conn_security() function should have the appropriate check for this
situation to avoid NULL pointer dereferences.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fabb59449 upstream.
Fixes: 3e0249f9c0 ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4479004e64 upstream.
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3b58c47d3 upstream.
Commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67e808fbb0 upstream.
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a84b69cb6e upstream.
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab499db80f upstream.
There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.
This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.
I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.
Fixes: 8d1f7ecd2a ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cbfb1bb66 upstream.
It was possible for mac80211 to be coerced into an
unexpected flow causing sdata union to become
corrupted. Station pointer was put into
sdata->u.vlan.sta memory location while it was
really master AP's sdata->u.ap.next_beacon. This
led to station entry being later freed as
next_beacon before __sta_info_flush() in
ieee80211_stop_ap() and a subsequent invalid
pointer dereference crash.
The problem was that ieee80211_ptr->use_4addr
wasn't cleared on interface type changes.
This could be reproduced with the following steps:
# host A and host B have just booted; no
# wpa_s/hostapd running; all vifs are down
host A> iw wlan0 set type station
host A> iw wlan0 set 4addr on
host A> printf 'interface=wlan0\nssid=4addrcrash\nchannel=1\nwds_sta=1' > /tmp/hconf
host A> hostapd -B /tmp/conf
host B> iw wlan0 set 4addr on
host B> ifconfig wlan0 up
host B> iw wlan0 connect -w hostAssid
host A> pkill hostapd
# host A crashed:
[ 127.928192] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c8
[ 127.929014] IP: [<ffffffff816f4f32>] __sta_info_flush+0xac/0x158
...
[ 127.934578] [<ffffffff8170789e>] ieee80211_stop_ap+0x139/0x26c
[ 127.934578] [<ffffffff8100498f>] ? dump_trace+0x279/0x28a
[ 127.934578] [<ffffffff816dc661>] __cfg80211_stop_ap+0x84/0x191
[ 127.934578] [<ffffffff816dc7ad>] cfg80211_stop_ap+0x3f/0x58
[ 127.934578] [<ffffffff816c5ad6>] nl80211_stop_ap+0x1b/0x1d
[ 127.934578] [<ffffffff815e53f8>] genl_family_rcv_msg+0x259/0x2b5
Note: This isn't a revert of f8cdddb8d6
("cfg80211: check iface combinations only when
iface is running") as far as functionality is
considered because b6a550156b ("cfg80211/mac80211:
move more combination checks to mac80211") moved
the logic somewhere else already.
Fixes: f8cdddb8d6 ("cfg80211: check iface combinations only when iface is running")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8df734e865 upstream.
The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.
Fixes: af296bdb8d ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82cd003a77 upstream.
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
Fixes: http://tracker.ceph.com/issues/2759
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a70cefa30 upstream.
The AF_IEEE802154 sockaddr looks like this:
struct sockaddr_ieee802154 {
sa_family_t family; /* AF_IEEE802154 */
struct ieee802154_addr_sa addr;
};
struct ieee802154_addr_sa {
int addr_type;
u16 pan_id;
union {
u8 hwaddr[IEEE802154_ADDR_LEN];
u16 short_addr;
};
};
On most architectures there will be implicit structure padding here,
in two different places:
* In struct sockaddr_ieee802154, two bytes of padding between 'family'
(unsigned short) and 'addr', so that 'addr' starts on a four byte
boundary.
* In struct ieee802154_addr_sa, two bytes at the end of the structure,
to make the structure 16 bytes.
When calling recvmsg(2) on a PF_IEEE802154 SOCK_DGRAM socket, the
ieee802154 stack constructs a struct sockaddr_ieee802154 on the
kernel stack without clearing these padding fields, and, depending
on the addr_type, between four and ten bytes of uncleared kernel
stack will be copied to userspace.
We can't just insert two 'u16 __pad's in the right places and zero
those before copying an address to userspace, as not all architectures
insert this implicit padding -- from a quick test it seems that avr32,
cris and m68k don't insert this padding, while every other architecture
that I have cross compilers for does insert this padding.
The easiest way to plug the leak is to just memset the whole struct
sockaddr_ieee802154 before filling in the fields we want to fill in,
and that's what this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 781f899f2f upstream.
During the initial setup stage of a controller, the low-level transport
is actually active. This means that HCI_UP is true. To avoid toggling
the transport off and back on again for normal operation the kernel
holds a grace period with HCI_AUTO_OFF that will turn the low-level
transport off in case no user is present.
The idea of the grace period is important to avoid having to initialize
all of the controller twice. So legacy ioctl and the new management
interface knows how to clear this grace period and then start normal
operation.
For the user channel operation this grace period has not been taken into
account which results in the problem that HCI_UP and HCI_AUTO_OFF are
set and the kernel will return EBUSY. However from a system point of
view the controller is ready to be grabbed by either the ioctl, the
management interface or the user channel.
This patch brings the user channel to the same level as the other two
entries for operating a controller.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 34b99df4e6 ]
ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.
Fixes: c247f0534c ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 51f458d961 ]
Unfortunately, Michal's change to fix AP_VLAN crypto tailroom
caused a locking issue that was reported by lockdep, but only
in a few cases - the issue was a classic ABBA deadlock caused
by taking the mtx after the key_mtx, where normally they're
taken the other way around.
As the key mutex protects the field in question (I'm adding a
few annotations to make that clear) only the iteration needs
to be protected, but we can also iterate the interface list
with just RCU protection while holding the key mutex.
Fixes: f9dca80b98 ("mac80211: fix AP_VLAN crypto tailroom calculation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c51a97f76 ]
The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:
1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.
Fixes: 767e97e1e0 ("neigh: RCU conversion of struct neighbour")
Fixes: a263b30936 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 468479e604 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f60069 ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f98f4514d0 ]
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417 ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f60069 ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2dab80a8b4 ]
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e9 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d45a02d01 ]
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67 ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36c01245eb upstream.
As reported by Manfred Schlaegl here
http://marc.info/?l=linux-netdev&m=143482089824232&w=2
commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
As net timestamping is influenced by several players (netstamp_needed and
netdev_tstamp_prequeue) Manfred missed a proper timestamp which leads to
CAN frame loss.
As skb timestamping became now mandatory for CAN related skbs this patch
makes sure that received CAN skbs always have a proper timestamp set.
Maybe there's a better solution in the future but this patch fixes the
CAN frame loss so far.
Reported-by: Manfred Schlaegl <manfred.schlaegl@gmx.at>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, we can ask to authenticate DATA chunks and we can send DATA
chunks on the same packet as COOKIE_ECHO, but if you try to combine
both, the DATA chunk will be sent unauthenticated and peer won't accept
it, leading to a communication failure.
This happens because even though the data was queued after it was
requested to authenticate DATA chunks, it was also queued before we
could know that remote peer can handle authenticating, so
sctp_auth_send_cid() returns false.
The fix is whenever we set up an active key, re-check send queue for
chunks that now should be authenticated. As a result, such packet will
now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
Reported-by: Liu Wei <weliu@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f76858
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.
alloc_skb_with_frags is the same.
The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a device is renamed and the original name is subsequently reused
for a new device, the following warning is generated:
sysctl duplicate entry: /net/mpls/conf/veth0//input
CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
Call Trace:
[<ffffffff81566aaf>] ? dump_stack+0x40/0x50
[<ffffffff81236279>] ? __register_sysctl_table+0x289/0x5a0
[<ffffffffa051a24f>] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
[<ffffffff8108db7f>] ? notifier_call_chain+0x4f/0x70
[<ffffffff81470e72>] ? register_netdevice+0x2b2/0x480
[<ffffffffa0524748>] ? veth_newlink+0x178/0x2d3 [veth]
[<ffffffff8147f84c>] ? rtnl_newlink+0x73c/0x8e0
[<ffffffff8147f27a>] ? rtnl_newlink+0x16a/0x8e0
[<ffffffff81459ff2>] ? __kmalloc_reserve.isra.30+0x32/0x90
[<ffffffff8147ccfd>] ? rtnetlink_rcv_msg+0x8d/0x250
[<ffffffff8145b027>] ? __alloc_skb+0x47/0x1f0
[<ffffffff8149badb>] ? __netlink_lookup+0xab/0xe0
[<ffffffff8147cc70>] ? rtnetlink_rcv+0x30/0x30
[<ffffffff8149e7a0>] ? netlink_rcv_skb+0xb0/0xd0
[<ffffffff8147cc64>] ? rtnetlink_rcv+0x24/0x30
[<ffffffff8149df17>] ? netlink_unicast+0x107/0x1a0
[<ffffffff8149e4be>] ? netlink_sendmsg+0x50e/0x630
[<ffffffff8145209c>] ? sock_sendmsg+0x3c/0x50
[<ffffffff81452beb>] ? ___sys_sendmsg+0x27b/0x290
[<ffffffff811bd258>] ? mem_cgroup_try_charge+0x88/0x110
[<ffffffff811bd5b6>] ? mem_cgroup_commit_charge+0x56/0xa0
[<ffffffff811d7700>] ? do_filp_open+0x30/0xa0
[<ffffffff8145336e>] ? __sys_sendmsg+0x3e/0x80
[<ffffffff8156c3f2>] ? system_call_fastpath+0x16/0x75
Fix this by unregistering the previous sysctl table (registered for
the path containing the original device name) and re-registering the
table for the path containing the new device name.
Fixes: 37bde79979 ("mpls: Per-device enabling of packet input")
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758 ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the TIPC connection timer expires in a probing state, a
self abort message is supposed to be generated and delivered
to the local socket. This is currently broken, and the abort
message is actually sent out to the peer node with invalid
addressing information. This will cause the link to enter
a constant retransmission state and eventually reset.
We fix this by removing the self-abort message creation and
tear down connection immediately instead.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.
Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.
Fix this by initializing the structure in question before the
driver method call.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691
Cc: stable@vger.kernel.org
Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 70008aa50e ("skbuff: convert to skb_orphan_frags") replaced
open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
to helper function skb_orphan_frags. Apply that to the last remaining
open coded site.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.
This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.
With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: "Tom Herbert" <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.
Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.
Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.
Fixes: 03c57747a7 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
421b3885bf "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.
Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, openvswitch tries to disable LRO from the user space. This does
not work correctly when the device added is a vlan interface, though.
Instead of dealing with possibly complex stacked cross name space relations
in the user space, do the same as bridging does and call dev_disable_lro in
the kernel.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>