[ Upstream commit cea0cc80a6 ]
Commit dfcb9f4f99 ("sctp: deny peeloff operation on asocs with threads
sleeping on it") fixed the race between peeloff and wait sndbuf by
checking waitqueue_active(&asoc->wait) in sctp_do_peeloff().
But it actually doesn't work, as even if waitqueue_active returns false
the waiting sndbuf thread may still not yet hold sk lock. After asoc is
peeled off, sk is not asoc->base.sk any more, then to hold the old sk
lock couldn't make assoc safe to access.
This patch is to fix this by changing to hold the new sk lock if sk is
not asoc->base.sk, meanwhile, also set the sk in sctp_sendmsg with the
new sk.
With this fix, there is no more race between peeloff and waitbuf, the
check 'waitqueue_active' in sctp_do_peeloff can be removed.
Thanks Marcelo and Neil for making this clear.
v1->v2:
fix it by changing to lock the new sock instead of adding a flag in asoc.
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ca3af4dd28 ]
Now in sctp_sendmsg sctp_wait_for_sndbuf could schedule out without
holding sock sk. It means the current asoc can be freed elsewhere,
like when receiving an abort packet.
If the asoc is just created in sctp_sendmsg and sctp_wait_for_sndbuf
returns err, the asoc will be freed again due to new_asoc is not nil.
An use-after-free issue would be triggered by this.
This patch is to fix it by setting new_asoc with nil if the asoc is
already dead when cpu schedules back, so that it will not be freed
again in sctp_sendmsg.
v1->v2:
set new_asoc as nil in sctp_sendmsg instead of sctp_wait_for_sndbuf.
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e73fc9a56 ]
The comparison on the timeout can lead to an array overrun
read on sctp_timer_tbl because of an off-by-one error. Fix
this by using < instead of <= and also compare to the array
size rather than SCTP_EVENT_TIMEOUT_MAX.
Fixes CoverityScan CID#1397639 ("Out-of-bounds read")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7c8a61d9ee ]
Alexandar Potapenko while testing the kernel with KMSAN and syzkaller
discovered that in some configurations sctp would leak 4 bytes of
kernel stack.
Working with his reproducer I discovered that those 4 bytes that
are leaked is the scope id of an ipv6 address returned by recvmsg.
With a little code inspection and a shrewd guess I discovered that
sctp_inet6_skb_msgname only initializes the scope_id field for link
local ipv6 addresses to the interface index the link local address
pertains to instead of initializing the scope_id field for all ipv6
addresses.
That is almost reasonable as scope_id's are meaniningful only for link
local addresses. Set the scope_id in all other cases to 0 which is
not a valid interface index to make it clear there is nothing useful
in the scope_id field.
There should be no danger of breaking userspace as the stack leak
guaranteed that previously meaningless random data was being returned.
Fixes: 372f525b495c ("SCTP: Resync with LKSCTP tree.")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df80cd9b28 ]
Now when peeling off an association to the sock in another netns, all
transports in this assoc are not to be rehashed and keep use the old
key in hashtable.
As a transport uses sk->net as the hash key to insert into hashtable,
it would miss removing these transports from hashtable due to the new
netns when closing the sock and all transports are being freeed, then
later an use-after-free issue could be caused when looking up an asoc
and dereferencing those transports.
This is a very old issue since very beginning, ChunYu found it with
syzkaller fuzz testing with this series:
socket$inet6_sctp()
bind$inet6()
sendto$inet6()
unshare(0x40000000)
getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
This patch is to block this call when peeling one assoc off from one
netns to another one, so that the netns of all transport would not
go out-sync with the key in hashtable.
Note that this patch didn't fix it by rehashing transports, as it's
difficult to handle the situation when the tuple is already in use
in the new netns. Besides, no one would like to peel off one assoc
to another netns, considering ipaddrs, ifaces, etc. are usually
different.
Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d04adf1b35 ]
Now when migrating sock to another one in sctp_sock_migrate(), it only
resets owner sk for the data in receive queues, not the chunks on out
queues.
It would cause that data chunks length on the sock is not consistent
with sk sk_wmem_alloc. When closing the sock or freeing these chunks,
the old sk would never be freed, and the new sock may crash due to
the overflow sk_wmem_alloc.
syzbot found this issue with this series:
r0 = socket$inet_sctp()
sendto$inet(r0)
listen(r0)
accept4(r0)
close(r0)
Although listen() should have returned error when one TCP-style socket
is in connecting (I may fix this one in another patch), it could also
be reproduced by peeling off an assoc.
This issue is there since very beginning.
This patch is to reset owner sk for the chunks on out queues so that
sk sk_wmem_alloc has correct value after accept one sock or peeloff
an assoc to one sock.
Note that when resetting owner sk for chunks on outqueue, it has to
sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk
first and then sctp_set_owner_w them after changing assoc->base.sk,
due to that sctp_wfree and it's callees are using assoc->base.sk.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b71d21c274 ]
Commit 9b97420228 ("sctp: support ipv6 nonlocal bind")
introduced support for the above options as v4 sctp did,
so patched sctp_v6_available().
In the v4 implementation it's enough, because
sctp_inet_bind_verify() just returns with sctp_v4_available().
However sctp_inet6_bind_verify() has an extra check before that
for link-local scope_id, which won't respect the above options.
Added the checks before calling ipv6_chk_addr(), but
not before the validation of scope_id.
before (w/ both options):
./v6test fe80::10 sctp
bind failed, errno: 99 (Cannot assign requested address)
./v6test fe80::10 tcp
bind success, errno: 0 (Success)
after (w/ both options):
./v6test fe80::10 sctp
bind success, errno: 0 (Success)
Signed-off-by: Laszlo Toth <laszlth@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1cc276cec9 ]
Now sctp processes icmp redirect packet in sctp_icmp_redirect where
it calls sctp_transport_dst_check in which tp->dst can be released.
The problem is before calling sctp_transport_dst_check, it doesn't
check sock_owned_by_user, which means tp->dst could be freed while
a process is accessing it with owning the socket.
An use-after-free issue could be triggered by this.
This patch is to fix it by checking sock_owned_by_user before calling
sctp_transport_dst_check in sctp_icmp_redirect, so that it would not
release tp->dst if users still hold sock lock.
Besides, the same issue fixed in commit 45caeaa5ac ("dccp/tcp: fix
routing redirect race") on sctp also needs this check.
Fixes: 55be7a9c60 ("ipv4: Add redirect support to all protocol icmp error handlers")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7906b00f5c ]
Commit fb586f2530 ("sctp: delay calls to sk_data_ready() as much as
possible") minimized the number of wake ups that are triggered in case
the association receives a packet with multiple data chunks on it and/or
when io_events are enabled and then commit 0970f5b366 ("sctp: signal
sk_data_ready earlier on data chunks reception") moved the wake up to as
soon as possible. It thus relies on the state machine running later to
clean the flag that the event was already generated.
The issue is that there are 2 call paths that calls
sctp_ulpq_tail_event() outside of the state machine, causing the flag to
linger and possibly omitting a needed wake up in the sequence.
One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
partial reliability triggers removal of chunks from the send queue when
the application calls sendmsg().
This commit fixes it by not setting the flag in case the socket is not
owned by the user, as it won't be cleaned later. This works for
user-initiated calls and also for rx path processing.
Fixes: fb586f2530 ("sctp: delay calls to sk_data_ready() as much as possible")
Reported-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 912964eacb ]
Commit 6f29a13061 ("sctp: sctp_addr_id2transport should verify the
addr before looking up assoc") invoked sctp_verify_addr to verify the
addr.
But it didn't check af variable beforehand, once users pass an address
with family = 0 through sockopt, sctp_get_af_specific will return NULL
and NULL pointer dereference will be caused by af->sockaddr_len.
This patch is to fix it by returning NULL if af variable is NULL.
Fixes: 6f29a13061 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 988c732211 ]
In sctp_for_each_transport, pos is used to save how many objs it has
dumped. Now it gets the last obj by sctp_transport_get_idx, then gets
the next obj by sctp_transport_get_next.
The issue is that in the meanwhile if some objs in transport hashtable
are removed and the objs nums are less than pos, sctp_transport_get_idx
would return NULL and hti.walker.tbl is NULL as well. At this moment
it should stop hti, instead of continue getting the next obj. Or it
would cause a NULL pointer dereference in sctp_transport_get_next.
This patch is to pass pos + 1 into sctp_transport_get_idx to get the
next obj directly, even if pos > objs nums, it would return NULL and
stop hti.
Fixes: 626d16f50f ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 581409dacc ]
Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling
BH. If CPU schedules to another thread A at this moment, the thread A may
be trying to hold the write_lock with disabling BH.
As BH is disabled and CPU cannot schedule back to the thread holding the
read_lock, while the thread A keeps waiting for the read_lock. A dead
lock would be triggered by this.
This patch is to fix this dead lock by calling read_lock_bh instead to
disable BH when holding the read_lock in sctp_for_each_endpoint.
Fixes: 626d16f50f ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6f29a13061 ]
sctp_addr_id2transport is a function for sockopt to look up assoc by
address. As the address is from userspace, it can be a v4-mapped v6
address. But in sctp protocol stack, it always handles a v4-mapped
v6 address as a v4 address. So it's necessary to convert it to a v4
address before looking up assoc by address.
This patch is to fix it by calling sctp_verify_addr in which it can do
this conversion before calling sctp_endpoint_lookup_assoc, just like
what sctp_sendmsg and __sctp_connect do for the address from users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5207f39963 ]
Now sctp gso puts segments into skb's frag_list, then processes these
segments in skb_segment. But skb_segment handles them only when gs is
enabled, as it's in the same branch with skb's frags.
Although almost all the NICs support sg other than some old ones, but
since commit 1e16aa3ddf ("net: gso: use feature flag argument in all
protocol gso handlers"), features &= skb->dev->hw_enc_features, and
xfrm_output_gso call skb_segment with features = 0, which means sctp
gso would call skb_segment with sg = 0, and skb_segment would not work
as expected.
This patch is to fix it by setting features param with NETIF_F_SG when
calling skb_segment so that it can go the right branch to process the
skb's frag_list.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 804ec7ebe8 ]
sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.
v2:
- don't use skb_header_pointer() to read the transport header, since
icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dbc2b5e9a0 ]
Commit 0ca50d12fe ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.
Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.
hostA:
[1] fd21:356b:459a:cf10::11 (eth1)
[2] fd21:356b:459a:cf20::11 (eth2)
hostB:
[a] fd21:356b:459a:cf30::2 (eth1)
[b] fd21:356b:459a:cf40::2 (eth2)
route from hostA to hostB:
fd21:356b:459a:cf30::/64 dev eth1 metric 1024 mtu 1500
The expected path should be:
fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2
This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.
Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 34b2789f1d ]
Now sctp doesn't check sock's state before listening on it. It could
even cause changing a sock with any state to become a listening sock
when doing sctp_listen.
This patch is to fix it by checking sock's state in sctp_listen, so
that it will listen on the sock with right state.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfcb9f4f99 upstream.
commit 2dcab59848 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.
As Ben Hutchings noticed, then in such case it would return without
locking back the socket and would cause two unlocks in a row.
Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.
This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).
Joint work with Xin Long.
Fixes: 2dcab59848 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92e55f412c upstream.
Unlike ipv4, this control socket is shared by all cpus so we cannot use
it as scratchpad area to annotate the mark that we pass to ip6_xmit().
Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
family caches the flowi6 structure in the sctp_transport structure, so
we cannot use to carry the mark unless we later on reset it back, which
I discarded since it looks ugly to me.
Fixes: bf99b4ded5 ("tcp: fix mark propagation with fwmark_reflect enabled")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2dcab59848 ]
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08abb79542 ]
Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.
This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.
Fixes: 1cceda7849 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if
this sock has no connection (assoc), sk state would be changed to
SCTP_SS_CLOSING, which is not as we expect.
Besides, after that if users try to listen on this sock, kernel
could even panic when it dereference sctp_sk(sk)->bind_hash in
sctp_inet_listen, as bind_hash is null when sock has no assoc.
This patch is to move sk state change after checking sk assocs
is not empty, and also merge these two if() conditions and reduce
indent level.
Fixes: d46e416c11 ("sctp: sctp should change socket state when shutdown is received")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_wait_for_connect() currently already holds the asoc to keep it
alive during the sleep, in case another thread release it. But Andrey
Konovalov and Dmitry Vyukov reported an use-after-free in such
situation.
Problem is that __sctp_connect() doesn't get a ref on the asoc and will
do a read on the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it and the _put on sctp_wait_for_connect
will actually release it, causing the use-after-free.
Fix is, instead of doing the read after waiting for the connect, do it
before so, and avoid this issue as the socket is still locked by then.
There should be no issue on returning the asoc id in case of failure as
the application shouldn't trust on that number in such situations
anyway.
This issue doesn't exist in sctp_sendmsg() path.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, in rx path, before calling lock_sock, it needed to
hold assoc when got it by __sctp_lookup_association, in case other place
would free/put assoc.
But in __sctp_lookup_association, it lookup and hold transport, then got
assoc by transport->assoc, then hold assoc and put transport. It means
it didn't hold transport, yet it was returned and later on directly
assigned to chunk->transport.
Without the protection of sock lock, the transport may be freed/put by
other places, which would cause a use-after-free issue.
This patch is to fix this issue by holding transport instead of assoc.
As holding transport can make sure to access assoc is also safe, and
actually it looks up assoc by searching transport rhashtable, to hold
transport here makes more sense.
Note that the function will be renamed later on on another patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, it used a local variable to save the transport that is
looked up by __sctp_lookup_association(), and didn't return it back. But in
sctp_rcv, it is used to initialize chunk->transport. So when hitting this,
even if it found the transport, it was still initializing chunk->transport
with null instead.
This patch is to return the transport back through transport pointer
that is from __sctp_rcv_lookup_harder().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_transport_lookup_process(), Commit 1cceda7849 ("sctp: fix
the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out
of rcu lock, but it put transport and hold assoc instead, and ignore
that cb() still uses transport. It may cause a use-after-free issue.
This patch is to hold transport instead of assoc there.
Fixes: 1cceda7849 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.
The fix is to just move the check upwards so it's also validated for the
1st chunk.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7303a14750 ("sctp: identify chunks that need to be fragmented
at IP level") made the chunk be fragmented at IP level in the next round
if it's size exceed PMTU.
But there still is another case, PMTU can be updated if transport's dst
expires and transport's pmtu_pending is set in sctp_packet_transmit. If
the new PMTU is less than the chunk, the same issue with that commit can
be triggered.
So we should drop this packet and let it retransmit in another round
where it would be fragmented at IP level.
This patch is to fix it by checking the chunk size after PMTU may be
updated and dropping this packet if it's size exceed PMTU.
Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
if (len < sizeof(int))
return -EINVAL;
On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.
Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.
Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.
If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
[<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
[<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
[<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
[<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
[<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
[<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
[<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
[<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sctp dumps all the ep->assocs, it needs to lock_sock first,
but now it locks sock in rcu_read_lock, and lock_sock may sleep,
which would break rcu_read_lock.
This patch is to get and hold one sock when traversing the list.
After that and get out of rcu_read_lock, lock and dump it. Then
it will traverse the list again to get the next one until all
sctp socks are dumped.
For sctp_diag_dump_one, it fixes this issue by holding asoc and
moving cb() out of rcu_read_lock in sctp_transport_lookup_process.
Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now before using prsctp polices, sctp uses asoc->prsctp_enable to
check if prsctp is enabled. However asoc->prsctp_enable is set only
means local host support prsctp, sctp should not abandon packet if
peer host doesn't enable prsctp.
So this patch is to use asoc->peer.prsctp_capable to check if prsctp
is enabled on both side, instead of asoc->prsctp_enable, as asoc's
peer.prsctp_capable is set only when local and peer both enable prsctp.
Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp uses chunk->prsctp_param to save the prsctp param for all the
prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
and reuse msg->expires_at for TTL policy, as the prsctp polices and old
expires policy are mutual exclusive.
This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
polices.
Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
as it needs a u64 variables to save the expires_at time.
This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
issue.
Fixes: a6c2f79287 ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to suppress the checkpatch.pl warning "Comparison to NULL
could be written". No functional changes here.
Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to
aggregate the data by going through all the items of each cpu sequentially.
Signed-off-by: Jia He <hejianet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.
Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.
Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.
Even so, I don't think this discrepancy resulted in any practical bug.
This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And avoid the usage of '&~3'. This is the last place still not using
the macro.
Also break the line to make it easier to read.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To something more meaningful these days, specially because this is
working on packet headers or lengths and which are not tied to any CPU
arch but to the protocol itself.
So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.
Reported-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 311b21774f ("sctp: simplify sk_receive_queue locking"), a call
to 'skb_queue_splice_tail_init()' has been made explicit. Previously it was
hidden in 'sctp_skb_list_tail()'
Now, the code around it looks redundant. The '_init()' part of
'skb_queue_splice_tail_init()' should already do the same.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David and Marcelo's suggestion, ENOMEM err shouldn't return back to
user in transmit path. Instead, sctp's retransmit would take care of
the chunks that fail to send because of ENOMEM.
This patch is only to do some release job when alloc_skb fails, not to
return ENOMEM back any more.
Besides, it also cleans up sctp_packet_transmit's err path, and fixes
some issues in err path:
- It didn't free the head skb in nomem: path.
- No need to check nskb in no_route: path.
- It should goto err: path if alloc_skb fails for head.
- Not all the NOMEMs should free nskb.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_outq_flush return value is meaningless now, this patch is
to make sctp_outq_flush return void, as well as sctp_outq_fail
and sctp_outq_uncork.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every time when sctp calls sctp_outq_flush, it sends out the chunks of
control queue, retransmit queue and data queue. Even if some trunks are
failed to transmit, it still has to flush all the transports, as it's
the only chance to clean that transmit_list.
So the latest transmit error here should be returned back. This transmit
error is an internal error of sctp stack.
I checked all the places where it uses the transmit error (the return
value of sctp_outq_flush), most of them are actually just save it to
sk_err.
Except for sctp_assoc/endpoint_bh_rcv, they will drop the chunk if
it's failed to send a REPLY, which is actually incorrect, as we can't
be sure the error that sctp_outq_flush returns is from sending that
REPLY.
So it's meaningless for sctp_outq_flush to return error back.
This patch is to save transmit error to sk_err in sctp_outq_flush, the
new error can update the old value. Eventually, sctp_wait_for_* would
check for it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last patch "sctp: do not return the transmit err back to sctp_sendmsg"
made sctp_primitive_SEND return err only when asoc state is unavailable.
In this case, chunks are not enqueued, they have no chance to be freed if
we don't take care of them later.
This Patch is actually to revert commit 1cd4d5c432 ("sctp: remove the
unused sctp_datamsg_free()"), commit 69b5777f2e ("sctp: hold the chunks
only after the chunk is enqueued in outq") and commit 8b570dc9f7 ("sctp:
only drop the reference on the datamsg after sending a msg"), to use
sctp_datamsg_free to free the chunks of current msg.
Fixes: 8b570dc9f7 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once a chunk is enqueued successfully, sctp queues can take care of it.
Even if it is failed to transmit (like because of nomem), it should be
put into retransmit queue.
If sctp report this error to users, it confuses them, they may resend
that msg, but actually in kernel sctp stack is in charge of retransmit
it already.
Besides, this error probably is not from the failure of transmitting
current msg, but transmitting or retransmitting another msg's chunks,
as sctp_outq_flush just tries to send out all transports' chunks.
This patch is to make sctp_cmd_send_msg return avoid, and not return the
transmit err back to sctp_sendmsg
Fixes: 8b570dc9f7 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks
the asoc's state through statetable before calling sctp_outq_tail. So
there's no need to check the asoc's state again in sctp_outq_tail.
Besides, sctp_do_sm is protected by lock_sock, even if sending msg is
interrupted by timer events, the event's processes still need to acquire
lock_sock first. It means no others CMDs can be enqueue into side effect
list before CMD_SEND_MSG to change asoc->state, so it's safe to remove it.
This patch is to remove redundant asoc->state check from sctp_outq_tail.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>