Commit Graph

7 Commits

Author SHA1 Message Date
Jakub Sitnicki
46745a88b5 bpf: Stop the psock parser before canceling its work
commit e8e3437762 upstream.

We might have never enabled (started) the psock's parser, in which case it
will not get stopped when destroying the psock. This leads to a warning
when trying to cancel parser's work from psock's deferred destructor:

[  405.325769] WARNING: CPU: 1 PID: 3216 at net/strparser/strparser.c:526 strp_done+0x3c/0x40
[  405.326712] Modules linked in: [last unloaded: test_bpf]
[  405.327359] CPU: 1 PID: 3216 Comm: kworker/1:164 Tainted: G        W         5.0.0 #42
[  405.328294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[  405.329712] Workqueue: events sk_psock_destroy_deferred
[  405.330254] RIP: 0010:strp_done+0x3c/0x40
[  405.330706] Code: 28 e8 b8 d5 6b ff 48 8d bb 80 00 00 00 e8 9c d5 6b ff 48 8b 7b 18 48 85 ff 74 0d e8 1e a5 e8 ff 48 c7 43 18 00 00 00 00 5b c3 <0f> 0b eb cf 66 66 66 66 90 55 89 f5 53 48 89 fb 48 83 c7 28 e8 0b
[  405.332862] RSP: 0018:ffffc900026bbe50 EFLAGS: 00010246
[  405.333482] RAX: ffffffff819323e0 RBX: ffff88812cb83640 RCX: ffff88812cb829e8
[  405.334228] RDX: 0000000000000001 RSI: ffff88812cb837e8 RDI: ffff88812cb83640
[  405.335366] RBP: ffff88813fd22680 R08: 0000000000000000 R09: 000073746e657665
[  405.336472] R10: 8080808080808080 R11: 0000000000000001 R12: ffff88812cb83600
[  405.337760] R13: 0000000000000000 R14: ffff88811f401780 R15: ffff88812cb837e8
[  405.338777] FS:  0000000000000000(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
[  405.339903] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  405.340821] CR2: 00007fb11489a6b8 CR3: 000000012d4d6000 CR4: 00000000000406e0
[  405.341981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  405.343131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  405.344415] Call Trace:
[  405.344821]  sk_psock_destroy_deferred+0x23/0x1b0
[  405.345585]  process_one_work+0x1ae/0x3e0
[  405.346110]  worker_thread+0x3c/0x3b0
[  405.346576]  ? pwq_unbound_release_workfn+0xd0/0xd0
[  405.347187]  kthread+0x11d/0x140
[  405.347601]  ? __kthread_parkme+0x80/0x80
[  405.348108]  ret_from_fork+0x35/0x40
[  405.348566] ---[ end trace a4a3af4026a327d4 ]---

Stop psock's parser just before canceling its work.

Fixes: 1d79895aef ("sk_msg: Always cancel strp work before freeing the psock")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13 14:04:19 -07:00
Jakub Sitnicki
28fe1dcb59 sk_msg: Always cancel strp work before freeing the psock
commit 1d79895aef upstream.

Despite having stopped the parser, we still need to deinitialize it
by calling strp_done so that it cancels its work. Otherwise the worker
thread can run after we have freed the parser, and attempt to access
its workqueue resulting in a use-after-free:

==================================================================
BUG: KASAN: use-after-free in pwq_activate_delayed_work+0x1b/0x1d0
Read of size 8 at addr ffff888069975240 by task kworker/u2:2/93

CPU: 0 PID: 93 Comm: kworker/u2:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue:            (null) (kstrp)
Call Trace:
 print_address_description+0x6e/0x2b0
 ? pwq_activate_delayed_work+0x1b/0x1d0
 kasan_report+0xfd/0x177
 ? pwq_activate_delayed_work+0x1b/0x1d0
 ? pwq_activate_delayed_work+0x1b/0x1d0
 pwq_activate_delayed_work+0x1b/0x1d0
 ? process_one_work+0x4aa/0x660
 pwq_dec_nr_in_flight+0x9b/0x100
 worker_thread+0x82/0x680
 ? process_one_work+0x660/0x660
 kthread+0x1b9/0x1e0
 ? __kthread_create_on_node+0x250/0x250
 ret_from_fork+0x1f/0x30

Allocated by task 111:
 sk_psock_init+0x3c/0x1b0
 sock_map_link.isra.2+0x103/0x4b0
 sock_map_update_common+0x94/0x270
 sock_map_update_elem+0x145/0x160
 __se_sys_bpf+0x152e/0x1e10
 do_syscall_64+0xb2/0x3e0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 112:
 kfree+0x7f/0x140
 process_one_work+0x40b/0x660
 worker_thread+0x82/0x680
 kthread+0x1b9/0x1e0
 ret_from_fork+0x1f/0x30

The buggy address belongs to the object at ffff888069975180
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 192 bytes inside of
 512-byte region [ffff888069975180, ffff888069975380)
The buggy address belongs to the page:
page:ffffea0001a65d00 count:1 mapcount:0 mapping:ffff88806d401280 index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
raw: 4000000000010200 dead000000000100 dead000000000200 ffff88806d401280
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888069975100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888069975180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888069975200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888069975280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888069975300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13 14:04:19 -07:00
John Fastabend
47c12a8a77 bpf: sk_msg, fix socket data_ready events
[ Upstream commit 552de91068 ]

When a skb verdict program is in-use and either another BPF program
redirects to that socket or the new SK_PASS support is used the
data_ready callback does not wake up application. Instead because
the stream parser/verdict is using the sk data_ready callback we wake
up the stream parser/verdict block.

Fix this by adding a helper to check if the stream parser block is
enabled on the sk and if so call the saved pointer which is the
upper layers wake up function.

This fixes application stalls observed when an application is waiting
for data in a blocking read().

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 20:02:26 +01:00
John Fastabend
91a87d4fb0 bpf: sk_msg, zap ingress queue on psock down
[ Upstream commit a136678c0b ]

In addition to releasing any cork'ed data on a psock when the psock
is removed we should also release any skb's in the ingress work queue.
Otherwise the skb's eventually get free'd but late in the tear
down process so we see the WARNING due to non-zero sk_forward_alloc.

  void sk_stream_kill_queues(struct sock *sk)
  {
	...
	WARN_ON(sk->sk_forward_alloc);
	...
  }

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 20:02:26 +01:00
Vakul Garg
5c1e7e94a7 Prevent overflow of sk_msg in sk_msg_clone()
Fixed function sk_msg_clone() to prevent overflow of 'dst' while adding
pages in scatterlist entries. The overflow of 'dst' causes crash in kernel
tls module while doing record encryption.

Crash fixed by this patch.

[   78.796119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   78.804900] Mem abort info:
[   78.807683]   ESR = 0x96000004
[   78.810744]   Exception class = DABT (current EL), IL = 32 bits
[   78.816677]   SET = 0, FnV = 0
[   78.819727]   EA = 0, S1PTW = 0
[   78.822873] Data abort info:
[   78.825759]   ISV = 0, ISS = 0x00000004
[   78.829600]   CM = 0, WnR = 0
[   78.832576] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000bf8ee311
[   78.839195] [0000000000000008] pgd=0000000000000000
[   78.844081] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   78.849642] Modules linked in: tls xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables xt_CHECKSUM cpve cpufreq_conservative lm90 ina2xx crct10dif_ce
[   78.865377] CPU: 0 PID: 6007 Comm: openssl Not tainted 4.20.0-rc6-01647-g754d5da63145-dirty #107
[   78.874149] Hardware name: LS1043A RDB Board (DT)
[   78.878844] pstate: 60000005 (nZCv daif -PAN -UAO)
[   78.883632] pc : scatterwalk_copychunks+0x164/0x1c8
[   78.888500] lr : scatterwalk_copychunks+0x160/0x1c8
[   78.893366] sp : ffff00001d04b600
[   78.896668] x29: ffff00001d04b600 x28: ffff80006814c680
[   78.901970] x27: 0000000000000000 x26: ffff80006c8de786
[   78.907272] x25: ffff00001d04b760 x24: 000000000000001a
[   78.912573] x23: 0000000000000006 x22: ffff80006814e440
[   78.917874] x21: 0000000000000100 x20: 0000000000000000
[   78.923175] x19: 000081ffffffffff x18: 0000000000000400
[   78.928476] x17: 0000000000000008 x16: 0000000000000000
[   78.933778] x15: 0000000000000100 x14: 0000000000000001
[   78.939079] x13: 0000000000001080 x12: 0000000000000020
[   78.944381] x11: 0000000000001080 x10: 00000000ffff0002
[   78.949683] x9 : ffff80006814c248 x8 : 00000000ffff0000
[   78.954985] x7 : ffff80006814c318 x6 : ffff80006c8de786
[   78.960286] x5 : 0000000000000f80 x4 : ffff80006c8de000
[   78.965588] x3 : 0000000000000000 x2 : 0000000000001086
[   78.970889] x1 : ffff7e0001b74e02 x0 : 0000000000000000
[   78.976192] Process openssl (pid: 6007, stack limit = 0x00000000291367f9)
[   78.982968] Call trace:
[   78.985406]  scatterwalk_copychunks+0x164/0x1c8
[   78.989927]  skcipher_walk_next+0x28c/0x448
[   78.994099]  skcipher_walk_done+0xfc/0x258
[   78.998187]  gcm_encrypt+0x434/0x4c0
[   79.001758]  tls_push_record+0x354/0xa58 [tls]
[   79.006194]  bpf_exec_tx_verdict+0x1e4/0x3e8 [tls]
[   79.010978]  tls_sw_sendmsg+0x650/0x780 [tls]
[   79.015326]  inet_sendmsg+0x2c/0xf8
[   79.018806]  sock_sendmsg+0x18/0x30
[   79.022284]  __sys_sendto+0x104/0x138
[   79.025935]  __arm64_sys_sendto+0x24/0x30
[   79.029936]  el0_svc_common+0x60/0xe8
[   79.033588]  el0_svc_handler+0x2c/0x80
[   79.037327]  el0_svc+0x8/0xc
[   79.040200] Code: 6b01005f 54fff788 940169b1 f9000320 (b9400801)
[   79.046283] ---[ end trace 74db007d069c1cf7 ]---

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21 09:12:49 -08:00
Daniel Borkmann
d829e9c411 tls: convert to generic sk_msg interface
Convert kTLS over to make use of sk_msg interface for plaintext and
encrypted scattergather data, so it reuses all the sk_msg helpers
and data structure which later on in a second step enables to glue
this to BPF.

This also allows to remove quite a bit of open coded helpers which
are covered by the sk_msg API. Recent changes in kTLs 80ece6a03a
("tls: Remove redundant vars from tls record structure") and
4e6d47206c ("tls: Add support for inplace records encryption")
changed the data path handling a bit; while we've kept the latter
optimization intact, we had to undo the former change to better
fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
been brought back and are linked into the sk_msg sgs. Now the kTLS
record contains a msg_plaintext and msg_encrypted sk_msg each.

In the original code, the zerocopy_from_iter() has been used out
of TX but also RX path. For the strparser skb-based RX path,
we've left the zerocopy_from_iter() in decrypt_internal() mostly
untouched, meaning it has been moved into tls_setup_from_iter()
with charging logic removed (as not used from RX). Given RX path
is not based on sk_msg objects, we haven't pursued setting up a
dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
could be an option to prusue in a later step.

Joint work with John.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 12:23:19 -07:00
Daniel Borkmann
604326b41a bpf, sockmap: convert to generic sk_msg interface
Add a generic sk_msg layer, and convert current sockmap and later
kTLS over to make use of it. While sk_buff handles network packet
representation from netdevice up to socket, sk_msg handles data
representation from application to socket layer.

This means that sk_msg framework spans across ULP users in the
kernel, and enables features such as introspection or filtering
of data with the help of BPF programs that operate on this data
structure.

Latter becomes in particular useful for kTLS where data encryption
is deferred into the kernel, and as such enabling the kernel to
perform L7 introspection and policy based on BPF for TLS connections
where the record is being encrypted after BPF has run and came to
a verdict. In order to get there, first step is to transform open
coding of scatter-gather list handling into a common core framework
that subsystems can use.

The code itself has been split and refactored into three bigger
pieces: i) the generic sk_msg API which deals with managing the
scatter gather ring, providing helpers for walking and mangling,
transferring application data from user space into it, and preparing
it for BPF pre/post-processing, ii) the plain sock map itself
where sockets can be attached to or detached from; these bits
are independent of i) which can now be used also without sock
map, and iii) the integration with plain TCP as one protocol
to be used for processing L7 application data (later this could
e.g. also be extended to other protocols like UDP). The semantics
are the same with the old sock map code and therefore no change
of user facing behavior or APIs. While pursuing this work it
also helped finding a number of bugs in the old sockmap code
that we've fixed already in earlier commits. The test_sockmap
kselftest suite passes through fine as well.

Joint work with John.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 12:23:19 -07:00