Commit Graph

39842 Commits

Author SHA1 Message Date
Steven Rostedt (Google)
02333de90e tracing/eprobes: Do not hardcode $comm as a string
The variable $comm is hard coded as a string, which is true for both
kprobes and uprobes, but for event probes (eprobes) it is a field name. In
most cases the "comm" field would be a string, but there's no guarantee of
that fact.

Do not assume that comm is a string. Not to mention, it currently forces
comm fields to fault, as string processing for event probes is currently
broken.

Link: https://lkml.kernel.org/r/20220820134400.756152112@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Steven Rostedt (Google)
2673c60ee6 tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:

 BUG: kernel NULL pointer dereference, address: 0000000000000024
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:get_event_field.isra.0+0x0/0x50
 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
 Call Trace:
  <TASK>
  get_eprobe_size+0xb4/0x640
  ? __mod_node_page_state+0x72/0xc0
  __eprobe_trace_func+0x59/0x1a0
  ? __mod_lruvec_page_state+0xaa/0x1b0
  ? page_remove_file_rmap+0x14/0x230
  ? page_remove_rmap+0xda/0x170
  event_triggers_call+0x52/0xe0
  trace_event_buffer_commit+0x18f/0x240
  trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
  try_to_wake_up+0x260/0x4c0
  __wake_up_common+0x80/0x180
  __wake_up_common_lock+0x7c/0xc0
  do_notify_parent+0x1c9/0x2a0
  exit_notify+0x1a9/0x220
  do_exit+0x2ba/0x450
  do_group_exit+0x2d/0x90
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Obviously this is not the desired result.

Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.

Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:08 -04:00
Yang Jihong
c3b0f72e80 ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:

register_ftrace_function
  ftrace_startup
    __register_ftrace_function
      ...
      add_ftrace_ops(&ftrace_ops_list, ops)
      ...
    ...
    ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
    ...
  return 0 // ops is in the ftrace_ops_list.

When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
  ftrace_shutdown
    if (unlikely(ftrace_disabled))
            return -ENODEV;  // return here, __unregister_ftrace_function is not executed,
                             // as a result, ops is still in the ftrace_ops_list
    __unregister_ftrace_function
    ...

If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:

is_ftrace_trampoline
  ftrace_ops_trampoline
    do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!

Syzkaller reports as follows:
[ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
[ 1203.508039] #PF: supervisor read access in kernel mode
[ 1203.508798] #PF: error_code(0x0000) - not-present page
[ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
[ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
[ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G    B   W         5.10.0 #8
[ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
[ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
[ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
[ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
[ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
[ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
[ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
[ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
[ 1203.525634] FS:  00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
[ 1203.526801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
[ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Therefore, when ftrace_startup_enable fails, we need to rollback registration
process and remove ops from ftrace_ops_list.

Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Steven Rostedt (Google)
7249921d94 tracing/perf: Fix double put of trace event when init fails
If in perf_trace_event_init(), the perf_trace_event_open() fails, then it
will call perf_trace_event_unreg() which will not only unregister the perf
trace event, but will also call the put() function of the tp_event.

The problem here is that the trace_event_try_get_ref() is called by the
caller of perf_trace_event_init() and if perf_trace_event_init() returns a
failure, it will then call trace_event_put(). But since the
perf_trace_event_unreg() already called the trace_event_put() function, it
triggers a WARN_ON().

 WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20

If perf_trace_event_reg() does not call the trace_event_try_get_ref() then
the perf_trace_event_unreg() should not be calling trace_event_put(). This
breaks symmetry and causes bugs like these.

Pull out the trace_event_put() from perf_trace_event_unreg() and call it
in the locations that perf_trace_event_unreg() is called. This not only
fixes this bug, but also brings back the proper symmetry of the reg/unreg
vs get/put logic.

Link: https://lore.kernel.org/all/cover.1660347763.git.kjlx@templeofstupid.com/
Link: https://lkml.kernel.org/r/20220816192817.43d5e17f@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 1d18538e6a ("tracing: Have dynamic events have a ref counter")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Lukas Bulwahn
d8a64313c1 tracing: React to error return from traceprobe_parse_event_name()
The function traceprobe_parse_event_name() may set the first two function
arguments to a non-null value and still return -EINVAL to indicate an
unsuccessful completion of the function. Hence, it is not sufficient to
just check the result of the two function arguments for being not null,
but the return value also needs to be checked.

Commit 95c104c378 ("tracing: Auto generate event name when creating a
group of events") changed the error-return-value checking of the second
traceprobe_parse_event_name() invocation in __trace_eprobe_create() and
removed checking the return value to jump to the error handling case.

Reinstate using the return value in the error-return-value checking.

Link: https://lkml.kernel.org/r/20220811071734.20700-1-lukas.bulwahn@gmail.com

Fixes: 95c104c378 ("tracing: Auto generate event name when creating a group of events")
Acked-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21 15:56:07 -04:00
Kuniyuki Iwashima
9c80e79906 kprobes: don't call disarm_kprobe() for disabled kprobes
The assumption in __disable_kprobe() is wrong, and it could try to disarm
an already disarmed kprobe and fire the WARN_ONCE() below. [0]  We can
easily reproduce this issue.

1. Write 0 to /sys/kernel/debug/kprobes/enabled.

  # echo 0 > /sys/kernel/debug/kprobes/enabled

2. Run execsnoop.  At this time, one kprobe is disabled.

  # /usr/share/bcc/tools/execsnoop &
  [1] 2460
  PCOMM            PID    PPID   RET ARGS

  # cat /sys/kernel/debug/kprobes/list
  ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
  ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]

3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
   kprobes_all_disarmed to false but does not arm the disabled kprobe.

  # echo 1 > /sys/kernel/debug/kprobes/enabled

  # cat /sys/kernel/debug/kprobes/list
  ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
  ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]

4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
   disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().

  # fg
  /usr/share/bcc/tools/execsnoop
  ^C

Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
some cleanups and leaves the aggregated kprobe in the hash table.  Then,
__unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
infinite loop like this.

  aggregated kprobe.list -> kprobe.list -.
                                     ^    |
                                     '.__.'

In this situation, these commands fall into the infinite loop and result
in RCU stall or soft lockup.

  cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
                                       infinite loop with RCU.

  /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
                                   and __get_valid_kprobe() is stuck in
				   the loop.

To avoid the issue, make sure we don't call disarm_kprobe() for disabled
kprobes.

[0]
Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Modules linked in: ena
CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
FS:  00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
 __disable_kprobe (kernel/kprobes.c:1716)
 disable_kprobe (kernel/kprobes.c:2392)
 __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
 disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
 perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
 perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
 _free_event (kernel/events/core.c:4971)
 perf_event_release_kernel (kernel/events/core.c:5176)
 perf_release (kernel/events/core.c:5186)
 __fput (fs/file_table.c:321)
 task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
 exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
 syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
 do_syscall_64 (arch/x86/entry/common.c:87)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7fe7ff210654
Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30
R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
</TASK>

Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com
Fixes: 69d54b916d ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
Cc: Ayushman Dutta <ayudutta@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20 15:17:46 -07:00
Randy Dunlap
a8faed3a02 kernel/sys_ni: add compat entry for fadvise64_64
When CONFIG_ADVISE_SYSCALLS is not set/enabled and CONFIG_COMPAT is
set/enabled, the riscv compat_syscall_table references
'compat_sys_fadvise64_64', which is not defined:

riscv64-linux-ld: arch/riscv/kernel/compat_syscall_table.o:(.rodata+0x6f8):
undefined reference to `compat_sys_fadvise64_64'

Add 'fadvise64_64' to kernel/sys_ni.c as a conditional COMPAT function so
that when CONFIG_ADVISE_SYSCALLS is not set, there is a fallback function
available.

Link: https://lkml.kernel.org/r/20220807220934.5689-1-rdunlap@infradead.org
Fixes: d3ac21cacc ("mm: Support compiling out madvise and fadvise")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20 15:17:45 -07:00
Linus Torvalds
4c2d0b039c Merge tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter.

  Current release - regressions:

   - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF
     socket maps get data out of the TCP stack)

   - tls: rx: react to strparser initialization errors

   - netfilter: nf_tables: fix scheduling-while-atomic splat

   - net: fix suspicious RCU usage in bpf_sk_reuseport_detach()

  Current release - new code bugs:

   - mlxsw: ptp: fix a couple of races, static checker warnings and
     error handling

  Previous releases - regressions:

   - netfilter:
      - nf_tables: fix possible module reference underflow in error path
      - make conntrack helpers deal with BIG TCP (skbs > 64kB)
      - nfnetlink: re-enable conntrack expectation events

   - net: fix potential refcount leak in ndisc_router_discovery()

  Previous releases - always broken:

   - sched: cls_route: disallow handle of 0

   - neigh: fix possible local DoS due to net iface start/stop loop

   - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg

   - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu

   - virtio_net: fix endian-ness for RSS

   - dsa: mv88e6060: prevent crash on an unused port

   - fec: fix timer capture timing in `fec_ptp_enable_pps()`

   - ocelot: stats: fix races, integer wrapping and reading incorrect
     registers (the change of register definitions here accounts for
     bulk of the changed LoC in this PR)"

* tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
  net: moxa: MAC address reading, generating, validity checking
  tcp: handle pure FIN case correctly
  tcp: refactor tcp_read_skb() a bit
  tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()
  tcp: fix sock skb accounting in tcp_read_skb()
  igb: Add lock to avoid data race
  dt-bindings: Fix incorrect "the the" corrections
  net: genl: fix error path memory leak in policy dumping
  stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove()
  net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run
  net/mlx5e: Allocate flow steering storage during uplink initialization
  net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats
  net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset
  net: mscc: ocelot: make struct ocelot_stat_layout array indexable
  net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work
  net: mscc: ocelot: turn stats_lock into a spinlock
  net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter
  net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters
  net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters
  net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it
  ...
2022-08-18 19:37:15 -07:00
Pu Lehui
7d6620f107 bpf, cgroup: Fix kernel BUG in purge_effective_progs
Syzkaller reported a triggered kernel BUG as follows:

  ------------[ cut here ]------------
  kernel BUG at kernel/bpf/cgroup.c:925!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 1 PID: 194 Comm: detach Not tainted 5.19.0-14184-g69dac8e431af #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__cgroup_bpf_detach+0x1f2/0x2a0
  Code: 00 e8 92 60 30 00 84 c0 75 d8 4c 89 e0 31 f6 85 f6 74 19 42 f6 84
  28 48 05 00 00 02 75 0e 48 8b 80 c0 00 00 00 48 85 c0 75 e5 <0f> 0b 48
  8b 0c5
  RSP: 0018:ffffc9000055bdb0 EFLAGS: 00000246
  RAX: 0000000000000000 RBX: ffff888100ec0800 RCX: ffffc900000f1000
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888100ec4578
  RBP: 0000000000000000 R08: ffff888100ec0800 R09: 0000000000000040
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff888100ec4000
  R13: 000000000000000d R14: ffffc90000199000 R15: ffff888100effb00
  FS:  00007f68213d2b80(0000) GS:ffff88813bc80000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000055f74a0e5850 CR3: 0000000102836000 CR4: 00000000000006e0
  Call Trace:
   <TASK>
   cgroup_bpf_prog_detach+0xcc/0x100
   __sys_bpf+0x2273/0x2a00
   __x64_sys_bpf+0x17/0x20
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x7f68214dbcb9
  Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
  f0 ff8
  RSP: 002b:00007ffeb487db68 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
  RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f68214dbcb9
  RDX: 0000000000000090 RSI: 00007ffeb487db70 RDI: 0000000000000009
  RBP: 0000000000000003 R08: 0000000000000012 R09: 0000000b00000003
  R10: 00007ffeb487db70 R11: 0000000000000246 R12: 00007ffeb487dc20
  R13: 0000000000000004 R14: 0000000000000001 R15: 000055f74a1011b0
   </TASK>
  Modules linked in:
  ---[ end trace 0000000000000000 ]---

Repetition steps:

For the following cgroup tree,

  root
   |
  cg1
   |
  cg2

  1. attach prog2 to cg2, and then attach prog1 to cg1, both bpf progs
     attach type is NONE or OVERRIDE.
  2. write 1 to /proc/thread-self/fail-nth for failslab.
  3. detach prog1 for cg1, and then kernel BUG occur.

Failslab injection will cause kmalloc fail and fall back to
purge_effective_progs. The problem is that cg2 have attached another prog,
so when go through cg2 layer, iteration will add pos to 1, and subsequent
operations will be skipped by the following condition, and cg will meet
NULL in the end.

  `if (pos && !(cg->bpf.flags[atype] & BPF_F_ALLOW_MULTI))`

The NULL cg means no link or prog match, this is as expected, and it's not
a bug. So here just skip the no match situation.

Fixes: 4c46091ee9 ("bpf: Fix KASAN use-after-free Read in compute_effective_progs")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220813134030.1972696-1-pulehui@huawei.com
2022-08-18 23:27:33 +02:00
David Howells
fc4aaf9fb3 net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable
if the RCU read lock is held, and neither that function nor any of its
callers hold it.

Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
that checks to see if sk->sk_callback_lock() is held and use that here
instead.

Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_checked() might suffice.

Without this, the following warning can be occasionally observed:

=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
 #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
 #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4

stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 bpf_sk_reuseport_detach+0x6d/0xa4
 reuseport_detach_sock+0x75/0xdd
 inet_unhash+0xa5/0x1c0
 tcp_set_state+0x169/0x20f
 ? lockdep_sock_is_held+0x3a/0x3a
 ? __lock_release.isra.0+0x13e/0x220
 ? reacquire_held_locks+0x1bb/0x1bb
 ? hlock_class+0x31/0x96
 ? mark_lock+0x9e/0x1af
 __tcp_close+0x50/0x4b6
 tcp_close+0x28/0x70
 inet_release+0x8e/0xa7
 __sock_release+0x95/0x121
 sock_close+0x14/0x17
 __fput+0x20f/0x36a
 task_work_run+0xa3/0xcc
 exit_to_user_mode_prepare+0x9c/0x14d
 syscall_exit_to_user_mode+0x18/0x44
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cf8c1e9672 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hawkins Jiawei <yin31149@gmail.com>
Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 16:42:59 -07:00
YiFei Zhu
14b20b784f bpf: Restrict bpf_sys_bpf to CAP_PERFMON
The verifier cannot perform sufficient validation of any pointers passed
into bpf_attr and treats them as integers rather than pointers. The helper
will then read from arbitrary pointers passed into it. Restrict the helper
to CAP_PERFMON since the security model in BPF of arbitrary kernel read is
CAP_BPF + CAP_PERFMON.

Fixes: af2ac3e13e ("bpf: Prepare bpf syscall to be used from kernel and user space.")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220816205517.682470-1-zhuyifei@google.com
2022-08-18 00:27:49 +02:00
Tejun Heo
4f7e723643 cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's ->attach(), which may be called with
thredagroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: 05c7b7a92c ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
2022-08-17 07:36:05 -10:00
Tetsuo Handa
c0feea594e workqueue: don't skip lockdep work dependency in cancel_work_sync()
Like Hillf Danton mentioned

  syzbot should have been able to catch cancel_work_sync() in work context
  by checking lockdep_map in __flush_work() for both flush and cancel.

in [1], being unable to report an obvious deadlock scenario shown below is
broken. From locking dependency perspective, sync version of cancel request
should behave as if flush request, for it waits for completion of work if
that work has already started execution.

  ----------
  #include <linux/module.h>
  #include <linux/sched.h>
  static DEFINE_MUTEX(mutex);
  static void work_fn(struct work_struct *work)
  {
    schedule_timeout_uninterruptible(HZ / 5);
    mutex_lock(&mutex);
    mutex_unlock(&mutex);
  }
  static DECLARE_WORK(work, work_fn);
  static int __init test_init(void)
  {
    schedule_work(&work);
    schedule_timeout_uninterruptible(HZ / 10);
    mutex_lock(&mutex);
    cancel_work_sync(&work);
    mutex_unlock(&mutex);
    return -EINVAL;
  }
  module_init(test_init);
  MODULE_LICENSE("GPL");
  ----------

The check this patch restores was added by commit 0976dfc1d0
("workqueue: Catch more locking problems with flush_work()").

Then, lockdep's crossrelease feature was added by commit b09be676e0
("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
this check was once removed by commit fd1a5b04df ("workqueue: Remove
now redundant lock acquisitions wrt. workqueue flushes").

But lockdep's crossrelease feature was removed by commit e966eaeeb6
("locking/lockdep: Remove the cross-release locking checks"). At this
point, this check should have been restored.

Then, commit d6e89786be ("workqueue: skip lockdep wq dependency in
cancel_work_sync()") introduced a boolean flag in order to distinguish
flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
dependency when called from cancel_work_sync() was causing false positives.

Then, commit 87915adc3f ("workqueue: re-add lockdep dependencies for
flushing") tried to restore "struct work_struct" dependency check, but by
error checked this boolean flag. Like an example shown above indicates,
"struct work_struct" dependency needs to be checked for both flush_work()
and cancel_work_sync().

Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
Reported-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 87915adc3f ("workqueue: re-add lockdep dependencies for flushing")
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-16 06:27:35 -10:00
Hao Jia
76b079ef4c sched/psi: Remove unused parameter nbytes of psi_trigger_create()
psi_trigger_create()'s 'nbytes' parameter is not used, so we can remove it.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-15 12:35:25 -10:00
Hao Jia
2b97cf7628 sched/psi: Zero the memory of struct psi_group
After commit 5f69a6577b ("psi: dont alloc memory for psi by default"),
the memory used by struct psi_group is no longer allocated and zeroed
in cgroup_create().

Since the memory of struct psi_group is not zeroed, the data in this
memory is random, which will lead to inaccurate psi statistics when
creating a new cgroup.

So we use kzlloc() to allocate and zero the struct psi_group and
remove the redundant zeroing in group_init().

Steps to reproduce:
1. Use cgroup v2 and enable CONFIG_PSI
2. Create a new cgroup, and query psi statistics
mkdir /sys/fs/cgroup/test
cat /sys/fs/cgroup/test/cpu.pressure
some avg10=0.00 avg60=0.00 avg300=47927752200.00 total=12884901
full avg10=561815124.00 avg60=125835394188.00 avg300=1077090462000.00 total=10273561772

cat /sys/fs/cgroup/test/io.pressure
some avg10=1040093132823.95 avg60=1203770351379.21 avg300=3862252669559.46 total=4294967296
full avg10=921884564601.39 avg60=0.00 avg300=1984507298.35 total=442381631

cat /sys/fs/cgroup/test/memory.pressure
some avg10=232476085778.11 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=2585658472280.57 total=12884901

Fixes: commit 5f69a6577b ("psi: dont alloc memory for psi by default")
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-15 12:35:13 -10:00
David Gow
41a55567b9 module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m
The new KUnit module handling has KUnit test suites listed in a
.kunit_test_suites section of each module. This should be loaded when
the module is, but at the moment this only happens if KUnit is built-in.

Also load this when KUnit is enabled as a module: it'll not be usable
unless KUnit is loaded, but such modules are likely to depend on KUnit
anyway, so it's unlikely to ever be loaded needlessly.

Fixes: 3d6e446238 ("kunit: unify module and builtin suite definitions")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-08-15 13:51:07 -06:00
Linus Torvalds
5d6a0f4da9 Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:

 - fix the handling of the "persistent grants" feature negotiation
   between Xen blkfront and Xen blkback drivers

 - a cleanup of xen.config and adding xen.config to Xen section in
   MAINTAINERS

 - support HVMOP_set_evtchn_upcall_vector, which is more compliant to
   "normal" interrupt handling than the global callback used up to now

 - further small cleanups

* tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
  xen: remove XEN_SCRUB_PAGES in xen.config
  xen/pciback: Fix comment typo
  xen/xenbus: fix return type in xenbus_file_read()
  xen-blkfront: Apply 'feature_persistent' parameter when connect
  xen-blkback: Apply 'feature_persistent' parameter when connect
  xen-blkback: fix persistent grants negotiation
  x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
2022-08-14 09:28:54 -07:00
Linus Torvalds
f6eb0fed6a Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Misc timer fixes:

   - fix a potential use-after-free bug in posix timers

   - correct a prototype

   - address a build warning"

* tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Cleanup CPU timers before freeing them during exec
  time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
  posix-timers: Make do_clock_gettime() static
2022-08-13 14:38:22 -07:00
Linus Torvalds
1da8cf961b Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - Regression fix for this merge window, fixing a wrong order of
   arguments for io_req_set_res() for passthru (Dylan)

 - Fix for the audit code leaking context memory (Peilin)

 - Ensure that provided buffers are memcg accounted (Pavel)

 - Correctly handle short zero-copy sends (Pavel)

 - Sparse warning fixes for the recvmsg multishot command (Dylan)

 - Error handling fix for passthru (Anuj)

 - Remove randomization of struct kiocb fields, to avoid it growing in
   size if re-arranged in such a fashion that it grows more holes or
   padding (Keith, Linus)

 - Small series improving type safety of the sqe fields (Stefan)

* tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block:
  io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields
  io_uring: make io_kiocb_to_cmd() typesafe
  fs: don't randomize struct kiocb fields
  io_uring: consistently make use of io_notif_to_data()
  io_uring: fix error handling for io_uring_cmd
  io_uring: fix io_recvmsg_prep_multishot sparse warnings
  io_uring/net: send retry for zerocopy
  io_uring: mem-account pbuf buckets
  audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
  io_uring: pass correct parameters to io_req_set_res
2022-08-13 13:28:54 -07:00
Lukas Bulwahn
aa6d1e5b50 xen: remove XEN_SCRUB_PAGES in xen.config
Commit 197ecb3802 ("xen/balloon: add runtime control for scrubbing
ballooned out pages") changed config XEN_SCRUB_PAGES to config
XEN_SCRUB_PAGES_DEFAULT. As xen.config sets 'XEN_BALLOON=y' and
XEN_SCRUB_PAGES_DEFAULT defaults to yes, there is no further need to set
this config in the xen.config file.

Remove setting XEN_SCRUB_PAGES in xen.config, which is without
effect since the commit above anyway.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220810050712.9539-3-lukas.bulwahn@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-12 12:22:23 +02:00
Linus Torvalds
7ebfc85e2c Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf, can and netfilter.

  A little larger than usual but it's all fixes, no late features. It's
  large partially because of timing, and partially because of follow ups
  to stuff that got merged a week or so before the merge window and
  wasn't as widely tested. Maybe the Bluetooth fixes are a little
  alarming so we'll address that, but the rest seems okay and not scary.

  Notably we're including a fix for the netfilter Kconfig [1], your WiFi
  warning [2] and a bluetooth fix which should unblock syzbot [3].

  Current release - regressions:

   - Bluetooth:
      - don't try to cancel uninitialized works [3]
      - L2CAP: fix use-after-free caused by l2cap_chan_put

   - tls: rx: fix device offload after recent rework

   - devlink: fix UAF on failed reload and leftover locks in mlxsw

  Current release - new code bugs:

   - netfilter:
      - flowtable: fix incorrect Kconfig dependencies [1]
      - nf_tables: fix crash when nf_trace is enabled

   - bpf:
      - use proper target btf when exporting attach_btf_obj_id
      - arm64: fixes for bpf trampoline support

   - Bluetooth:
      - ISO: unlock on error path in iso_sock_setsockopt()
      - ISO: fix info leak in iso_sock_getsockopt()
      - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
      - ISO: fix memory corruption on iso_pinfo.base
      - ISO: fix not using the correct QoS
      - hci_conn: fix updating ISO QoS PHY

   - phy: dp83867: fix get nvmem cell fail

  Previous releases - regressions:

   - wifi: cfg80211: fix validating BSS pointers in
     __cfg80211_connect_result [2]

   - atm: bring back zatm uAPI after ATM had been removed

   - properly fix old bug making bonding ARP monitor mode not being able
     to work with software devices with lockless Tx

   - tap: fix null-deref on skb->dev in dev_parse_header_protocol

   - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
     devices and breaks others

   - netfilter:
      - nf_tables: many fixes rejecting cross-object linking which may
        lead to UAFs
      - nf_tables: fix null deref due to zeroed list head
      - nf_tables: validate variable length element extension

   - bgmac: fix a BUG triggered by wrong bytes_compl

   - bcmgenet: indicate MAC is in charge of PHY PM

  Previous releases - always broken:

   - bpf:
      - fix bad pointer deref in bpf_sys_bpf() injected via test infra
      - disallow non-builtin bpf programs calling the prog_run command
      - don't reinit map value in prealloc_lru_pop
      - fix UAFs during the read of map iterator fd
      - fix invalidity check for values in sk local storage map
      - reject sleepable program for non-resched map iterator

   - mptcp:
      - move subflow cleanup in mptcp_destroy_common()
      - do not queue data on closed subflows

   - virtio_net: fix memory leak inside XDP_TX with mergeable

   - vsock: fix memory leak when multiple threads try to connect()

   - rework sk_user_data sharing to prevent psock leaks

   - geneve: fix TOS inheriting for ipv4

   - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel

   - phy: c45 baset1: do not skip aneg configuration if clock role is
     not specified

   - rose: avoid overflow when /proc displays timer information

   - x25: fix call timeouts in blocking connects

   - can: mcp251x: fix race condition on receive interrupt

   - can: j1939:
      - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
      - fix memory leak of skbs in j1939_session_destroy()

  Misc:

   - docs: bpf: clarify that many things are not uAPI

   - seg6: initialize induction variable to first valid array index (to
     silence clang vs objtool warning)

   - can: ems_usb: fix clang 14's -Wunaligned-access warning"

* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
  net: atm: bring back zatm uAPI
  dpaa2-eth: trace the allocated address instead of page struct
  net: add missing kdoc for struct genl_multicast_group::flags
  nfp: fix use-after-free in area_cache_get()
  MAINTAINERS: use my korg address for mt7601u
  mlxsw: minimal: Fix deadlock in ports creation
  bonding: fix reference count leak in balance-alb mode
  net: usb: qmi_wwan: Add support for Cinterion MV32
  bpf: Shut up kern_sys_bpf warning.
  net/tls: Use RCU API to access tls_ctx->netdev
  tls: rx: device: don't try to copy too much on detach
  tls: rx: device: bound the frag walk
  net_sched: cls_route: remove from list when handle is 0
  selftests: forwarding: Fix failing tests with old libnet
  net: refactor bpf_sk_reuseport_detach()
  net: fix refcount bug in sk_psock_get (2)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  ...
2022-08-11 13:45:37 -07:00
Alexei Starovoitov
4e4588f1c4 bpf: Shut up kern_sys_bpf warning.
Shut up this warning:
kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes]
int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 23:58:13 -07:00
Jakub Kicinski
fbe8870f72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2022-08-10

We've added 23 non-merge commits during the last 7 day(s) which contain
a total of 19 files changed, 424 insertions(+), 35 deletions(-).

The main changes are:

1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao.

2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia.

3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu.

4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev.

5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney.

6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi.

7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa.

8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai.

9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address
   offset buffer, from Aijun Sun.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  bpf: Check the validity of max_rdwr_access for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
  bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for hash map iterator
  bpf: Acquire map uref in .init_seq_private for array map iterator
  bpf: Disallow bpf programs call prog_run command.
  bpf, arm64: Fix bpf trampoline instruction endianness
  selftests/bpf: Add test for prealloc_lru_pop bug
  bpf: Don't reinit map value in prealloc_lru_pop
  bpf: Allow calling bpf_prog_test kfuncs in tracing programs
  bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc
  selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf
  bpf: Use proper target btf when exporting attach_btf_obj_id
  mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
  bpf: Cleanup ftrace hash in bpf_trampoline_put
  BPF: Fix potential bad pointer dereference in bpf_sys_bpf()
  ...
====================

Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:48:15 -07:00
Hawkins Jiawei
cf8c1e9672 net: refactor bpf_sk_reuseport_detach()
Refactor sk_user_data dereference using more generic function
__rcu_dereference_sk_user_data_with_flags(), which improve its
maintainability

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:48:04 -07:00
Linus Torvalds
c235698355 Merge tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dan Williams:
 "Compute Express Link (CXL) updates for 6.0:

   - Introduce a 'struct cxl_region' object with support for
     provisioning and assembling persistent memory regions.

   - Introduce alloc_free_mem_region() to accompany the existing
     request_free_mem_region() as a method to allocate physical memory
     capacity out of an existing resource.

   - Export insert_resource_expand_to_fit() for the CXL subsystem to
     late-publish CXL platform windows in iomem_resource.

   - Add a polled mode PCI DOE (Data Object Exchange) driver service and
     use it in cxl_pci to retrieve the CDAT (Coherent Device Attribute
     Table)"

* tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (74 commits)
  cxl/hdm: Fix skip allocations vs multiple pmem allocations
  cxl/region: Disallow region granularity != window granularity
  cxl/region: Fix x1 interleave to greater than x1 interleave routing
  cxl/region: Move HPA setup to cxl_region_attach()
  cxl/region: Fix decoder interleave programming
  Documentation: cxl: remove dangling kernel-doc reference
  cxl/region: describe targets and nr_targets members of cxl_region_params
  cxl/regions: add padding for cxl_rr_ep_add nested lists
  cxl/region: Fix IS_ERR() vs NULL check
  cxl/region: Fix region reference target accounting
  cxl/region: Fix region commit uninitialized variable warning
  cxl/region: Fix port setup uninitialized variable warnings
  cxl/region: Stop initializing interleave granularity
  cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetime
  cxl/acpi: Minimize granularity for x1 interleaves
  cxl/region: Delete 'region' attribute from root decoders
  cxl/acpi: Autoload driver for 'cxl_acpi' test devices
  cxl/region: decrement ->nr_targets on error in cxl_region_attach()
  cxl/region: prevent underflow in ways_to_cxl()
  cxl/region: uninitialized variable in alloc_hpa()
  ...
2022-08-10 11:07:26 -07:00
Hou Tao
d247049f4f bpf: Only allow sleepable program for resched-able iterator
When a sleepable program is attached to a hash map iterator, might_fault()
will report "BUG: sleeping function called from invalid context..." if
CONFIG_DEBUG_ATOMIC_SLEEP is enabled. The reason is that rcu_read_lock()
is held in bpf_hash_map_seq_next() and won't be released until all elements
are traversed or bpf_hash_map_seq_stop() is called.

Fixing it by reusing BPF_ITER_RESCHED to indicate that only non-sleepable
program is allowed for iterator without BPF_ITER_RESCHED. We can revise
bpf_iter_link_attach() later if there are other conditions which may
cause rcu_read_lock() or spin_lock() issues.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:48 -07:00
Hou Tao
ef1e93d2ee bpf: Acquire map uref in .init_seq_private for hash map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().

So acquiring an extra map uref in bpf_iter_init_hash_map() and
releasing it in bpf_iter_fini_hash_map().

Fixes: d6c4503cc2 ("bpf: Implement bpf iterator for hash maps")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:47 -07:00
Hou Tao
f76fa6b338 bpf: Acquire map uref in .init_seq_private for array map iterator
bpf_iter_attach_map() acquires a map uref, and the uref may be released
before or in the middle of iterating map elements. For example, the uref
could be released in bpf_iter_detach_map() as part of
bpf_link_release(), or could be released in bpf_map_put_with_uref() as
part of bpf_map_release().

Alternative fix is acquiring an extra bpf_link reference just like
a pinned map iterator does, but it introduces unnecessary dependency
on bpf_link instead of bpf_map.

So choose another fix: acquiring an extra map uref in .init_seq_private
for array map iterator.

Fixes: d3cc2ab546 ("bpf: Implement bpf iterator for array maps")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 10:12:47 -07:00
Alexei Starovoitov
86f44fcec2 bpf: Disallow bpf programs call prog_run command.
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in
pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN
command from within the program.
To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf()
kernel function that can only be used by the kernel light skeleton directly.

Reported-by: YiFei Zhu <zhuyifei@google.com>
Fixes: b1d18a7574 ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10 09:43:07 -07:00
Kumar Kartikeya Dwivedi
275c30bcee bpf: Don't reinit map value in prealloc_lru_pop
The LRU map that is preallocated may have its elements reused while
another program holds a pointer to it from bpf_map_lookup_elem. Hence,
only check_and_free_fields is appropriate when the element is being
deleted, as it ensures proper synchronization against concurrent access
of the map value. After that, we cannot call check_and_init_map_value
again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
they can be concurrently accessed from a BPF program.

This is safe to do as when the map entry is deleted, concurrent access
is protected against by check_and_free_fields, i.e. an existing timer
would be freed, and any existing kptr will be released by it. The
program can create further timers and kptrs after check_and_free_fields,
but they will eventually be released once the preallocated items are
freed on map destruction, even if the item is never reused again. Hence,
the deleted item sitting in the free list can still have resources
attached to it, and they would never leak.

With spin_lock, we never touch the field at all on delete or update, as
we may end up modifying the state of the lock. Since the verifier
ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
call, the program will eventually release the lock so that on reuse the
new user of the value can take the lock.

Essentially, for the preallocated case, we must assume that the map
value may always be in use by the program, even when it is sitting in
the freelist, and handle things accordingly, i.e. use proper
synchronization inside check_and_free_fields, and never reinitialize the
special fields when it is reused on update.

Fixes: 68134668c1 ("bpf: Add map side support for bpf timers.")
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220809213033.24147-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09 18:46:11 -07:00
Youngmin Nam
46dae32fe6 time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
In ns_to_kernel_old_timeval() definition, the function argument is defined
with const identifier in kernel/time/time.c, but the prototype in
include/linux/time32.h looks different.

- The function is defined in kernel/time/time.c as below:
  struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec)

- The function is decalared in include/linux/time32.h as below:
  extern struct __kernel_old_timeval ns_to_kernel_old_timeval(s64 nsec);

Because the variable of arithmethic types isn't modified in the calling scope,
there's no need to mark arguments as const, which was already mentioned during 
review (Link[1) of the original patch.

Likewise remove the "const" keyword in both definition and declaration of
ns_to_timespec64() as requested by Arnd (Link[2]).

Fixes: a84d116916 ("y2038: Introduce struct __kernel_old_timeval")
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20220712094715.2918823-1-youngmin.nam@samsung.com
Link[1]: https://lore.kernel.org/all/20180310081123.thin6wphgk7tongy@gmail.com/
Link[2]: https://lore.kernel.org/all/CAK8P3a3nknJgEDESGdJH91jMj6R_xydFqWASd8r5BbesdvMBgA@mail.gmail.com/
2022-08-09 20:02:13 +02:00
Linus Torvalds
5d5d353bed Merge tag 'rproc-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
 "This introduces support for the remoteproc on Mediatek MT8188, and
  enables caches for MT8186 SCP. It adds support for PRU cores found on
  the TI K3 AM62x SoCs.

  It moves the recovery work after a firmware crash to an unbound
  workqueue, to allow recovery to happen in parallel.

  A new DMA API is introduced to release dma_mem for a device.

  It adds support a panic handler for the Qualcomm modem remoteproc,
  with the goal of having caches flushed in memory dumps for post-mortem
  debugging and it introduces a mechanism to wait for the modem firmware
  on SM8450 to decrypt part of its memory for post-mortem debugging.

  Qualcomm sysmon is restricted to only inform remote processors about
  peers that are actually running, to avoid a race where Linux tries to
  notify a recovering remote processor about its peers new state. A
  mechanism for waiting for the sysmon connection to be established is
  also introduced, to avoid out-of-sync updates for rapidly restarting
  remote processors.

  A number of Devicetree binding cleanups and conversions to YAML are
  introduced, to facilitate Devicetree validation. Lastly it introduces
  a number of smaller fixes and cleanups in the core and a few different
  drivers"

* tag 'rproc-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (42 commits)
  remoteproc: qcom_q6v5_pas: Do not fail if regulators are not found
  drivers/remoteproc: fix repeated words in comments
  remoteproc: Directly use ida_alloc()/free()
  remoteproc: Use unbounded workqueue for recovery work
  remoteproc: using pm_runtime_resume_and_get instead of pm_runtime_get_sync
  remoteproc: qcom_q6v5_pas: Deal silently with optional px and cx regulators
  remoteproc: sysmon: Send sysmon state only for running rprocs
  remoteproc: sysmon: Wait for SSCTL service to come up
  remoteproc: qcom: q6v5: Set q6 state to offline on receiving wdog irq
  remoteproc: qcom: pas: Check if coredump is enabled
  remoteproc: qcom: pas: Mark devices as wakeup capable
  remoteproc: qcom: pas: Mark va as io memory
  remoteproc: qcom: pas: Add decrypt shutdown support for modem
  remoteproc: qcom: q6v5-mss: add powerdomains to MSM8996 config
  remoteproc: qcom_q6v5: Introduce panic handler for MSS
  remoteproc: qcom_q6v5_mss: Update MBA log info
  remoteproc: qcom: correct kerneldoc
  remoteproc: qcom_q6v5_mss: map/unmap metadata region before/after use
  remoteproc: qcom: using pm_runtime_resume_and_get to simplify the code
  remoteproc: mediatek: Support MT8188 SCP
  ...
2022-08-08 15:16:29 -07:00
Linus Torvalds
d5af75f77c Merge tag 'sysctl-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
 "There isn't much for 6.0 for sysctl stuff, most of the stuff went
  through the networking subsystem (Kuniyuki Iwashima's trove of fixes
  using READ_ONCE/WRITE_ONCE helpers) as most of the issues there have
  been identified on networking side. So it is good we don't have much
  updates as we would have ended up with tons of conflicts. I rebased my
  delta just now to your tree so to avoid conflicts with that stuff.
  This merge request is just minor fluff cleanups then. Perhaps for 6.1
  kernel/sysctl.c will get more love than this release"

* tag 'sysctl-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  kernel/sysctl.c: Remove trailing white space
  kernel/sysctl.c: Clean up indentation, replace spaces with tab.
  sysctl: Merge adjacent CONFIG_TREE_RCU blocks
2022-08-08 14:17:46 -07:00
Linus Torvalds
e74acdf55d Merge tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
 "For the 6.0 merge window the modules code shifts to cleanup and minor
  fixes effort. This becomes much easier to do and review now due to the
  code split to its own directory from effort on the last kernel
  release. I expect to see more of this with time and as we expand on
  test coverage in the future. The cleanups and fixes come from usual
  suspects such as Christophe Leroy and Aaron Tomlin but there are also
  some other contributors.

  One particular minor fix worth mentioning is from Helge Deller, where
  he spotted a *forever* incorrect natural alignment on both ELF section
  header tables:

    * .altinstructions
    * __bug_table sections

  A lot of back and forth went on in trying to determine the ill effects
  of this misalignment being present for years and it has been
  determined there should be no real ill effects unless you have a buggy
  exception handler. Helge actually hit one of these buggy exception
  handlers on parisc which is how he ended up spotting this issue. When
  implemented correctly these paths with incorrect misalignment would
  just mean a performance penalty, but given that we are dealing with
  alternatives on modules and with the __bug_table (where info regardign
  BUG()/WARN() file/line information associated with it is stored) this
  really shouldn't be a big deal.

  The only other change with mentioning is the kmap() with
  kmap_local_page() and my only concern with that was on what is done
  after preemption, but the virtual addresses are restored after
  preemption. This is only used on module decompression.

  This all has sit on linux-next for a while except the kmap stuff which
  has been there for 3 weeks"

* tag 'modules-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Replace kmap() with kmap_local_page()
  module: Show the last unloaded module's taint flag(s)
  module: Use strscpy() for last_unloaded_module
  module: Modify module_flags() to accept show_state argument
  module: Move module's Kconfig items in kernel/module/
  MAINTAINERS: Update file list for module maintainers
  module: Use vzalloc() instead of vmalloc()/memset(0)
  modules: Ensure natural alignment for .altinstructions and __bug_table sections
  module: Increase readability of module_kallsyms_lookup_name()
  module: Fix ERRORs reported by checkpatch.pl
  module: Add support for default value for module async_probe
2022-08-08 14:12:19 -07:00
Fanjun Kong
374a723c74 kernel/sysctl.c: Remove trailing white space
This patch removes the trailing white space in kernel/sysysctl.c

Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
[mcgrof: fix commit message subject]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-08-08 09:01:36 -07:00
Fanjun Kong
5bfd5d3e2e kernel/sysctl.c: Clean up indentation, replace spaces with tab.
This patch fixes two coding style issues:
1. Clean up indentation, replace spaces with tab
2. Add space after ','

Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-08-08 09:01:36 -07:00
Geert Uytterhoeven
7251ceb51a sysctl: Merge adjacent CONFIG_TREE_RCU blocks
There are two adjacent sysctl entries protected by the same
CONFIG_TREE_RCU config symbol.  Merge them into a single block to
improve readability.

Use the more common "#ifdef" form while at it.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-08-08 09:01:36 -07:00
Stanislav Fomichev
6644aabbd8 bpf: Use proper target btf when exporting attach_btf_obj_id
When attaching to program, the program itself might not be attached
to anything (and, hence, might not have attach_btf), so we can't
unconditionally use 'prog->aux->dst_prog->aux->attach_btf'.

Instead, use bpf_prog_get_target_btf to pick proper target BTF:

  * when attached to dst_prog, use dst_prog->aux->btf
  * when attached to kernel btf, use prog->aux->attach_btf

Fixes: b79c9fc955 ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220804201140.1340684-1-sdf@google.com
2022-08-08 15:53:17 +02:00
Linus Torvalds
eb5699ba31 Merge tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc updates from Andrew Morton:
 "Updates to various subsystems which I help look after. lib, ocfs2,
  fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
  material this time"

* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
  scripts/gdb: ensure the absolute path is generated on initial source
  MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
  mailmap: add linux.dev alias for Brendan Higgins
  mailmap: update Kirill's email
  profile: setup_profiling_timer() is moslty not implemented
  ocfs2: fix a typo in a comment
  ocfs2: use the bitmap API to simplify code
  ocfs2: remove some useless functions
  lib/mpi: fix typo 'the the' in comment
  proc: add some (hopefully) insightful comments
  bdi: remove enum wb_congested_state
  kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
  lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
  squashfs: support reading fragments in readahead call
  squashfs: implement readahead
  squashfs: always build "file direct" version of page actor
  Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
  fs/ocfs2: Fix spelling typo in comment
  ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
  proc: fix test for "vsyscall=xonly" boot option
  ...
2022-08-07 10:03:24 -07:00
Linus Torvalds
cac03ac368 Merge tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Various fixes: a deadline scheduler fix, a migration fix, a Sparse fix
  and a comment fix"

* tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Do not requeue task on CPU excluded from cpus_mask
  sched/rt: Fix Sparse warnings due to undefined rt.c declarations
  exit: Fix typo in comment: s/sub-theads/sub-threads
  sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed
2022-08-06 17:34:06 -07:00
Linus Torvalds
592d8362bc Merge tag 'perf-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes to kprobes and the faddr2line script, plus a cleanup"

* tag 'perf-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix ';;' typo
  scripts/faddr2line: Add CONFIG_DEBUG_INFO check
  scripts/faddr2line: Fix vmlinux detection on arm64
  x86/kprobes: Update kcb status flag after singlestepping
  kprobes: Forbid probing on trampoline and BPF code areas
2022-08-06 17:28:12 -07:00
Linus Torvalds
cae4199f93 Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - Add support for syscall stack randomization

 - Add support for atomic operations to the 32 & 64-bit BPF JIT

 - Full support for KASAN on 64-bit Book3E

 - Add a watchdog driver for the new PowerVM hypervisor watchdog

 - Add a number of new selftests for the Power10 PMU support

 - Add a driver for the PowerVM Platform KeyStore

 - Increase the NMI watchdog timeout during live partition migration, to
   avoid timeouts due to increased memory access latency

 - Add support for using the 'linux,pci-domain' device tree property for
   PCI domain assignment

 - Many other small features and fixes

Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira
Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas,
Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A.
Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada,
Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch,
Naveen N.  Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár,
Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher
Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu
Jianfeng, and Zhouyi Zhou.

* tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits)
  powerpc/64e: Fix kexec build error
  EDAC/ppc_4xx: Include required of_irq header directly
  powerpc/pci: Fix PHB numbering when using opal-phbid
  powerpc/64: Init jump labels before parse_early_param()
  selftests/powerpc: Avoid GCC 12 uninitialised variable warning
  powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address
  powerpc/xive: Fix refcount leak in xive_get_max_prio
  powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader
  powerpc/perf: Include caps feature for power10 DD1 version
  powerpc: add support for syscall stack randomization
  powerpc: Move system_call_exception() to syscall.c
  powerpc/powernv: rename remaining rng powernv_ functions to pnv_
  powerpc/powernv/kvm: Use darn for H_RANDOM on Power9
  powerpc/powernv: Avoid crashing if rng is NULL
  selftests/powerpc: Fix matrix multiply assist test
  powerpc/signal: Update comment for clarity
  powerpc: make facility_unavailable_exception 64s
  powerpc/platforms/83xx/suspend: Remove write-only global variable
  powerpc/platforms/83xx/suspend: Prevent unloading the driver
  powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration
  ...
2022-08-06 16:38:17 -07:00
Linus Torvalds
c993e07be0 Merge tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - convert arm32 to the common dma-direct code (Arnd Bergmann, Robin
   Murphy, Christoph Hellwig)

 - restructure the PCIe peer to peer mapping support (Logan Gunthorpe)

 - allow the IOMMU code to communicate an optional DMA mapping length
   and use that in scsi and libata (John Garry)

 - split the global swiotlb lock (Tianyu Lan)

 - various fixes and cleanup (Chao Gao, Dan Carpenter, Dongli Zhang,
   Lukas Bulwahn, Robin Murphy)

* tag 'dma-mapping-5.20-2022-08-06' of git://git.infradead.org/users/hch/dma-mapping: (45 commits)
  swiotlb: fix passing local variable to debugfs_create_ulong()
  dma-mapping: reformat comment to suppress htmldoc warning
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
  RDMA/rw: drop pci_p2pdma_[un]map_sg()
  RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
  nvme-pci: convert to using dma_map_sgtable()
  nvme-pci: check DMA ops when indicating support for PCI P2PDMA
  iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
  iommu: Explicitly skip bus address marked segments in __iommu_map_sg()
  dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
  dma-direct: support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
  swiotlb: clean up some coding style and minor issues
  dma-mapping: update comment after dmabounce removal
  scsi: sd: Add a comment about limiting max_sectors to shost optimal limit
  ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors
  scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit
  ...
2022-08-06 10:56:45 -07:00
Jiri Slaby
221f9d9cdf posix-timers: Make do_clock_gettime() static
do_clock_gettime() is used only in posix-stubs.c, so make it static. It avoids
a compiler warning too:
time/posix-stubs.c:73:5: warning: no previous prototype for ‘do_clock_gettime’ [-Wmissing-prototypes]

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220719085620.30567-1-jslaby@suse.cz
2022-08-06 10:33:54 +02:00
Linus Torvalds
6614a3c316 Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
2022-08-05 16:32:45 -07:00
Jiri Olsa
62d468e5e1 bpf: Cleanup ftrace hash in bpf_trampoline_put
We need to release possible hash from trampoline fops object
before removing it, otherwise we leak it.

Fixes: 00963a2e75 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20220802135651.1794015-1-jolsa@kernel.org
2022-08-05 09:43:58 -07:00
Linus Torvalds
965a9d75e3 Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - Runtime verification infrastructure

   This is the biggest change here. It introduces the runtime
   verification that is necessary for running Linux on safety critical
   systems.

   It allows for deterministic automata models to be inserted into the
   kernel that will attach to tracepoints, where the information on
   these tracepoints will move the model from state to state.

   If a state is encountered that does not belong to the model, it will
   then activate a given reactor, that could just inform the user or
   even panic the kernel (for which safety critical systems will detect
   and can recover from).

 - Two monitor models are also added: Wakeup In Preemptive (WIP - not to
   be confused with "work in progress"), and Wakeup While Not Running
   (WWNR).

 - Added __vstring() helper to the TRACE_EVENT() macro to replace
   several vsnprintf() usages that were all doing it wrong.

 - eprobes now can have their event autogenerated when the event name is
   left off.

 - The rest is various cleanups and fixes.

* tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
  rv: Unlock on error path in rv_unregister_reactor()
  tracing: Use alignof__(struct {type b;}) instead of offsetof()
  tracing/eprobe: Show syntax error logs in error_log file
  scripts/tracing: Fix typo 'the the' in comment
  tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT
  tracing: Use free_trace_buffer() in allocate_trace_buffers()
  tracing: Use a struct alignof to determine trace event field alignment
  rv/reactor: Add the panic reactor
  rv/reactor: Add the printk reactor
  rv/monitor: Add the wwnr monitor
  rv/monitor: Add the wip monitor
  rv/monitor: Add the wip monitor skeleton created by dot2k
  Documentation/rv: Add deterministic automata instrumentation documentation
  Documentation/rv: Add deterministic automata monitor synthesis documentation
  tools/rv: Add dot2k
  Documentation/rv: Add deterministic automaton documentation
  tools/rv: Add dot2c
  Documentation/rv: Add a basic documentation
  rv/include: Add instrumentation helper functions
  rv/include: Add deterministic automata monitor definition via C macros
  ...
2022-08-05 09:41:12 -07:00
Dan Carpenter
f1a15b977f rv: Unlock on error path in rv_unregister_reactor()
Unlock the "rv_interface_lock" mutex before returning.

Link: https://lkml.kernel.org/r/YuvYzNfGMgV+PIhd@kili

Fixes: 04acadcb44 ("rv: Add runtime reactors interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-04 22:49:17 -04:00
Linus Torvalds
7447691ef9 Merge tag 'for-linus-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:

 - a series fine tuning virtio support for Xen guests, including removal
   the now again unused "platform_has()" feature.

 - a fix for host admin triggered reboot of Xen guests

 - a simple spelling fix

* tag 'for-linus-6.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: don't require virtio with grants for non-PV guests
  kernel: remove platform_has() infrastructure
  virtio: replace restricted mem access flag with callback
  xen: Fix spelling mistake
  xen/manage: Use orderly_reboot() to reboot
2022-08-04 15:10:55 -07:00
Linus Torvalds
228dfe98a3 Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
 "Here is the large set of char and misc and other driver subsystem
  changes for 6.0-rc1.

  Highlights include:

   - large set of IIO driver updates, additions, and cleanups

   - new habanalabs device support added (loads of register maps much
     like GPUs have)

   - soundwire driver updates

   - phy driver updates

   - slimbus driver updates

   - tiny virt driver fixes and updates

   - misc driver fixes and updates

   - interconnect driver updates

   - hwtracing driver updates

   - fpga driver updates

   - extcon driver updates

   - firmware driver updates

   - counter driver update

   - mhi driver fixes and updates

   - binder driver fixes and updates

   - speakup driver fixes

  All of these have been in linux-next for a while without any reported
  problems"

* tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits)
  drivers: lkdtm: fix clang -Wformat warning
  char: remove VR41XX related char driver
  misc: Mark MICROCODE_MINOR unused
  spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
  dt-bindings: iio: adc: Add compatible for MT8188
  iio: light: isl29028: Fix the warning in isl29028_remove()
  iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes
  iio: fix iio_format_avail_range() printing for none IIO_VAL_INT
  iio: adc: max1027: unlock on error path in max1027_read_single_value()
  iio: proximity: sx9324: add empty line in front of bullet list
  iio: magnetometer: hmc5843: Remove duplicate 'the'
  iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  ...
2022-08-04 11:05:48 -07:00