Commit Graph

3406 Commits

Author SHA1 Message Date
David Howells
7ba167c4c7 netfs: Switch to using unsigned long long rather than loff_t
Switch to using unsigned long long rather than loff_t in netfslib to avoid
problems with the sign flipping in the maths when we're dealing with the
byte at position 0x7fffffffffffffff.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
2024-05-01 18:07:35 +01:00
David Howells
b4ff7b178b netfs: Remove ->launder_folio() support
Remove support for ->launder_folio() from netfslib and expect filesystems
to use filemap_invalidate_inode() instead.  netfs_launder_folio() can then
be got rid of.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <sfrench@samba.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: devel@lists.orangefs.org
2024-05-01 18:07:34 +01:00
Mark Brown
9f6bdb0aa1 ASoC: doc: dapm: various improvements
Merge series from Luca Ceresoli <luca.ceresoli@bootlin.com>:

This series applies various improvements to the DAPM documentation: a
rewrite of a few sections for clarity, style improvements and typo fixes.

Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
---
Changes in v2:
- avoid wrapping in patch 3 as suggested by Alex
- Link to v1: https://lore.kernel.org/r/20240416-dapm-docs-v1-0-a818d2819bf6@bootlin.com

---
Luca Ceresoli (12):
      ASoC: doc: dapm: fix typos
      ASoC: doc: dapm: fix struct name
      ASoC: doc: dapm: minor rewording
      ASoC: doc: dapm: remove dash after colon
      ASoC: doc: dapm: clarify it's an internal API
      ASoC: doc: dapm: replace "map" with "graph"
      ASoC: doc: dapm: extend initial descrption
      ASoC: doc: dapm: describe how widgets and routes are registered
      ASoC: doc: dapm: fix and improve section "Registering DAPM controls"
      ASoC: doc: dapm: improve section "Codec/DSP Widget Interconnections"
      ASoC: doc: dapm: update section "DAPM Widget Events"
      ASoC: doc: dapm: update event types

 Documentation/sound/soc/dapm-graph.svg | 375 +++++++++++++++++++++++++++++++++
 Documentation/sound/soc/dapm.rst       | 174 ++++++++++-----
 2 files changed, 492 insertions(+), 57 deletions(-)
---
base-commit: c942a0cd36
change-id: 20240315-dapm-docs-79bd51f267db

Best regards,
--
Luca Ceresoli <luca.ceresoli@bootlin.com>
2024-05-01 00:00:17 +09:00
Jakub Kicinski
89de2db193 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support in x86 BPF JIT. This allows
   inlining per-CPU array and hashmap lookups
   and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.

2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing mark with bpf_fib_lookup helper,
   from Anton Protopopov.

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible,
   from Benjamin Tissoires.

6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs
   to check when NULL is passed for non-NULLable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
   crypto subsystem, from Vadim Fedorenko.

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer,
    from Andrea Righi.

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support bpf_tail_call_static() helper for BPF programs with GCC 13,
    from Jose E. Marchesi.

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled,
    from Kumar Kartikeya Dwivedi.

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs,
    from David Vernet.

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man pages fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regards to dumping subsequent char arrays,
    from Quentin Deslandes.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29 13:12:19 -07:00
David Howells
2ff1e97587 netfs: Replace PG_fscache by setting folio->private and marking dirty
When dirty data is being written to the cache, setting/waiting on/clearing
the fscache flag is always done in tandem with setting/waiting on/clearing
the writeback flag.  The netfslib buffered write routines wait on and set
both flags and the write request cleanup clears both flags, so the fscache
flag is almost superfluous.

The reason it isn't superfluous is because the fscache flag is also used to
indicate that data just read from the server is being written to the cache.
The flag is used to prevent a race involving overlapping direct-I/O writes
to the cache.

Change this to indicate that a page is in need of being copied to the cache
by placing a magic value in folio->private and marking the folios dirty.
Then when the writeback code sees a folio marked in this way, it only
writes it to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just
replaced and the folio will then be uplodaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and
afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking.
To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the
flags in the netfs_inode struct for those filesystems.  This reenables the
use of PG_fscache in that inode.  9p and afs use the netfslib write helpers
so get switched over; cifs, for the moment, does page-by-page manual access
to the cache, so doesn't use PG_fscache and is unaffected.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Bharath SM <bharathsm@microsoft.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-04-29 15:01:42 +01:00
Jithu Joseph
15b429f4e0 platform/x86/intel/ifs: trace: display batch num in hex
In Field Scan test image files are named in ff-mm-ss-<batch02x>.scan
format. Current trace output, prints the batch number in decimal format.

Make it easier to correlate the trace line to a test image file by
showing the batch number also in hex format.

Add 0x prefix to all fields in the trace line to make the type explicit.

Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20240412172349.544064-3-jithu.joseph@intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-29 10:52:02 +02:00
Linus Torvalds
e6ebf01172 Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
  post-6.8 issues or aren't considered suitable for backporting.

  All except one of these are for MM. I see no particular theme - it's
  singletons all over"

* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
  selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
  stackdepot: respect __GFP_NOLOCKDEP allocation flag
  hugetlb: check for anon_vma prior to folio allocation
  mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
  mm: turn folio_test_hugetlb into a PageType
  mm: support page_mapcount() on page_has_type() pages
  mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
  mm/hugetlb: fix missing hugetlb_lock for resv uncharge
  selftests: mm: fix unused and uninitialized variable warning
  selftests/harness: remove use of LINE_MAX
2024-04-26 13:48:03 -07:00
Jason Xing
b533fb9cf4 rstreason: make it work in trace world
At last, we should let it work by introducing this reset reason in
trace world.

One of the possible expected outputs is:
... tcp_send_reset: skbaddr=xxx skaddr=xxx src=xxx dest=xxx
state=TCP_ESTABLISHED reason=NOT_SPECIFIED

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26 15:34:01 +02:00
Matthew Wilcox (Oracle)
43849758fd khugepaged: use a folio throughout hpage_collapse_scan_file()
Replace the use of pages with folios.  Saves a few calls to
compound_head() and removes some uses of obsolete functions.

Link: https://lkml.kernel.org/r/20240403171838.1445826-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:34 -07:00
Matthew Wilcox (Oracle)
610ff817b9 khugepaged: remove hpage from collapse_file()
Use new_folio throughout where we had been using hpage.

Link: https://lkml.kernel.org/r/20240403171838.1445826-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:33 -07:00
Matthew Wilcox (Oracle)
c93012d849 dax: use huge_zero_folio
Convert from huge_zero_page to huge_zero_folio.

Link: https://lkml.kernel.org/r/20240326202833.523759-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:20 -07:00
Matthew Wilcox (Oracle)
46df8e73a4 mm: free up PG_slab
Reclaim the Slab page flag by using a spare bit in PageType.  We are
perennially short of page flags for various purposes, and now that the
original SLAB allocator has been retired, SLUB does not use the
mapcount/page_type field.  This lets us remove a number of special cases
for ignoring mapcount on Slab pages.

[willy@infradead.org: update vmcoreinfo]
  Link: https://lkml.kernel.org/r/ZgGV-O8WYQ_83kxp@casper.infradead.org
Link: https://lkml.kernel.org/r/20240321142448.1645400-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:00 -07:00
Matthew Wilcox (Oracle)
d99e3140a4 mm: turn folio_test_hugetlb into a PageType
The current folio_test_hugetlb() can be fooled by a concurrent folio split
into returning true for a folio which has never belonged to hugetlbfs. 
This can't happen if the caller holds a refcount on it, but we have a few
places (memory-failure, compaction, procfs) which do not and should not
take a speculative reference.

Since hugetlb pages do not use individual page mapcounts (they are always
fully mapped and use the entire_mapcount field to record the number of
mappings), the PageType field is available now that page_mapcount()
ignores the value in this field.

In compaction and with CONFIG_DEBUG_VM enabled, the current implementation
can result in an oops, as reported by Luis. This happens since 9c5ccf2db0
("mm: remove HUGETLB_PAGE_DTOR") effectively added some VM_BUG_ON() checks
in the PageHuge() testing path.

[willy@infradead.org: update vmcoreinfo]
  Link: https://lkml.kernel.org/r/ZgGZUvsdhaT1Va-T@casper.infradead.org
Link: https://lkml.kernel.org/r/20240321142448.1645400-6-willy@infradead.org
Fixes: 9c5ccf2db0 ("mm: remove HUGETLB_PAGE_DTOR")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218227
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-24 19:34:26 -07:00
Vincent Guittot
d4dbc99171 sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
Now that cpufreq provides a pressure value to the scheduler, rename
arch_update_thermal_pressure into HW pressure to reflect that it returns
a pressure applied by HW (i.e. with a high frequency change) and not
always related to thermal mitigation but also generated by max current
limitation as an example. Such high frequency signal needs filtering to be
smoothed and provide an value that reflects the average available capacity
into the scheduler time scale.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20240326091616.3696851-5-vincent.guittot@linaro.org
2024-04-24 12:08:01 +02:00
Chao Yu
92f750d847 f2fs: convert f2fs__page tracepoint class to use folio
Convert f2fs__page tracepoint class() and its instances to use folio
and related functionality, and rename it to f2fs__folio().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:57:10 +00:00
Jakub Kicinski
41e3ddb291 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/trace/events/rpcgss.h
  386f4a7379 ("trace: events: cleanup deprecated strncpy uses")
  a4833e3aba ("SUNRPC: Fix rpcgss_context trace event acceptor field")

Adjacent changes:

drivers/net/ethernet/intel/ice/ice_tc_lib.c
  2cca35f5dd ("ice: Fix checking for unsupported keys on non-tunnel device")
  784feaa65d ("ice: Add support for PFCP hardware offload in switchdev")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-18 13:12:24 -07:00
Jesper Dangaard Brouer
fc29e04ae1 cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
This commit enhances the ability to troubleshoot the global
cgroup_rstat_lock by introducing wrapper helper functions for the lock
along with associated tracepoints.

Although global, the cgroup_rstat_lock helper APIs and tracepoints take
arguments such as cgroup pointer and cpu_in_loop variable. This
adjustment is made because flushing occurs per cgroup despite the lock
being global. Hence, when troubleshooting, it's important to identify the
relevant cgroup. The cpu_in_loop variable is necessary because the global
lock may be released within the main flushing loop that traverses CPUs.
In the tracepoints, the cpu_in_loop value is set to -1 when acquiring the
main lock; otherwise, it denotes the CPU number processed last.

The new feature in this patchset is detecting when lock is contended. The
tracepoints are implemented with production in mind. For minimum overhead
attach to cgroup:cgroup_rstat_lock_contended, which only gets activated
when trylock detects lock is contended. A quick production check for
issues could be done via this perf commands:

 perf record -g -e cgroup:cgroup_rstat_lock_contended

Next natural question would be asking how long time do lock contenders
wait for obtaining the lock. This can be answered by measuring the time
between cgroup:cgroup_rstat_lock_contended and cgroup:cgroup_rstat_locked
when args->contended is set.  Like this bpftrace script:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
   tracepoint:cgroup:cgroup_rstat_locked {
     if (args->contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); }'

Extending with time spend holding the lock will be more expensive as this
also looks at all the non-contended cases.
Like this bpftrace script:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_lock_contended {@start[tid]=nsecs}
   tracepoint:cgroup:cgroup_rstat_locked { @locked[tid]=nsecs;
     if (args->contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}}
   tracepoint:cgroup:cgroup_rstat_unlock {
       @locked_ns=hist(nsecs-@locked[tid]); delete(@locked[tid]);}
   interval:s:1 {time("%H:%M:%S ");  print(@wait_ns);print(@locked_ns); }'

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-16 12:10:42 -10:00
Steven Rostedt
58300f8d6a ASoC: tracing: Export SND_SOC_DAPM_DIR_OUT to its value
The string SND_SOC_DAPM_DIR_OUT is printed in the snd_soc_dapm_path trace
event instead of its value:

   (((REC->path_dir) == SND_SOC_DAPM_DIR_OUT) ? "->" : "<-")

User space cannot parse this, as it has no idea what SND_SOC_DAPM_DIR_OUT
is. Use TRACE_DEFINE_ENUM() to convert it to its value:

   (((REC->path_dir) == 1) ? "->" : "<-")

So that user space tools, such as perf and trace-cmd, can parse it
correctly.

Reported-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Fixes: 6e588a0d83 ("ASoC: dapm: Consolidate path trace events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240416000303.04670cdf@rorschach.local.home
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-16 20:00:18 +09:00
Linus Torvalds
96fca68c4f Merge tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:

 - Fix a potential tracepoint crash

 - Fix NFSv4 GETATTR on big-endian platforms

* tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix endianness issue in nfsd4_encode_fattr4
  SUNRPC: Fix rpcgss_context trace event acceptor field
2024-04-15 14:09:47 -07:00
Uladzislau Rezki (Sony)
2053937a31 rcu: Add a trace event for synchronize_rcu_normal()
Add an rcu_sr_normal() trace event. It takes three arguments
first one is the name of RCU flavour, second one is a user id
which triggeres synchronize_rcu_normal() and last one is an
event.

There are two traces in the synchronize_rcu_normal(). On entry,
when a new request is registered and on exit point when request
is completed.

Please note, CONFIG_RCU_TRACE=y is required to activate traces.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-04-15 19:47:51 +02:00
Paolo Bonzini
f3b65bbaed KVM: delete .change_pte MMU notifier callback
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 13:18:27 -04:00
Marco Elver
c82389947d tracing: Add sched_prepare_exec tracepoint
Add "sched_prepare_exec" tracepoint, which is run right after the point
of no return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
tracepoint runs before flushing the old exec, i.e. while the task still
has the original state (such as original MM), but when the new exec
either succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

  * allowing tracing eBPF programs access to the original MM on exec,
    before current->mm is replaced;
  * counting exec in the original task (via perf event);
  * profiling flush time ("sched_prepare_exec" to "sched_process_exec").

Example of tracing output:

 $ cat /sys/kernel/debug/tracing/trace_pipe
    <...>-379  [003] .....  179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
    <...>-381  [002] .....  180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
    <...>-385  [001] .....  180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
    <...>-389  [006] .....  192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240411102158.1272267-1-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-11 09:02:21 -07:00
Steven Rostedt (Google)
a4833e3aba SUNRPC: Fix rpcgss_context trace event acceptor field
The rpcgss_context trace event acceptor field is a dynamically sized
string that records the "data" parameter. But this parameter is also
dependent on the "len" field to determine the size of the data.

It needs to use __string_len() helper macro where the length can be passed
in. It also incorrectly uses strncpy() to save it instead of
__assign_str(). As these macros can change, it is not wise to open code
them in trace events.

As of commit c759e60903 ("tracing: Remove __assign_str_len()"),
__assign_str() can be used for both __string() and __string_len() fields.
Before that commit, __assign_str_len() is required to be used. This needs
to be noted for backporting. (In actuality, commit c1fa617cae ("tracing:
Rework __assign_str() and __string() to not duplicate getting the string")
is the commit that makes __string_str_len() obsolete).

Cc: stable@vger.kernel.org
Fixes: 0c77668ddb ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 19:01:30 -04:00
Ingo Molnar
0e6ebfd163 Merge tag 'v6.9-rc3' into x86/cpu, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-09 09:28:41 +02:00
Justin Stitt
386f4a7379 trace: events: cleanup deprecated strncpy uses
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

For 2 out of 3 of these changes we can simply swap in strscpy() as it
guarantess NUL-termination which is needed for the following trace
print.

trace_rpcgss_context() should use memcpy as its format specifier %.*s
allows for the length to be specifier (__entry->len). Due to this,
acceptor does not technically need to be NUL-terminated. Moreover,
swapping in strscpy() and keeping everything else the same could result
in truncation of the source string by one byte. To remedy this, we could
use `len + 1` but I am unsure of the size of the destination buffer so a
simple memcpy should suffice.
|	TP_printk("win_size=%u expiry=%lu now=%lu timeout=%u acceptor=%.*s",
|		__entry->window_size, __entry->expiry, __entry->now,
|		__entry->timeout, __entry->len, __get_str(acceptor))

I suspect acceptor not to naturally be a NUL-terminated string due to
the presence of some stringify methods.
|	.crstringify_acceptor	= gss_stringify_acceptor,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20240401-strncpy-include-trace-events-mdio-h-v1-1-9cb5a4cda116@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05 22:10:25 -07:00
Jason Xing
19822a980e trace: tcp: fully support trace_tcp_send_reset
Prior to this patch, what we can see by enabling trace_tcp_send is
only happening under two circumstances:
1) active rst mode
2) non-active rst mode and based on the full socket

That means the inconsistency occurs if we use tcpdump and trace
simultaneously to see how rst happens.

It's necessary that we should take into other cases into considerations,
say:
1) time-wait socket
2) no socket
...

By parsing the incoming skb and reversing its 4-tuple can
we know the exact 'flow' which might not exist.

Samples after applied this patch:
1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port
state=TCP_ESTABLISHED
2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port
state=UNKNOWN
Note:
1) UNKNOWN means we cannot extract the right information from skb.
2) skbaddr/skaddr could be 0

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240401073605.37335-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:26:14 -07:00
Jason Xing
9807080e21 trace: adjust TP_STORE_ADDR_PORTS_SKB() parameters
Introducing entry_saddr and entry_daddr parameters in this macro
for later use can help us record the reverse 4-tuple by analyzing
the 4-tuple of the incoming skb when receiving.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240401073605.37335-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03 19:26:14 -07:00
Avadhut Naik
186d7ef52c tracing: Add the ::microcode field to the mce_record tracepoint
Currently, the microcode field (Microcode Revision) of 'struct mce' is not
exposed to userspace through the mce_record tracepoint.

Knowing the microcode version on which the MCE was received is critical
information for debugging. If the version is not recorded, later attempts
to acquire the version might result in discrepancies since it can be
changed at runtime.

Add microcode version to the tracepoint to prevent ambiguity over
the active version on the system when the MCE was received.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240401171455.1737976-3-avadhut.naik@amd.com
2024-04-03 09:39:29 +02:00
Avadhut Naik
98430645e3 tracing: Add the ::ppin field to the mce_record tracepoint
Machine Check Error information from 'struct mce' is exposed to userspace
through the mce_record tracepoint.

Currently, however, the PPIN (Protected Processor Inventory Number) field
of 'struct mce' is not exposed.

Add a PPIN field to the tracepoint as it provides a unique identifier for
the system (or socket in case of multi-socket systems) on which the MCE
has been received.

Also, add a comment explaining the kind of information that can be and
should be added to the tracepoint.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240401171455.1737976-2-avadhut.naik@amd.com
2024-04-03 09:39:29 +02:00
Alexander Aring
1131f33908 dlm: remove lkb from callback tracepoints
Stop using lkb structs in the callback tracepoints so that lkb
references are not needed. This prepares for separating lkb
structs from callbacks.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 13:31:12 -05:00
Ingo Molnar
ac5e80e94f x86/mce: Clean up TP_printk() output line of the 'mce_record' tracepoint
- Only capitalize entries where that makes sense
 - Print separate values separately
 - Rename 'PROCESSOR' to vendor & CPUID

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Avadhut Naik <avadhut.naik@amd.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lore.kernel.org/r/ZgZpn/zbCJWYdL5y@gmail.com
2024-03-30 11:35:28 +01:00
Balazs Scheidler
e9669a00bb net: udp: add IP/port data to the tracepoint udp/udp_fail_queue_rcv_skb
The udp_fail_queue_rcv_skb() tracepoint lacks any details on the source
and destination IP/port whereas this information can be critical in case
of UDP/syslog.

Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/0c8b3e33dbf679e190be6f4c6736603a76988a20.1711475011.git.balazs.scheidler@axoflow.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29 12:18:24 -07:00
Balazs Scheidler
a0ad11fc26 net: port TP_STORE_ADDR_PORTS_SKB macro to be tcp/udp independent
This patch moves TP_STORE_ADDR_PORTS_SKB() to a common header and removes
the TCP specific implementation details.

Previously the macro assumed the skb passed as an argument is a
TCP packet, the implementation now uses an argument to the L4 header and
uses that to extract the source/destination ports, which happen
to be named the same in "struct tcphdr" and "struct udphdr"

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
Link: https://lore.kernel.org/r/9e306f78260dfbbdc7353ba5f864cc027a409540.1711475011.git.balazs.scheidler@axoflow.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29 12:18:24 -07:00
Andrii Nakryiko
3124591f68 bpf: add bpf_modify_return_test_tp() kfunc triggering tracepoint
Add a simple bpf_modify_return_test_tp() kfunc, available to all program
types, that is useful for various testing and benchmarking scenarios, as
it allows to trigger most tracing BPF program types from BPF side,
allowing to do complex testing and benchmarking scenarios.

It is also attachable to for fmod_ret programs, making it a good and
simple way to trigger fmod_ret program under test/benchmark.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Cristian Marussi
da251ce210 include: trace: Widen the tag buffer in trace_scmi_dump_msg
A bigger buffer allow for more diverse tag names.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20240325204620.1437237-2-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2024-03-26 11:17:40 +00:00
Jason Xing
646700ce23 trace: use TP_STORE_ADDRS() macro in inet_sock_set_state()
As the title said, use the macro directly like the patch[1] did
to avoid those duplications. No functional change.

[1]
commit 6a6b0b9914 ("tcp: Avoid preprocessor directives in tracepoint macro args")

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 11:12:06 +01:00
Jason Xing
a24c855a5e trace: use TP_STORE_ADDRS() macro in inet_sk_error_report()
As the title said, use the macro directly like the patch[1] did
to avoid those duplications. No functional change.

[1]
commit 6a6b0b9914 ("tcp: Avoid preprocessor directives in tracepoint macro args")

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 11:12:05 +01:00
Jason Xing
b3af9045b4 trace: move to TP_STORE_ADDRS related macro to net_probe_common.h
Put the macro into another standalone file for better extension.
Some tracepoints can use this common part in the future.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26 11:12:05 +01:00
Kassey Li
d6a7bbdde6 workqueue: add function in event of workqueue_activate_work
The trace event "workqueue_activate_work" only print work struct.
However, function is the region of interest in a full sequence of work.
Current workqueue_activate_work trace event output:

    workqueue_activate_work: work struct ffffff88b4a0f450

With this change, workqueue_activate_work will print the function name,
align with workqueue_queue_work/execute_start/execute_end event.

    workqueue_activate_work: work struct ffffff80413a78b8 function=vmstat_update

Signed-off-by: Kassey Li <quic_yingangl@quicinc.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-03-25 09:28:52 -10:00
Linus Torvalds
ad584d73a2 Merge tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
 "Main user visible change:

   - User events can now have "multi formats"

     The current user events have a single format. If another event is
     created with a different format, it will fail to be created. That
     is, once an event name is used, it cannot be used again with a
     different format. This can cause issues if a library is using an
     event and updates its format. An application using the older format
     will prevent an application using the new library from registering
     its event.

     A task could also DOS another application if it knows the event
     names, and it creates events with different formats.

     The multi-format event is in a different name space from the single
     format. Both the event name and its format are the unique
     identifier. This will allow two different applications to use the
     same user event name but with different payloads.

   - Added support to have ftrace_dump_on_oops dump out instances and
     not just the main top level tracing buffer.

  Other changes:

   - Add eventfs_root_inode

     Only the root inode has a dentry that is static (never goes away)
     and stores it upon creation. There's no reason that the thousands
     of other eventfs inodes should have a pointer that never gets set
     in its descriptor. Create a eventfs_root_inode desciptor that has a
     eventfs_inode descriptor and a dentry pointer, and only the root
     inode will use this.

   - Added WARN_ON()s in eventfs

     There's some conditionals remaining in eventfs that should never be
     hit, but instead of removing them, add WARN_ON() around them to
     make sure that they are never hit.

   - Have saved_cmdlines allocation also include the map_cmdline_to_pid
     array

     The saved_cmdlines structure allocates a large amount of data to
     hold its mappings. Within it, it has three arrays. Two are already
     apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
     can be saved by also including the map_cmdline_to_pid[] array as
     well.

   - Restructure __string() and __assign_str() macros used in
     TRACE_EVENT()

     Dynamic strings in TRACE_EVENT() are declared with:

         __string(name, source)

     And assigned with:

        __assign_str(name, source)

     In the tracepoint callback of the event, the __string() is used to
     get the size needed to allocate on the ring buffer and
     __assign_str() is used to copy the string into the ring buffer.
     There's a helper structure that is created in the TRACE_EVENT()
     macro logic that will hold the string length and its position in
     the ring buffer which is created by __string().

     There are several trace events that have a function to create the
     string to save. This function is executed twice. Once for
     __string() and again for __assign_str(). There's no reason for
     this. The helper structure could also save the string it used in
     __string() and simply copy that into __assign_str() (it also
     already has its length).

     By using the structure to store the source string for the
     assignment, it means that the second argument to __assign_str() is
     no longer needed.

     It will be removed in the next merge window, but for now add a
     warning if the source string given to __string() is different than
     the source string given to __assign_str(), as the source to
     __assign_str() isn't even used and will be going away.

   - Added checks to make sure that the source of __string() is also the
     source of __assign_str() so that it can be safely removed in the
     next merge window.

     Included fixes that the above check found.

   - Other minor clean ups and fixes"

* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
  tracing: Add __string_src() helper to help compilers not to get confused
  tracing: Use strcmp() in __assign_str() WARN_ON() check
  tracepoints: Use WARN() and not WARN_ON() for warnings
  tracing: Use div64_u64() instead of do_div()
  tracing: Support to dump instance traces by ftrace_dump_on_oops
  tracing: Remove second parameter to __assign_rel_str()
  tracing: Add warning if string in __assign_str() does not match __string()
  tracing: Add __string_len() example
  tracing: Remove __assign_str_len()
  ftrace: Fix most kernel-doc warnings
  tracing: Decrement the snapshot if the snapshot trigger fails to register
  tracing: Fix snapshot counter going between two tracers that use it
  tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
  tracing: Use ? : shortcut in trace macros
  tracing: Do not calculate strlen() twice for __string() fields
  tracing: Rework __assign_str() and __string() to not duplicate getting the string
  cxl/trace: Properly initialize cxl_poison region name
  net: hns3: tracing: fix hclgevf trace event strings
  drm/i915: Add missing ; to __assign_str() macros in tracepoint code
  NFSD: Fix nfsd_clid_class use of __string_len() macro
  ...
2024-03-18 15:11:44 -07:00
Steven Rostedt (Google)
70a6ed553f tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
The TRACE_EVENT macros has some dependency if a __string() field is NULL,
where it will save "(null)" as the string. This string is also used by
__assign_str(). It's better to create a single macro instead of having
something that will not be caught by the compiler if there is an
unfortunate typo.

Link: https://lore.kernel.org/linux-trace-kernel/20240222211443.106216915@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-18 10:33:04 -04:00
Linus Torvalds
c1f10ac840 Merge tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Bugfixes:
   - Fix for an Oops in the NFSv4.2 listxattr handler
   - Correct an incorrect buffer size in listxattr
   - Fix for an Oops in the pNFS flexfiles layout
   - Fix a refcount leak in NFS O_DIRECT writes
   - Fix missing locking in NFS O_DIRECT
   - Avoid an infinite loop in pnfs_update_layout
   - Fix an overflow in the RPC waitqueue queue length counter
   - Ensure that pNFS I/O is also protected by TLS when xprtsec is
     specified by the mount options
   - Fix a leaked folio lock in the netfs read code
   - Fix a potential deadlock in fscache
   - Allow setting the fscache uniquifier in NFSv4
   - Fix an off by one in root_nfs_cat()
   - Fix another off by one in rpc_sockaddr2uaddr()
   - nfs4_do_open() can incorrectly trigger state recovery
   - Various fixes for connection shutdown

  Features and cleanups:
   - Ensure that containers only see their own RPC and NFS stats
   - Enable nconnect for RDMA
   - Remove dead code from nfs_writepage_locked()
   - Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO,
     and mount options"

* tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
  nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
  NFS: trace the uniquifier of fscache
  NFS: Read unlock folio on nfs_page_create_from_folio() error
  NFS: remove unused variable nfs_rpcstat
  nfs: fix UAF in direct writes
  nfs: properly protect nfs_direct_req fields
  NFS: enable nconnect for RDMA
  NFSv4: nfs4_do_open() is incorrectly triggering state recovery
  NFS: avoid infinite loop in pnfs_update_layout.
  NFS: remove sync_mode test from nfs_writepage_locked()
  NFSv4.1/pnfs: fix NFS with TLS in pnfs
  NFS: Fix an off by one in root_nfs_cat()
  nfs: make the rpc_stat per net namespace
  nfs: expose /proc/net/sunrpc/nfs in net namespaces
  sunrpc: add a struct rpc_stats arg to rpc_create_args
  nfs: remove unused NFS_CALL macro
  NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls
  NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
  SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
  nfs: fix regression in handling of fsc= option in NFSv4
  ...
2024-03-16 11:44:00 -07:00
Linus Torvalds
902861e34c Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Linus Torvalds
fe46a7dd18 Merge tag 'sound-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "This was a relatively calm development cycle. Most of changes are
  rather small device-specific fixes and enhancements. The only
  significant changes in ALSA core are code refactoring with the recent
  cleanup infrastructure, which should bring no functionality changes.
  Some highlights below:

  Core:
   - Lots of cleanups in ALSA core code with automatic kfree cleanup and
     locking guard macros
   - New ALSA core kunit test

  ASoC:
   - SoundWire support for AMD ACP 6.3 systems
   - Support for reporting version information for AVS firmware
   - Support DSPless mode for Intel Soundwire systems
   - Support for configuring CS35L56 amplifiers using EFI calibration
     data
   - Log which component is being operated on as part of power
     management trace events.
   - Support for Microchip SAM9x7, NXP i.MX95 and Qualcomm WCD939x

  HD- and USB-audio:
   - More Cirrus HD-audio codec support
   - TAS2781 HD-audio codec fixes
   - Scarlett2 mixer fixes

  Others:
   - Enhancement of virtio driver for audio control supports
   - Cleanups of legacy PM code with new macros
   - Firewire sound updates"

* tag 'sound-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (307 commits)
  ALSA: usb-audio: Stop parsing channels bits when all channels are found.
  ALSA: hda/tas2781: remove unnecessary runtime_pm calls
  ALSA: hda/realtek - ALC236 fix volume mute & mic mute LED on some HP models
  ALSA: aaci: Delete unused variable in aaci_do_suspend
  ALSA: scarlett2: Fix Scarlett 4th Gen input gain range again
  ALSA: scarlett2: Fix Scarlett 4th Gen input gain range
  ALSA: scarlett2: Fix Scarlett 4th Gen autogain status values
  ALSA: scarlett2: Fix Scarlett 4th Gen 4i4 low-voltage detection
  ALSA: hda/tas2781: restore power state after system_resume
  ALSA: hda/tas2781: do not call pm_runtime_force_* in system_resume/suspend
  ALSA: hda/tas2781: do not reset cur_* values in runtime_suspend
  ALSA: hda/tas2781: add lock to system_suspend
  ALSA: hda/tas2781: use dev_dbg in system_resume
  ALSA: hda/realtek: fix ALC285 issues on HP Envy x360 laptops
  platform/x86: serial-multi-instantiate: Add support for CS35L54 and CS35L57
  ALSA: hda: cs35l56: Add support for CS35L54 and CS35L57
  ASoC: cs35l56: Add support for CS35L54 and CS35L57
  ASoC: Intel: catpt: Carefully use PCI bitwise constants
  ALSA: hda: hda_component: Include sound/hda_codec.h
  ALSA: hda: hda_component: Add missing #include guards
  ...
2024-03-14 11:10:43 -07:00
Linus Torvalds
66fd6d0bd7 Merge tag 'platform-drivers-x86-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Ilpo Järvinen:

 - New acer-wmi HW support

 - Support for new revision of amd/pmf heartbeat notify

 - Correctly handle asus-wmi HW without LEDs

 - fujitsu-laptop battery charge control support

 - Support for new hp-wmi thermal profiles

 - Support ideapad-laptop refresh rate key

 - Put intel/pmc AI accelerator (GNA) into D3 if it has no driver to
   allow entry into low-power modes, and temporarily removed Lunar Lake
   SSRAM support due to breaking FW changes causing probe fail (further
   breaking FW changes are still pending)

 - Report pmc/punit_atom devices that prevent reacing low power levels

 - Surface Fan speed function support

 - Support for more sperial keys and complete the list of models with
   non-standard fan registers in thinkpad_acpi

 - New DMI touchscreen HW support

 - Continued modernization efforts of wmi

 - Removal of obsoleted ledtrig-audio call and the related dependency

 - Debug & metrics interface improvements

 - Miscellaneous cleanups / fixes / improvements

* tag 'platform-drivers-x86-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (87 commits)
  platform/x86/intel/pmc: Improve PKGC residency counters debug
  platform/x86: asus-wmi: Consider device is absent when the read is ~0
  Documentation/x86/amd/hsmp: Updating urls
  platform/mellanox: mlxreg-hotplug: Remove redundant NULL-check
  platform/x86/amd/pmf: Update sps power thermals according to the platform-profiles
  platform/x86/amd/pmf: Add support to get sps default APTS index values
  platform/x86/amd/pmf: Add support to get APTS index numbers for static slider
  platform/x86/amd/pmf: Add support to notify sbios heart beat event
  platform/x86/amd/pmf: Add support to get sbios requests in PMF driver
  platform/x86/amd/pmf: Disable debugfs support for querying power thermals
  platform/x86/amd/pmf: Differentiate PMF ACPI versions
  x86/platform/atom: Check state of Punit managed devices on s2idle
  platform/x86: pmc_atom: Check state of PMC clocks on s2idle
  platform/x86: pmc_atom: Check state of PMC managed devices on s2idle
  platform/x86: pmc_atom: Annotate d3_sts register bit defines
  clk: x86: Move clk-pmc-atom register defines to include/linux/platform_data/x86/pmc_atom.h
  platform/x86: make fw_attr_class constant
  platform/x86/intel/tpmi: Change vsec offset to u64
  platform/x86: intel_scu_pcidrv: Remove unused intel-mid.h
  platform/x86: intel_scu_wdt: Remove unused intel-mid.h
  ...
2024-03-14 10:44:09 -07:00
Linus Torvalds
07abb19a9b Merge tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "From the functional perspective, the most significant change here is
  the addition of support for Energy Models that can be updated
  dynamically at run time.

  There is also the addition of LZ4 compression support for hibernation,
  the new preferred core support in amd-pstate, new platforms support in
  the Intel RAPL driver, new model-specific EPP handling in intel_pstate
  and more.

  Apart from that, the cpufreq default transition delay is reduced from
  10 ms to 2 ms (along with some related adjustments), the system
  suspend statistics code undergoes a significant rework and there is a
  usual bunch of fixes and code cleanups all over.

  Specifics:

   - Allow the Energy Model to be updated dynamically (Lukasz Luba)

   - Add support for LZ4 compression algorithm to the hibernation image
     creation and loading code (Nikhil V)

   - Fix and clean up system suspend statistics collection (Rafael
     Wysocki)

   - Simplify device suspend and resume handling in the power management
     core code (Rafael Wysocki)

   - Fix PCI hibernation support description (Yiwei Lin)

   - Make hibernation take set_memory_ro() return values into account as
     appropriate (Christophe Leroy)

   - Set mem_sleep_current during kernel command line setup to avoid an
     ordering issue with handling it (Maulik Shah)

   - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
     driver's system suspend callback (Qingliang Li)

   - Simplify pm_runtime_get_if_active() usage and add a replacement for
     pm_runtime_put_autosuspend() (Sakari Ailus)

   - Add a tracepoint for runtime_status changes tracking (Vilas Bhat)

   - Fix section title markdown in the runtime PM documentation (Yiwei
     Lin)

   - Enable preferred core support in the amd-pstate cpufreq driver
     (Meng Li)

   - Fix min_perf assignment in amd_pstate_adjust_perf() and make the
     min/max limit perf values in amd-pstate always stay within the
     (highest perf, lowest perf) range (Tor Vic, Meng Li)

   - Allow intel_pstate to assign model-specific values to strings used
     in the EPP sysfs interface and make it do so on Meteor Lake
     (Srinivas Pandruvada)

   - Drop long-unused cpudata::prev_cummulative_iowait from the
     intel_pstate cpufreq driver (Jiri Slaby)

   - Prevent scaling_cur_freq from exceeding scaling_max_freq when the
     latter is an inefficient frequency (Shivnandan Kumar)

   - Change default transition delay in cpufreq to 2ms (Qais Yousef)

   - Remove references to 10ms minimum sampling rate from comments in
     the cpufreq code (Pierre Gondois)

   - Honour transition_latency over transition_delay_us in cpufreq (Qais
     Yousef)

   - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)

   - General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
     Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
     Belova)

   - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)

   - Make the SCMI cpufreq driver get a transition delay value from
     firmware (Pierre Gondois)

   - Prevent the haltpoll cpuidle governor from shrinking guest
     poll_limit_ns below grow_start (Parshuram Sangle)

   - Avoid potential overflow in integer multiplication when computing
     cpuidle state parameters (C Cheng)

   - Adjust MWAIT hint target C-state computation in the ACPI cpuidle
     driver and in intel_idle to return a correct value for C0 (He
     Rongguang)

   - Address multiple issues in the TPMI RAPL driver and add support for
     new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)

   - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
     Lezcano)

   - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)

   - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
     Norway Ananda)

   - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)

   - Fix a couple of warnings in the OPP core code related to W=1 builds
     (Viresh Kumar)

   - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
     Kumar)

   - Extend dev_pm_opp_data with turbo support (Sibi Sankar)

   - dt-bindings: drop maxItems from inner items (David Heidelberg)"

* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
  dt-bindings: opp: drop maxItems from inner items
  OPP: debugfs: Fix warning around icc_get_name()
  OPP: debugfs: Fix warning with W=1 builds
  cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
  OPP: Extend dev_pm_opp_data with turbo support
  Fix cpupower-frequency-info.1 man page typo
  cpufreq: scmi: Set transition_delay_us
  firmware: arm_scmi: Populate fast channel rate_limit
  firmware: arm_scmi: Populate perf commands rate_limit
  cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
  PM: sleep: wakeirq: fix wake irq warning in system suspend
  powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
  cpufreq: Don't unregister cpufreq cooling on CPU hotplug
  PM: suspend: Set mem_sleep_current during kernel command line setup
  cpufreq: Honour transition_latency over transition_delay_us
  cpufreq: Limit resolving a frequency to policy min/max
  Documentation: PM: Fix runtime_pm.rst markdown syntax
  cpufreq: amd-pstate: adjust min/max limit perf
  cpufreq: Remove references to 10ms min sampling rate
  cpufreq: intel_pstate: Update default EPPs for Meteor Lake
  ...
2024-03-13 11:40:06 -07:00
Linus Torvalds
9187210eee Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Large effort by Eric to lower rtnl_lock pressure and remove locks:

      - Make commonly used parts of rtnetlink (address, route dumps
        etc) lockless, protected by RCU instead of rtnl_lock.

      - Add a netns exit callback which already holds rtnl_lock,
        allowing netns exit to take rtnl_lock once in the core instead
        of once for each driver / callback.

      - Remove locks / serialization in the socket diag interface.

      - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.

      - Remove the dev_base_lock, depend on RCU where necessary.

   - Support busy polling on a per-epoll context basis. Poll length and
     budget parameters can be set independently of system defaults.

   - Introduce struct net_hotdata, to make sure read-mostly global
     config variables fit in as few cache lines as possible.

   - Add optional per-nexthop statistics to ease monitoring / debug of
     ECMP imbalance problems.

   - Support TCP_NOTSENT_LOWAT in MPTCP.

   - Ensure that IPv6 temporary addresses' preferred lifetimes are long
     enough, compared to other configured lifetimes, and at least 2 sec.

   - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.

   - Add support for the independent control state machine for bonding
     per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
     control state machine.

   - Add "network ID" to MCTP socket APIs to support hosts with multiple
     disjoint MCTP networks.

   - Re-use the mono_delivery_time skbuff bit for packets which user
     space wants to be sent at a specified time. Maintain the timing
     information while traversing veth links, bridge etc.

   - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.

   - Simplify many places iterating over netdevs by using an xarray
     instead of a hash table walk (hash table remains in place, for use
     on fastpaths).

   - Speed up scanning for expired routes by keeping a dedicated list.

   - Speed up "generic" XDP by trying harder to avoid large allocations.

   - Support attaching arbitrary metadata to netconsole messages.

  Things we sprinkled into general kernel code:

   - Enforce VM_IOREMAP flag and range in ioremap_page_range and
     introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
     bpf_arena).

   - Rework selftest harness to enable the use of the full range of ksft
     exit code (pass, fail, skip, xfail, xpass).

  Netfilter:

   - Allow userspace to define a table that is exclusively owned by a
     daemon (via netlink socket aliveness) without auto-removing this
     table when the userspace program exits. Such table gets marked as
     orphaned and a restarting management daemon can re-attach/regain
     ownership.

   - Speed up element insertions to nftables' concatenated-ranges set
     type. Compact a few related data structures.

  BPF:

   - Add BPF token support for delegating a subset of BPF subsystem
     functionality from privileged system-wide daemons such as systemd
     through special mount options for userns-bound BPF fs to a trusted
     & unprivileged application.

   - Introduce bpf_arena which is sparse shared memory region between
     BPF program and user space where structures inside the arena can
     have pointers to other areas of the arena, and pointers work
     seamlessly for both user-space programs and BPF programs.

   - Introduce may_goto instruction that is a contract between the
     verifier and the program. The verifier allows the program to loop
     assuming it's behaving well, but reserves the right to terminate
     it.

   - Extend the BPF verifier to enable static subprog calls in spin lock
     critical sections.

   - Support registration of struct_ops types from modules which helps
     projects like fuse-bpf that seeks to implement a new struct_ops
     type.

   - Add support for retrieval of cookies for perf/kprobe multi links.

   - Support arbitrary TCP SYN cookie generation / validation in the TC
     layer with BPF to allow creating SYN flood handling in BPF
     firewalls.

   - Add code generation to inline the bpf_kptr_xchg() helper which
     improves performance when stashing/popping the allocated BPF
     objects.

  Wireless:

   - Add SPP (signaling and payload protected) AMSDU support.

   - Support wider bandwidth OFDMA, as required for EHT operation.

  Driver API:

   - Major overhaul of the Energy Efficient Ethernet internals to
     support new link modes (2.5GE, 5GE), share more code between
     drivers (especially those using phylib), and encourage more
     uniform behavior. Convert and clean up drivers.

   - Define an API for querying per netdev queue statistics from
     drivers.

   - IPSec: account in global stats for fully offloaded sessions.

   - Create a concept of Ethernet PHY Packages at the Device Tree level,
     to allow parameterizing the existing PHY package code.

   - Enable Rx hashing (RSS) on GTP protocol fields.

  Misc:

   - Improvements and refactoring all over networking selftests.

   - Create uniform module aliases for TC classifiers, actions, and
     packet schedulers to simplify creating modprobe policies.

   - Address all missing MODULE_DESCRIPTION() warnings in networking.

   - Extend the Netlink descriptions in YAML to cover message
     encapsulation or "Netlink polymorphism", where interpretation of
     nested attributes depends on link type, classifier type or some
     other "class type".

  Drivers:

   - Ethernet high-speed NICs:
      - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
      - Intel (100G, ice, idpf):
         - support E825-C devices
      - nVidia/Mellanox:
         - support devices with one port and multiple PCIe links
      - Broadcom (bnxt):
         - support n-tuple filters
         - support configuring the RSS key
      - Wangxun (ngbe/txgbe):
         - implement irq_domain for TXGBE's sub-interrupts
      - Pensando/AMD:
         - support XDP
         - optimize queue submission and wakeup handling (+17% bps)
         - optimize struct layout, saving 28% of memory on queues

   - Ethernet NICs embedded and virtual:
      - Google cloud vNIC:
         - refactor driver to perform memory allocations for new queue
           config before stopping and freeing the old queue memory
      - Synopsys (stmmac):
         - obey queueMaxSDU and implement counters required by 802.1Qbv
      - Renesas (ravb):
         - support packet checksum offload
         - suspend to RAM and runtime PM support

   - Ethernet switches:
      - nVidia/Mellanox:
         - support for nexthop group statistics
      - Microchip:
         - ksz8: implement PHY loopback
         - add support for KSZ8567, a 7-port 10/100Mbps switch

   - PTP:
      - New driver for RENESAS FemtoClock3 Wireless clock generator.
      - Support OCP PTP cards designed and built by Adva.

   - CAN:
      - Support recvmsg() flags for own, local and remote traffic on CAN
        BCM sockets.
      - Support for esd GmbH PCIe/402 CAN device family.
      - m_can:
         - Rx/Tx submission coalescing
         - wake on frame Rx

   - WiFi:
      - Intel (iwlwifi):
         - enable signaling and payload protected A-MSDUs
         - support wider-bandwidth OFDMA
         - support for new devices
         - bump FW API to 89 for AX devices; 90 for BZ/SC devices
      - MediaTek (mt76):
         - mt7915: newer ADIE version support
         - mt7925: radio temperature sensor support
      - Qualcomm (ath11k):
         - support 6 GHz station power modes: Low Power Indoor (LPI),
           Standard Power) SP and Very Low Power (VLP)
         - QCA6390 & WCN6855: support 2 concurrent station interfaces
         - QCA2066 support
      - Qualcomm (ath12k):
         - refactoring in preparation for Multi-Link Operation (MLO)
           support
         - 1024 Block Ack window size support
         - firmware-2.bin support
         - support having multiple identical PCI devices (firmware needs
           to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
         - QCN9274: support split-PHY devices
         - WCN7850: enable Power Save Mode in station mode
         - WCN7850: P2P support
      - RealTek:
         - rtw88: support for more rtw8811cu and rtw8821cu devices
         - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
         - rtlwifi: speed up USB firmware initialization
         - rtwl8xxxu:
             - RTL8188F: concurrent interface support
             - Channel Switch Announcement (CSA) support in AP mode
      - Broadcom (brcmfmac):
         - per-vendor feature support
         - per-vendor SAE password setup
         - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...
2024-03-12 17:44:08 -07:00
Linus Torvalds
a01c9fe323 Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "The bulk of the patches for this release are optimizations, code
  clean-ups, and minor bug fixes.

  One new feature to mention is that NFSD administrators now have the
  ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has
  had this capability for some time.

  As always I am grateful to NFSD contributors, reviewers, and testers"

* tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits)
  NFSD: Clean up nfsd4_encode_replay()
  NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit
  NFSD: Document nfsd_setattr() fill-attributes behavior
  nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()
  nfsd: Fix a regression in nfsd_setattr()
  NFSD: OP_CB_RECALL_ANY should recall both read and write delegations
  NFSD: handle GETATTR conflict with write delegation
  NFSD: add support for CB_GETATTR callback
  NFSD: Document the phases of CREATE_SESSION
  NFSD: Fix the NFSv4.1 CREATE_SESSION operation
  nfsd: clean up comments over nfs4_client definition
  svcrdma: Add Write chunk WRs to the RPC's Send WR chain
  svcrdma: Post WRs for Write chunks in svc_rdma_sendto()
  svcrdma: Post the Reply chunk and Send WR together
  svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
  svcrdma: Post Send WR chain
  svcrdma: Fix retry loop in svc_rdma_send()
  svcrdma: Prevent a UAF in svc_rdma_send()
  svcrdma: Fix SQ wake-ups
  svcrdma: Increase the per-transport rw_ctx count
  ...
2024-03-12 14:27:37 -07:00
Linus Torvalds
d08c407f71 Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A large set of updates and features for timers and timekeeping:

   - The hierarchical timer pull model

     When timer wheel timers are armed they are placed into the timer
     wheel of a CPU which is likely to be busy at the time of expiry.
     This is done to avoid wakeups on potentially idle CPUs.

     This is wrong in several aspects:

       1) The heuristics to select the target CPU are wrong by
          definition as the chance to get the prediction right is
          close to zero.

       2) Due to #1 it is possible that timers are accumulated on
          a single target CPU

       3) The required computation in the enqueue path is just overhead
          for dubious value especially under the consideration that the
          vast majority of timer wheel timers are either canceled or
          rearmed before they expire.

     The timer pull model avoids the above by removing the target
     computation on enqueue and queueing timers always on the CPU on
     which they get armed.

     This is achieved by having separate wheels for CPU pinned timers
     and global timers which do not care about where they expire.

     As long as a CPU is busy it handles both the pinned and the global
     timers which are queued on the CPU local timer wheels.

     When a CPU goes idle it evaluates its own timer wheels:

       - If the first expiring timer is a pinned timer, then the global
         timers can be ignored as the CPU will wake up before they
         expire.

       - If the first expiring timer is a global timer, then the expiry
         time is propagated into the timer pull hierarchy and the CPU
         makes sure to wake up for the first pinned timer.

     The timer pull hierarchy organizes CPUs in groups of eight at the
     lowest level and at the next levels groups of eight groups up to
     the point where no further aggregation of groups is required, i.e.
     the number of levels is log8(NR_CPUS). The magic number of eight
     has been established by experimention, but can be adjusted if
     needed.

     In each group one busy CPU acts as the migrator. It's only one CPU
     to avoid lock contention on remote timer wheels.

     The migrator CPU checks in its own timer wheel handling whether
     there are other CPUs in the group which have gone idle and have
     global timers to expire. If there are global timers to expire, the
     migrator locks the remote CPU timer wheel and handles the expiry.

     Depending on the group level in the hierarchy this handling can
     require to walk the hierarchy downwards to the CPU level.

     Special care is taken when the last CPU goes idle. At this point
     the CPU is the systemwide migrator at the top of the hierarchy and
     it therefore cannot delegate to the hierarchy. It needs to arm its
     own timer device to expire either at the first expiring timer in
     the hierarchy or at the first CPU local timer, which ever expires
     first.

     This completely removes the overhead from the enqueue path, which
     is e.g. for networking a true hotpath and trades it for a slightly
     more complex idle path.

     This has been in development for a couple of years and the final
     series has been extensively tested by various teams from silicon
     vendors and ran through extensive CI.

     There have been slight performance improvements observed on network
     centric workloads and an Intel team confirmed that this allows them
     to power down a die completely on a mult-die socket for the first
     time in a mostly idle scenario.

     There is only one outstanding ~1.5% regression on a specific
     overloaded netperf test which is currently investigated, but the
     rest is either positive or neutral performance wise and positive on
     the power management side.

   - Fixes for the timekeeping interpolation code for cross-timestamps:

     cross-timestamps are used for PTP to get snapshots from hardware
     timers and interpolated them back to clock MONOTONIC. The changes
     address a few corner cases in the interpolation code which got the
     math and logic wrong.

   - Simplifcation of the clocksource watchdog retry logic to
     automatically adjust to handle larger systems correctly instead of
     having more incomprehensible command line parameters.

   - Treewide consolidation of the VDSO data structures.

   - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 14:38:26 -07:00
Linus Torvalds
d2c84bdce2 Merge tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Make running of task_work internal loops more fair, and unify how the
   different methods deal with them (me)

 - Support for per-ring NAPI. The two minor networking patches are in a
   shared branch with netdev (Stefan)

 - Add support for truncate (Tony)

 - Export SQPOLL utilization stats (Xiaobing)

 - Multishot fixes (Pavel)

 - Fix for a race in manipulating the request flags via poll (Pavel)

 - Cleanup the multishot checking by making it generic, moving it out of
   opcode handlers (Pavel)

 - Various tweaks and cleanups (me, Kunwu, Alexander)

* tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linux: (53 commits)
  io_uring: Fix sqpoll utilization check racing with dying sqpoll
  io_uring/net: dedup io_recv_finish req completion
  io_uring: refactor DEFER_TASKRUN multishot checks
  io_uring: fix mshot io-wq checks
  io_uring/net: add io_req_msg_cleanup() helper
  io_uring/net: simplify msghd->msg_inq checking
  io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
  io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io
  io_uring/net: correctly handle multishot recvmsg retry setup
  io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler
  io_uring: fix io_queue_proc modifying req->flags
  io_uring: fix mshot read defer taskrun cqe posting
  io_uring/net: fix overflow check in io_recvmsg_mshot_prep()
  io_uring/net: correct the type of variable
  io_uring/sqpoll: statistics of the true utilization of sq threads
  io_uring/net: move recv/recvmsg flags out of retry loop
  io_uring/kbuf: flag request if buffer pool is empty after buffer pick
  io_uring/net: improve the usercopy for sendmsg/recvmsg
  io_uring/net: move receive multishot out of the generic msghdr path
  io_uring/net: unify how recvmsg and sendmsg copy in the msghdr
  ...
2024-03-11 11:35:31 -07:00