Commit Graph

17188 Commits

Author SHA1 Message Date
Zelin Deng
81bfd6268f x86/kvmclock: Move this_cpu_pvti into kvmclock.h
commit ad9af93068 upstream.

There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
so move it into kvmclock.h and export the symbol to make it visiable to
other modules.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07 07:53:06 +02:00
Juergen Gross
fe5eaf1cdf x86/setup: Call early_reserve_memory() earlier
commit 8aa83e6395 upstream.

Commit in Fixes introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately, the call
of early_reserve_memory() is done too late for Xen dom0, as in some
cases a Xen hook called by e820__memory_setup() will need those memory
reservations to have happened already.

Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.

Fixes: a799c2bd29 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210920120421.29276-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-30 10:13:07 +02:00
Thomas Gleixner
0a1b8623d1 drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
[ Upstream commit 4b92d4add5 ]

DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.

The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.

This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.

Fixes: 8571890e15 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-26 14:10:19 +02:00
Tony Luck
54089df947 x86/mce: Avoid infinite loop for copy from user recovery
commit 81065b35e2 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.

 [ bp: Massage commit message, drop noinstr, fix typo, extend panic
   messages. ]

Fixes: 5567d11c21 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:39:18 +02:00
Ani Sinha
108b1867e3 x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
[ Upstream commit c445535c3e ]

Marking TSC as unstable has a side effect of marking sched_clock as
unstable when TSC is still being used as the sched_clock. This is not
desirable. Hyper-V ultimately uses a paravirtualized clock source that
provides a stable scheduler clock even on systems without TscInvariant
CPU capability. Hence, mark_tsc_unstable() call should be called _after_
scheduler clock has been changed to the paravirtualized clocksource. This
will prevent any unwanted manipulation of the sched_clock. Only TSC will
be correctly marked as unstable.

Signed-off-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210713030522.1714803-1-ani@anisinha.ca
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:43:36 +02:00
Borislav Petkov
97ffb7a107 x86/mce: Defer processing of early errors
[ Upstream commit 3bff147b18 ]

When a fatal machine check results in a system reset, Linux does not
clear the error(s) from machine check bank(s) - hardware preserves the
machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads, logs,
and clears all machine check banks.

But there is a problem. In:

  5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")

the call to mce_register_decode_chain() moved later in the boot
sequence. This means that /dev/mcelog doesn't see those early error
logs.

This was partially fixed by:

  cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing
to the console. But parsing console logs is error prone. Users of
/dev/mcelog should expect to find any early errors logged to standard
places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early
machine check initialization to indicate that any errors found should
just be queued to genpool. When mcheck_late_init() is called it will
call mce_schedule_work() to actually log and flush any errors queued in
the genpool.

 [ Based on an original patch, commit message by and completely
   productized by Tony Luck. ]

Fixes: 5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <skamatala@juniper.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210824003129.GA1642753@agluck-desk2.amr.corp.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-15 10:02:09 +02:00
Paul Gortmaker
de33291cf4 x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
commit a729691b54 upstream.

When this platform was relatively new in November 2011, with early BIOS
revisions, a reboot quirk was added in commit 6be30bb7d7 ("x86/reboot:
Blacklist Dell OptiPlex 990 known to require PCI reboot")

However, this quirk (and several others) are open-ended to all BIOS
versions and left no automatic expiry if/when the system BIOS fixed the
issue, meaning that nobody is likely to come along and re-test.

What is really problematic with using PCI reboot as this quirk does, is
that it causes this platform to do a full power down, wait one second,
and then power back on.  This is less than ideal if one is using it for
boot testing and/or bisecting kernels when legacy rotating hard disks
are installed.

It was only by chance that the quirk was noticed in dmesg - and when
disabled it turned out that it wasn't required anymore (BIOS A24), and a
default reboot would work fine without the "harshness" of power cycling the
machine (and disks) down and up like the PCI reboot does.

Doing a bit more research, it seems that the "newest" BIOS for which the
issue was reported[1] was version A06, however Dell[2] seemed to suggest
only up to and including version A05, with the A06 having a large number of
fixes[3] listed.

As is typical with a new platform, the initial BIOS updates come frequently
and then taper off (and in this case, with a revival for CPU CVEs); a
search for O990-A<ver>.exe reveals the following dates:

        A02     16 Mar 2011
        A03     11 May 2011
        A06     14 Sep 2011
        A07     24 Oct 2011
        A10     08 Dec 2011
        A14     06 Sep 2012
        A16     15 Oct 2012
        A18     30 Sep 2013
        A19     23 Sep 2015
        A20     02 Jun 2017
        A23     07 Mar 2018
        A24     21 Aug 2018

While it's overkill to flash and test each of the above, it would seem
likely that the issue was contained within A0x BIOS versions, given the
dates above and the dates of issue reports[4] from distros.  So rather than
just throw out the quirk entirely, limit the scope to just those early BIOS
versions, in case people are still running systems from 2011 with the
original as-shipped early A0x BIOS versions.

[1] https://lore.kernel.org/lkml/1320373471-3942-1-git-send-email-trenn@suse.de/
[2] https://www.dell.com/support/kbdoc/en-ca/000131908/linux-based-operating-systems-stall-upon-reboot-on-optiplex-390-790-990-systems
[3] https://www.dell.com/support/home/en-ca/drivers/driversdetails?driverid=85j10
[4] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/768039

Fixes: 6be30bb7d7 ("x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210530162447.996461-4-paul.gortmaker@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-12 09:01:00 +02:00
Babu Moger
527f721478 x86/resctrl: Fix a maybe-uninitialized build warning treated as error
The recent commit

  064855a690 ("x86/resctrl: Fix default monitoring groups reporting")

caused a RHEL build failure with an uninitialized variable warning
treated as an error because it removed the default case snippet.

The RHEL Makefile uses '-Werror=maybe-uninitialized' to force possibly
uninitialized variable warnings to be treated as errors. This is also
reported by smatch via the 0day robot.

The error from the RHEL build is:

  arch/x86/kernel/cpu/resctrl/monitor.c: In function ‘__mon_event_count’:
  arch/x86/kernel/cpu/resctrl/monitor.c:261:12: error: ‘m’ may be used
  uninitialized in this function [-Werror=maybe-uninitialized]
    m->chunks += chunks;
              ^~

The upstream Makefile does not build using '-Werror=maybe-uninitialized'.
So, the problem is not seen there. Fix the problem by putting back the
default case snippet.

 [ bp: note that there's nothing wrong with the code and other compilers
   do not trigger this warning - this is being done just so the RHEL compiler
   is happy. ]

Fixes: 064855a690 ("x86/resctrl: Fix default monitoring groups reporting")
Reported-by: Terry Bowman <Terry.Bowman@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/162949631908.23903.17090272726012848523.stgit@bmoger-ubuntu
2021-08-22 09:11:29 +02:00
Linus Torvalds
c4f14eac22 Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
     entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which
     mandatory according to speification

   - Ensure that writes to MSI[-X} tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask
     register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during
     interrupt startup which can cause lost and stale interrupts. Move
     the initial affinity setting ahead of actualy enabling the
     interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing
     them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early
2021-08-15 06:49:40 -10:00
Babu Moger
064855a690 x86/resctrl: Fix default monitoring groups reporting
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
on the entire filesystem.

Steps to reproduce:

  1. mount -t resctrl resctrl /sys/fs/resctrl/

  2. cd /sys/fs/resctrl/

  3. cat mon_data/mon_L3_00/mbm_total_bytes
     23189832

  4. Create sub monitor group:
  mkdir mon_groups/test1

  5. cat mon_data/mon_L3_00/mbm_total_bytes
     Unavailable

When a new monitoring group is created, a new RMID is assigned to the
new group. But the RMID is not active yet. When the events are read on
the new RMID, it is expected to report the status as "Unavailable".

When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. Currently, if any of the RMID reads report as "Unavailable",
then everything will be reported as "Unavailable".

Fix the issue by discarding the "Unavailable" reads and reporting all
the successful RMID reads. This is not a problem on Intel systems as
Intel reports 0 on Inactive RMIDs.

Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Reported-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Babu Moger <Babu.Moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-12 20:12:20 +02:00
Thomas Gleixner
ff363f480e x86/msi: Force affinity setup before startup
The X86 MSI mechanism cannot handle interrupt affinity changes safely after
startup other than from an interrupt handler, unless interrupt remapping is
enabled. The startup sequence in the generic interrupt code violates that
assumption.

Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.

While the interrupt remapping MSI chip does not require this, there is no
point in treating it differently as this might spare an interrupt to a CPU
which is not in the default affinity mask.

For the non-remapping case go to the direct write path when the interrupt
is not yet started similar to the not yet activated case.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de
2021-08-10 10:59:21 +02:00
Thomas Gleixner
0c0e37dc11 x86/ioapic: Force affinity setup before startup
The IO/APIC cannot handle interrupt affinity changes safely after startup
other than from an interrupt handler. The startup sequence in the generic
interrupt code violates that assumption.

Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that
the default interrupt setting happens before the interrupt is started up
for the first time.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de
2021-08-10 10:59:21 +02:00
Linus Torvalds
d1b178254c Merge tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 jump label fix from Thomas Gleixner:
 "A single fix for jump labels to prevent the compiler from agressive
  un-inlining which results in a section mismatch"

* tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
2021-07-25 10:21:19 -07:00
Wei Liu
f5a11c69b6 Revert "x86/hyperv: fix logical processor creation"
This reverts commit 450605c28d.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-07-21 15:55:43 +00:00
Ingo Molnar
e48a12e546 jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
In randconfig testing, certain UBSAN and CC Kconfig combinations
with GCC 10.3.0:

  CONFIG_X86_32=y

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y

  CONFIG_UBSAN=y
  # CONFIG_UBSAN_TRAP is not set
  # CONFIG_UBSAN_BOUNDS is not set
  CONFIG_UBSAN_SHIFT=y
  # CONFIG_UBSAN_DIV_ZERO is not set
  CONFIG_UBSAN_UNREACHABLE=y
  CONFIG_UBSAN_BOOL=y
  # CONFIG_UBSAN_ENUM is not set
  # CONFIG_UBSAN_ALIGNMENT is not set
  # CONFIG_UBSAN_SANITIZE_ALL is not set

... produce this build warning (and build error if
CONFIG_SECTION_MISMATCH_WARN_ONLY=y is set):

  WARNING: modpost: vmlinux.o(.text+0x4c1cc): Section mismatch in reference from the function __jump_label_transform() to the function .init.text:text_poke_early()
  The function __jump_label_transform() references
  the function __init text_poke_early().
  This is often because __jump_label_transform lacks a __init
  annotation or the annotation of text_poke_early is wrong.

  ERROR: modpost: Section mismatches detected.

The problem is that __jump_label_transform() gets uninlined by GCC,
despite there being only a single local scope user of the 'static inline'
function.

Mark the function __always_inline instead, to work around this compiler
bug/artifact.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-07-13 06:32:05 +02:00
Stephen Boyd
9ef8af2a8f x86/dumpstack: use %pSb/%pBb for backtrace printing
Let's use the new printk formats to print the stacktrace entries when
printing a backtrace to the kernel logs.  This will include any module's
build ID[1] in it so that offline/crash debugging can easily locate the
debuginfo for a module via something like debuginfod[2].

Link: https://lkml.kernel.org/r/20210511003845.2429846-8-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:22 -07:00
Kefeng Wang
30120d72a4 x86: convert to setup_initial_init_mm()
Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-16-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:22 -07:00
Linus Torvalds
1423e2660c Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
 "Fixes and improvements for FPU handling on x86:

   - Prevent sigaltstack out of bounds writes.

     The kernel unconditionally writes the FPU state to the alternate
     stack without checking whether the stack is large enough to
     accomodate it.

     Check the alternate stack size before doing so and in case it's too
     small force a SIGSEGV instead of silently corrupting user space
     data.

   - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
     been updated despite the fact that the FPU state which is stored on
     the signal stack has grown over time which causes trouble in the
     field when AVX512 is available on a CPU. The kernel does not expose
     the minimum requirements for the alternate stack size depending on
     the available and enabled CPU features.

     ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
     Add it to x86 as well.

   - A major cleanup of the x86 FPU code. The recent discoveries of
     XSTATE related issues unearthed quite some inconsistencies,
     duplicated code and other issues.

     The fine granular overhaul addresses this, makes the code more
     robust and maintainable, which allows to integrate upcoming XSTATE
     related features in sane ways"

* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
  x86/fpu/signal: Let xrstor handle the features to init
  x86/fpu/signal: Handle #PF in the direct restore path
  x86/fpu: Return proper error codes from user access functions
  x86/fpu/signal: Split out the direct restore code
  x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
  x86/fpu/signal: Sanitize the xstate check on sigframe
  x86/fpu/signal: Remove the legacy alignment check
  x86/fpu/signal: Move initial checks into fpu__restore_sig()
  x86/fpu: Mark init_fpstate __ro_after_init
  x86/pkru: Remove xstate fiddling from write_pkru()
  x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
  x86/fpu: Remove PKRU handling from switch_fpu_finish()
  x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
  x86/fpu: Hook up PKRU into ptrace()
  x86/fpu: Add PKRU storage outside of task XSAVE buffer
  x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
  x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
  x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
  x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
  ...
2021-07-07 11:12:01 -07:00
Linus Torvalds
757fa80f4e Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - Added option for per CPU threads to the hwlat tracer

 - Have hwlat tracer handle hotplug CPUs

 - New tracer: osnoise, that detects latency caused by interrupts,
   softirqs and scheduling of other tasks.

 - Added timerlat tracer that creates a thread and measures in detail
   what sources of latency it has for wake ups.

 - Removed the "success" field of the sched_wakeup trace event. This has
   been hardcoded as "1" since 2015, no tooling should be looking at it
   now. If one exists, we can revert this commit, fix that tool and try
   to remove it again in the future.

 - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.

 - New boot command line option "tp_printk_stop", as tp_printk causes
   trace events to write to console. When user space starts, this can
   easily live lock the system. Having a boot option to stop just after
   boot up is useful to prevent that from happening.

 - Have ftrace_dump_on_oops boot command line option take numbers that
   match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.

 - Bootconfig clean ups, fixes and enhancements.

 - New ktest script that tests bootconfig options.

 - Add tracepoint_probe_register_may_exist() to register a tracepoint
   without triggering a WARN*() if it already exists. BPF has a path
   from user space that can do this. All other paths are considered a
   bug.

 - Small clean ups and fixes

* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
  tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
  tracing: Simplify & fix saved_tgids logic
  treewide: Add missing semicolons to __assign_str uses
  tracing: Change variable type as bool for clean-up
  trace/timerlat: Fix indentation on timerlat_main()
  trace/osnoise: Make 'noise' variable s64 in run_osnoise()
  tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
  tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
  Documentation: Fix a typo on trace/osnoise-tracer
  trace/osnoise: Fix return value on osnoise_init_hotplug_support
  trace/osnoise: Make interval u64 on osnoise_main
  trace/osnoise: Fix 'no previous prototype' warnings
  tracing: Have osnoise_main() add a quiescent state for task rcu
  seq_buf: Make trace_seq_putmem_hex() support data longer than 8
  seq_buf: Fix overflow in seq_buf_putmem_hex()
  trace/osnoise: Support hotplug operations
  trace/hwlat: Support hotplug operations
  trace/hwlat: Protect kdata->kthread with get/put_online_cpus
  trace: Add timerlat tracer
  trace: Add osnoise tracer
  ...
2021-07-03 11:13:22 -07:00
Linus Torvalds
71bd934101 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
2021-07-02 12:08:10 -07:00
Linus Torvalds
e058a84bfd Merge tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "Highlights:

   - AMD enables two more GPUs, with resulting header files

   - i915 has started to move to TTM for discrete GPU and enable DG1
     discrete GPU support (not by default yet)

   - new HyperV drm driver

   - vmwgfx adds arm64 support

   - TTM refactoring ongoing

   - 16bpc display support for AMD hw

  Otherwise it's just the usual insane amounts of work all over the
  place in lots of drivers and the core, as mostly summarised below:

  Core:
   - mark AGP ioctls as legacy
   - disable force probing for non-master clients
   - HDR metadata property helpers
   - HDMI infoframe signal colorimetry support
   - remove drm_device.pdev pointer
   - remove DRM_KMS_FB_HELPER config option
   - remove drm_pci_alloc/free
   - drm_err_*/drm_dbg_* helpers
   - use drm driver names for fbdev
   - leaked DMA handle fix
   - 16bpc fixed point format fourcc
   - add prefetching memcpy for WC
   - Documentation fixes

  aperture:
   - add aperture ownership helpers

  dp:
   - aux fixes
   - downstream 0 port handling
   - use extended base receiver capability DPCD
   - Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec
   - mst: use khz as link rate during init
   - VCPI fixes for StarTech hub

  ttm:
   - provide tt_shrink file via debugfs
   - warn about freeing pinned BOs
   - fix swapping error handling
   - move page alignment into BO
   - cleanup ttm_agp_backend
   - add ttm_sys_manager
   - don't override vm_ops
   - ttm_bo_mmap removed
   - make ttm_resource base of all managers
   - remove VM_MIXEDMAP usage

  panel:
   - sysfs_emit support
   - simple: runtime PM support
   - simple: power up panel when reading EDID + caching

  bridge:
   - MHDP8546: HDCP support + DT bindings
   - MHDP8546: Register DP AUX channel with userspace
   - TI SN65DSI83 + SN65DSI84: add driver
   - Sil8620: Fix module dependencies
   - dw-hdmi: make CEC driver loading optional
   - Ti-sn65dsi86: refclk fixes, subdrivers, runtime pm
   - It66121: Add driver + DT bindings
   - Adv7511: Support I2S IEC958 encoding
   - Anx7625: fix power-on delay
   - Nwi-dsi: Modesetting fixes; Cleanups
   - lt6911: add missing MODULE_DEVICE_TABLE
   - cdns: fix PM reference leak

  hyperv:
   - add new DRM driver for HyperV graphics

  efifb:
   - non-PCI device handling fixes

  i915:
   - refactor IP/device versioning
   - XeLPD Display IP preperation work
   - ADL-P enablement patches
   - DG1 uAPI behind BROKEN
   - disable mmap ioctl for discerte GPUs
   - start enabling HuC loading for Gen12+
   - major GuC backend rework for new platforms
   - initial TTM support for Discrete GPUs
   - locking rework for TTM prep
   - use correct max source link rate for eDP
   - %p4cc format printing
   - GLK display fixes
   - VLV DSI panel power fixes
   - PSR2 disabled for RKL and ADL-S
   - ACPI _DSM invalid access fixed
   - DMC FW path abstraction
   - ADL-S PCI ID update
   - uAPI headers converted to kerneldoc
   - initial LMEM support for DG1
   - x86/gpu: add Jasperlake to gen11 early quirks

  amdgpu:
   - Aldebaran updates + initial SR-IOV
   - new GPU: Beige Goby and Yellow Carp support
   - more LTTPR display work
   - Vangogh updates
   - SDMA 5.x GCR fixes
   - PCIe ASPM support
   - Renoir TMZ enablement
   - initial multiple eDP panel support
   - use fdinfo to track devices/process info
   - pin/unpin TTM fixes
   - free resource on fence usage query
   - fix fence calculation
   - fix hotunplug/suspend issues
   - GC/MM register access macro cleanup for SR-IOV
   - W=1 fixes
   - ACPI ATCS/ATIF handling rework
   - 16bpc fixed point format support
   - Initial smartshift support
   - RV/PCO power tuning fixes
   - new INFO query for additional vbios info

  amdkfd:
   - SR-IOV aldebaran support
   - HMM SVM support

  radeon:
   - SMU regression fixes
   - Oland flickering fix

  vmwgfx:
   - enable console with fbdev emulation
   - fix cpu updates of coherent multisample surfaces
   - remove reservation semaphore
   - add initial SVGA3 support
   - support arm64

  msm:
   - devcoredump support for display errors
   - dpu/dsi: yaml bindings conversion
   - mdp5: alpha/blend_mode/zpos support
   - a6xx: cached coherent buffer support
   - gpu iova fault improvement
   - a660 support

  rockchip:
   - RK3036 win1 scaling support
   - RK3066/3188 missing register support
   - RK3036/3066/3126/3188 alpha support

  mediatek:
   - MT8167 HDMI support
   - MT8183 DPI dual edge support

  tegra:
   - fixed YUV support/scaling on Tegra186+

  ast:
   - use pcim_iomap
   - fix DP501 EDID

  bochs:
   - screen blanking support

  etnaviv:
   - export more GPU ID values to userspace
   - add HWDB entry for GPU on i.MX8MP
   - rework linear window calcs

  exynos:
   - pm runtime changes

  imx:
   - Annotate dma_fence critical section
   - fix PRG modifiers after drmm conversion
   - Add 8 pixel alignment fix for 1366x768
   - fix YUV advertising
   - add color properties

  ingenic:
   - IPU planes fix

  panfrost:
   - Mediatek MT8183 support + DT bindings
   - export AFBC_FEATURES register to userspace

  simpledrm:
   - %pr for printing resources

  nouveau:
   - pin/unpin TTM fixes

  qxl:
   - unpin shadow BO

  virtio:
   - create dumb BOs as guest blob

  vkms:
   - drmm_universal_plane_alloc
   - add XRGB plane composition
   - overlay support"

* tag 'drm-next-2021-07-01' of git://anongit.freedesktop.org/drm/drm: (1570 commits)
  drm/i915: Reinstate the mmap ioctl for some platforms
  drm/i915/dsc: abstract helpers to get bigjoiner primary/secondary crtc
  Revert "drm/msm/mdp5: provide dynamic bandwidth management"
  drm/msm/mdp5: provide dynamic bandwidth management
  drm/msm/mdp5: add perf blocks for holding fudge factors
  drm/msm/mdp5: switch to standard zpos property
  drm/msm/mdp5: add support for alpha/blend_mode properties
  drm/msm/mdp5: use drm_plane_state for pixel blend mode
  drm/msm/mdp5: use drm_plane_state for storing alpha value
  drm/msm/mdp5: use drm atomic helpers to handle base drm plane state
  drm/msm/dsi: do not enable PHYs when called for the slave DSI interface
  drm/msm: Add debugfs to trigger shrinker
  drm/msm/dpu: Avoid ABBA deadlock between IRQ modules
  drm/msm: devcoredump iommu fault support
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: Improve the a6xx page fault handler
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  drm/msm: export hangcheck_period in debugfs
  drm/msm/a6xx: add support for Adreno 660 GPU
  ...
2021-07-01 12:53:43 -07:00
Barry Song
66ce75144d kprobes: remove duplicated strong free_insn_page in x86 and s390
free_insn_page() in x86 and s390 is same with the common weak function in
kernel/kprobes.c.  Plus, the comment "Recover page to RW mode before
releasing it" in x86 seems insensible to be there since resetting mapping
is done by common code in vfree() of module_memfree().  So drop these two
duplicated strong functions and related comment, then mark the common one
in kernel/kprobes.c strong.

Link: https://lkml.kernel.org/r/20210608065736.32656-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:06 -07:00
Andy Shevchenko
f39650de68 kernel.h: split out panic and oops helpers
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
  Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:04 -07:00
Linus Torvalds
65090f30ab Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "191 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
  slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
  mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
  mm,hwpoison: make get_hwpoison_page() call get_any_page()
  mm,hwpoison: send SIGBUS with error virutal address
  mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
  mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
  mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
  mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
  docs: remove description of DISCONTIGMEM
  arch, mm: remove stale mentions of DISCONIGMEM
  mm: remove CONFIG_DISCONTIGMEM
  m68k: remove support for DISCONTIGMEM
  arc: remove support for DISCONTIGMEM
  arc: update comment about HIGHMEM implementation
  alpha: remove DISCONTIGMEM and NUMA
  mm/page_alloc: move free_the_page
  mm/page_alloc: fix counting of managed_pages
  mm/page_alloc: improve memmap_pages dbg msg
  mm: drop SECTION_SHIFT in code comments
  mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
  mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
  mm/page_alloc: scale the number of pages that are batch freed
  ...
2021-06-29 17:29:11 -07:00
Linus Torvalds
5e6928249b Merge tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA code in the kernel to the 20210604 upstream
  revision, add preliminary support for the Platform Runtime Mechanism
  (PRM), address issues related to the handling of device dependencies
  in the ACPI device eunmeration code, improve the tracking of ACPI
  power resource states, improve the ACPI support for suspend-to-idle on
  AMD systems, continue the unification of message printing in the ACPI
  code, address assorted issues and clean up the code in a number of
  places.

  Specifics:

   - Update ACPICA code in the kernel to upstrea revision 20210604
     including the following changes:

      - Add defines for the CXL Host Bridge Structureand and add the
        CFMWS structure definition to CEDT (Alison Schofield).
      - iASL: Finish support for the IVRS ACPI table (Bob Moore).
      - iASL: Add support for the SVKL table (Bob Moore).
      - iASL: Add full support for RGRT ACPI table (Bob Moore).
      - iASL: Add support for the BDAT ACPI table (Bob Moore).
      - iASL: add disassembler support for PRMT (Erik Kaneda).
      - Fix memory leak caused by _CID repair function (Erik Kaneda).
      - Add support for PlatformRtMechanism OpRegion (Erik Kaneda).
      - Add PRMT module header to facilitate parsing (Erik Kaneda).
      - Add _PLD panel positions (Fabian Wüthrich).
      - MADT: add Multiprocessor Wakeup Mailbox Structure and the SVKL
        table headers (Kuppuswamy Sathyanarayanan).
      - Use ACPI_FALLTHROUGH (Wei Ming Chen).

   - Add preliminary support for the Platform Runtime Mechanism (PRM) to
     allow the AML interpreter to call PRM functions (Erik Kaneda).

   - Address some issues related to the handling of device dependencies
     reported by _DEP in the ACPI device enumeration code and clean up
     some related pieces of it (Rafael Wysocki).

   - Improve the tracking of states of ACPI power resources (Rafael
     Wysocki).

   - Improve ACPI support for suspend-to-idle on AMD systems (Alex
     Deucher, Mario Limonciello, Pratik Vishwakarma).

   - Continue the unification and cleanup of message printing in the
     ACPI code (Hanjun Guo, Heiner Kallweit).

   - Fix possible buffer overrun issue with the description_show() sysfs
     attribute method (Krzysztof Wilczyński).

   - Improve the acpi_mask_gpe kernel command line parameter handling
     and clean up the core ACPI code related to sysfs (Andy Shevchenko,
     Baokun Li, Clayton Casciato).

   - Postpone bringing devices in the general ACPI PM domain to D0
     during resume from system-wide suspend until they are really needed
     (Dmitry Torokhov).

   - Make the ACPI processor driver fix up C-state latency if not
     ordered (Mario Limonciello).

   - Add support for identifying devices depening on the given one that
     are not its direct descendants with the help of _DEP (Daniel
     Scally).

   - Extend the checks related to ACPI IRQ overrides on x86 in order to
     avoid false-positives (Hui Wang).

   - Add battery DPTF participant for Intel SoCs (Sumeet Pawnikar).

   - Rearrange the ACPI fan driver and device power management code to
     use a common list of device IDs (Rafael Wysocki).

   - Fix clang CFI violation in the ACPI BGRT table parsing code and
     clean it up (Nathan Chancellor).

   - Add GPE-related quirks for some laptops to the EC driver (Chris
     Chiu, Zhang Rui).

   - Make the ACPI PPTT table parsing code populate the cache-id value
     if present in the firmware (James Morse).

   - Remove redundant clearing of context->ret.pointer from
     acpi_run_osc() (Hans de Goede).

   - Add missing acpi_put_table() in acpi_init_fpdt() (Jing Xiangfeng).

   - Make ACPI APEI handle ARM Processor Error CPER records like Memory
     Error ones to avoid user space task lockups (Xiaofei Tan).

   - Stop warning about disabled ACPI in APEI (Jon Hunter).

   - Fix fall-through warning for Clang in the SBSHC driver (Gustavo A.
     R. Silva).

   - Add custom DSDT file as Makefile prerequisite (Richard Fitzgerald).

   - Initialize local variable to avoid garbage being returned (Colin
     Ian King).

   - Simplify assorted pieces of code, address assorted coding style and
     documentation issues and comment typos (Baokun Li, Christophe
     JAILLET, Clayton Casciato, Liu Shixin, Shaokun Zhang, Wei Yongjun,
     Yang Li, Zhen Lei)"

* tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (97 commits)
  ACPI: PM: postpone bringing devices to D0 unless we need them
  ACPI: tables: Add custom DSDT file as makefile prerequisite
  ACPI: bgrt: Use sysfs_emit
  ACPI: bgrt: Fix CFI violation
  ACPI: EC: trust DSDT GPE for certain HP laptop
  ACPI: scan: Simplify acpi_table_events_fn()
  ACPI: PM: Adjust behavior for field problems on AMD systems
  ACPI: PM: s2idle: Add support for new Microsoft UUID
  ACPI: PM: s2idle: Add support for multiple func mask
  ACPI: PM: s2idle: Refactor common code
  ACPI: PM: s2idle: Use correct revision id
  ACPI: sysfs: Remove tailing return statement in void function
  ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
  ACPI: sysfs: Sort headers alphabetically
  ACPI: sysfs: Refactor param_get_trace_state() to drop dead code
  ACPI: sysfs: Unify pattern of memory allocations
  ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe
  ACPI: sysfs: Make sparse happy about address space in use
  ACPI: scan: Fix race related to dropping dependencies
  ACPI: scan: Reorganize acpi_device_add()
  ...
2021-06-29 13:39:41 -07:00
Linus Torvalds
a22c3f615a Merge tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 interrupt related updates from Thomas Gleixner:

 - Consolidate the VECTOR defines and the usage sites.

 - Cleanup GDT/IDT related code and replace open coded ASM with proper
   native helper functions.

* tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c
  x86: Add native_[ig]dt_invalidate()
  x86/idt: Remove address argument from idt_invalidate()
  x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORS
  x86/irq: Remove unused vectors defines
2021-06-29 12:36:59 -07:00
Linus Torvalds
a941a0349c Merge tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Time and clocksource/clockevent related updates:

  Core changes:

   - Infrastructure to support per CPU "broadcast" devices for per CPU
     clockevent devices which stop in deep idle states. This allows us
     to utilize the more efficient architected timer on certain ARM SoCs
     for normal operation instead of permanentely using the slow to
     access SoC specific clockevent device.

   - Print the name of the broadcast/wakeup device in /proc/timer_list

   - Make the clocksource watchdog more robust against delays between
     reading the current active clocksource and the watchdog
     clocksource. Such delays can be caused by NMIs, SMIs and vCPU
     preemption.

     Handle this by reading the watchdog clocksource twice, i.e. before
     and after reading the current active clocksource. In case that the
     two watchdog reads shows an excessive time delta, the read sequence
     is repeated up to 3 times.

   - Improve the debug output and add a test module for the watchdog
     mechanism.

   - Reimplementation of the venerable time64_to_tm() function with a
     faster and significantly smaller version. Straight from the source,
     i.e. the author of the related research paper contributed this!

  Driver changes:

   - No new drivers, not even new device tree bindings!

   - Fixes, improvements and cleanups and all over the place"

* tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  time/kunit: Add missing MODULE_LICENSE()
  time: Improve performance of time64_to_tm()
  clockevents: Use list_move() instead of list_del()/list_add()
  clocksource: Print deviation in nanoseconds when a clocksource becomes unstable
  clocksource: Provide kernel module to test clocksource watchdog
  clocksource: Reduce clocksource-skew threshold
  clocksource: Limit number of CPUs checked for clock synchronization
  clocksource: Check per-CPU clock synchronization when marked unstable
  clocksource: Retry clock read if long delays detected
  clockevents: Add missing parameter documentation
  clocksource/drivers/timer-ti-dm: Drop unnecessary restore
  clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
  clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer
  clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static
  arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore
  clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes
  clocksource/drivers/ingenic: Rename unreasonable array names
  clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
  clocksource/drivers/mediatek: Ack and disable interrupts on suspend
  clocksource/drivers/samsung_pwm: Constify source IO memory
  ...
2021-06-29 12:31:16 -07:00
Linus Torvalds
b694011a4a Merge tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
 "Just a few minor enhancement patches and bug fixes"

* tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv()
  Drivers: hv: Move Hyper-V extended capability check to arch neutral code
  drivers: hv: Fix missing error code in vmbus_connect()
  x86/hyperv: fix logical processor creation
  hv_utils: Fix passing zero to 'PTR_ERR' warning
  scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs
  Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer
  hv_balloon: Remove redundant assignment to region_start
2021-06-29 11:21:35 -07:00
Naoya Horiguchi
a3f5d80ea4 mm,hwpoison: send SIGBUS with error virutal address
Now an action required MCE in already hwpoisoned address surely sends a
SIGBUS to current process, but the SIGBUS doesn't convey error virtual
address.  That's not optimal for hwpoison-aware applications.

To fix the issue, make memory_failure() call kill_accessing_process(),
that does pagetable walk to find the error virtual address.  It could find
multiple virtual addresses for the same error page, and it seems hard to
tell which virtual address is correct one.  But that's rare and sending
incorrect virtual address could be better than no address.  So let's
report the first found virtual address for now.

[naoya.horiguchi@nec.com: fix walk_page_range() return]
  Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp

Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.com
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Aili Yao <yaoaili@kingsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jue Wang <juew@google.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:55 -07:00
Mike Rapoport
a9ee6cf5c6 mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.

Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.

Done with

	$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
		$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
	$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
		$(git grep -wl NEED_MULTIPLE_NODES)

with manual tweaks afterwards.

[rppt@linux.ibm.com: fix arm boot crash]
  Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com

Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:55 -07:00
Liam Howlett
9ce2c3fc0b x86/sgx: use vma_lookup() in sgx_encl_find()
Use vma_lookup() to find the VMA at a specific address.  As vma_lookup()
will return NULL if the address is not within any VMA, the start address
no longer needs to be validated.

Link: https://lkml.kernel.org/r/20210521174745.2219620-10-Liam.Howlett@Oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:51 -07:00
Rafael J. Wysocki
3a616ec797 Merge branches 'acpi-prm', 'acpi-sysfs' and 'acpi-x86'
* acpi-prm:
  ACPI: PRM: make symbol 'prm_module_list' static
  ACPI: Add \_SB._OSC bit for PRM
  ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype

* acpi-sysfs:
  ACPI: sysfs: Remove tailing return statement in void function
  ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
  ACPI: sysfs: Sort headers alphabetically
  ACPI: sysfs: Refactor param_get_trace_state() to drop dead code
  ACPI: sysfs: Unify pattern of memory allocations
  ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe
  ACPI: sysfs: Make sparse happy about address space in use
  ACPI: sysfs: fix doc warnings in device_sysfs.c
  ACPI: sysfs: Drop four redundant return statements
  ACPI: sysfs: Fix a buffer overrun problem with description_show()

* acpi-x86:
  x86/acpi: Switch to pr_xxx log functions
2021-06-29 15:48:08 +02:00
Linus Torvalds
36824f198c Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "This covers all architectures (except MIPS) so I don't expect any
  other feature pull requests this merge window.

  ARM:

   - Add MTE support in guests, complete with tag save/restore interface

   - Reduce the impact of CMOs by moving them in the page-table code

   - Allow device block mappings at stage-2

   - Reduce the footprint of the vmemmap in protected mode

   - Support the vGIC on dumb systems such as the Apple M1

   - Add selftest infrastructure to support multiple configuration and
     apply that to PMU/non-PMU setups

   - Add selftests for the debug architecture

   - The usual crop of PMU fixes

  PPC:

   - Support for the H_RPT_INVALIDATE hypercall

   - Conversion of Book3S entry/exit to C

   - Bug fixes

  S390:

   - new HW facilities for guests

   - make inline assembly more robust with KASAN and co

  x86:

   - Allow userspace to handle emulation errors (unknown instructions)

   - Lazy allocation of the rmap (host physical -> guest physical
     address)

   - Support for virtualizing TSC scaling on VMX machines

   - Optimizations to avoid shattering huge pages at the beginning of
     live migration

   - Support for initializing the PDPTRs without loading them from
     memory

   - Many TLB flushing cleanups

   - Refuse to load if two-stage paging is available but NX is not (this
     has been a requirement in practice for over a year)

   - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
     CR0/CR4/EFER, using the MMU mode everywhere once it is computed
     from the CPU registers

   - Use PM notifier to notify the guest about host suspend or hibernate

   - Support for passing arguments to Hyper-V hypercalls using XMM
     registers

   - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
     on AMD processors

   - Hide Hyper-V hypercalls that are not included in the guest CPUID

   - Fixes for live migration of virtual machines that use the Hyper-V
     "enlightened VMCS" optimization of nested virtualization

   - Bugfixes (not many)

  Generic:

   - Support for retrieving statistics without debugfs

   - Cleanups for the KVM selftests API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  kvm: x86: disable the narrow guest module parameter on unload
  selftests: kvm: Allows userspace to handle emulation errors.
  kvm: x86: Allow userspace to handle emulation errors
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  KVM: x86/mmu: Use MMU's role to determine PTTYPE
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  ...
2021-06-28 15:40:51 -07:00
Linus Torvalds
1b1cf8fe99 Merge tag 'x86-splitlock-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Ingo Molnar:

 - Add the "ratelimit:N" parameter to the split_lock_detect= boot
   option, to rate-limit the generation of bus-lock exceptions.

   This is both easier on system resources and kinder to offending
   applications than the current policy of outright killing them.

 - Document the split-lock detection feature and its parameters.

* tag 'x86-splitlock-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Add ratelimit in buslock.rst
  Documentation/admin-guide: Add bus lock ratelimit
  x86/bus_lock: Set rate limit for bus lock
  Documentation/x86: Add buslock.rst
2021-06-28 13:30:02 -07:00
Linus Torvalds
8e4d7a78f0 Merge tag 'x86-cleanups-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups & removal of obsolete code"

* tag 'x86-cleanups-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Correct kernel-doc's arg name in sgx_encl_release()
  doc: Remove references to IBM Calgary
  x86/setup: Document that Windows reserves the first MiB
  x86/crash: Remove crash_reserve_low_1M()
  x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  x86/alternative: Align insn bytes vertically
  x86: Fix leftover comment typos
  x86/asm: Simplify __smp_mb() definition
  x86/alternatives: Make the x86nops[] symbol static
2021-06-28 13:10:25 -07:00
Linus Torvalds
98e62da8b3 Merge tag 'x86-cache-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control documentation fixes from Ingo Molnar:
 "Fix Docbook comments in the x86/resctrl code"

* tag 'x86-cache-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix kernel-doc in internal.h
  x86/resctrl: Fix kernel-doc in pseudo_lock.c
2021-06-28 13:06:24 -07:00
Linus Torvalds
909489bf9f Merge tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:

 - Micro-optimize and standardize the do_syscall_64() calling convention

 - Make syscall entry flags clearing more conservative

 - Clean up syscall table handling

 - Clean up & standardize assembly macros, in preparation of FRED

 - Misc cleanups and fixes

* tag 'x86-asm-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Make <asm/asm.h> valid on cross-builds as well
  x86/regs: Syscall_get_nr() returns -1 for a non-system call
  x86/entry: Split PUSH_AND_CLEAR_REGS into two submacros
  x86/syscall: Maximize MSR_SYSCALL_MASK
  x86/syscall: Unconditionally prototype {ia32,x32}_sys_call_table[]
  x86/entry: Reverse arguments to do_syscall_64()
  x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h>
  x86/asm: Use _ASM_BYTES() in <asm/nops.h>
  x86/asm: Add _ASM_BYTES() macro for a .byte ... opcode sequence
  x86/asm: Have the __ASM_FORM macros handle commas in arguments
2021-06-28 12:57:11 -07:00
Linus Torvalds
e5a0fc4e20 Merge tag 'x86-apic-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 exception handling updates from Ingo Molnar:

 - Clean up & simplify AP exception handling setup.

 - Consolidate the disjoint IDT setup code living in idt_setup_traps()
   and idt_setup_ist_traps() into a single idt_setup_traps()
   initialization function and call it before cpu_init().

* tag 'x86-apic-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/idt: Rework IDT setup for boot CPU
  x86/cpu: Init AP exception handling from cpu_init_secondary()
2021-06-28 12:46:30 -07:00
Linus Torvalds
54a728dc5e Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:

 - Changes to core scheduling facilities:

    - Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
      coordinated scheduling across SMT siblings. This is a much
      requested feature for cloud computing platforms, to allow the
      flexible utilization of SMT siblings, without exposing untrusted
      domains to information leaks & side channels, plus to ensure more
      deterministic computing performance on SMT systems used by
      heterogenous workloads.

      There are new prctls to set core scheduling groups, which allows
      more flexible management of workloads that can share siblings.

    - Fix task->state access anti-patterns that may result in missed
      wakeups and rename it to ->__state in the process to catch new
      abuses.

 - Load-balancing changes:

    - Tweak newidle_balance for fair-sched, to improve 'memcache'-like
      workloads.

    - "Age" (decay) average idle time, to better track & improve
      workloads such as 'tbench'.

    - Fix & improve energy-aware (EAS) balancing logic & metrics.

    - Fix & improve the uclamp metrics.

    - Fix task migration (taskset) corner case on !CONFIG_CPUSET.

    - Fix RT and deadline utilization tracking across policy changes

    - Introduce a "burstable" CFS controller via cgroups, which allows
      bursty CPU-bound workloads to borrow a bit against their future
      quota to improve overall latencies & batching. Can be tweaked via
      /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.

    - Rework assymetric topology/capacity detection & handling.

 - Scheduler statistics & tooling:

    - Disable delayacct by default, but add a sysctl to enable it at
      runtime if tooling needs it. Use static keys and other
      optimizations to make it more palatable.

    - Use sched_clock() in delayacct, instead of ktime_get_ns().

 - Misc cleanups and fixes.

* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/doc: Update the CPU capacity asymmetry bits
  sched/topology: Rework CPU capacity asymmetry detection
  sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
  psi: Fix race between psi_trigger_create/destroy
  sched/fair: Introduce the burstable CFS controller
  sched/uclamp: Fix uclamp_tg_restrict()
  sched/rt: Fix Deadline utilization tracking during policy change
  sched/rt: Fix RT utilization tracking during policy change
  sched: Change task_struct::state
  sched,arch: Remove unused TASK_STATE offsets
  sched,timer: Use __set_current_state()
  sched: Add get_current_state()
  sched,perf,kvm: Fix preemption condition
  sched: Introduce task_is_running()
  sched: Unbreak wakeups
  sched/fair: Age the average idle time
  sched/cpufreq: Consider reduced CPU capacity in energy calculation
  sched/fair: Take thermal pressure into account while estimating energy
  thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
  sched/fair: Return early from update_tg_cfs_load() if delta == 0
  ...
2021-06-28 12:14:19 -07:00
Linus Torvalds
28a27cbd86 Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:

 - Platform PMU driver updates:

     - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
     - Fix RDPMC support
     - Fix [extended-]PEBS-via-PT support
     - Fix Sapphire Rapids event constraints
     - Fix :ppp support on Sapphire Rapids
     - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
     - Other heterogenous-PMU fixes

 - Kprobes:

     - Remove the unused and misguided kprobe::fault_handler callbacks.
     - Warn about kprobes taking a page fault.
     - Fix the 'nmissed' stat counter.

 - Misc cleanups and fixes.

* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix task context PMU for Hetero
  perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
  perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
  perf/x86/intel: Fix fixed counter check warning for some Alder Lake
  perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
  perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
  kprobes: Do not increment probe miss count in the fault handler
  x86,kprobes: WARN if kprobes tries to handle a fault
  kprobes: Remove kprobe::fault_handler
  uprobes: Update uprobe_write_opcode() kernel-doc comment
  perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
  perf/core: Fix DocBook warnings
  perf/core: Make local function perf_pmu_snapshot_aux() static
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
  perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
  perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
  perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-28 12:03:20 -07:00
Linus Torvalds
b89c07dea1 Merge tags 'objtool-urgent-2021-06-28' and 'objtool-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix and updates from Ingo Molnar:
 "An ELF format fix for a section flags mismatch bug that breaks kernel
  tooling such as kpatch-build.

  The biggest change in this cycle is the new code to handle and rewrite
  variable sized jump labels - which results in slightly tighter code
  generation in hot paths, through the use of short(er) NOPs.

  Also a number of cleanups and fixes, and a change to the generic
  include/linux/compiler.h to handle a s390 GCC quirk"

* tag 'objtool-urgent-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Don't make .altinstructions writable

* tag 'objtool-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Improve reloc hash size guestimate
  instrumentation.h: Avoid using inline asm operand modifiers
  compiler.h: Avoid using inline asm operand modifiers
  kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
  objtool: Reflow handle_jump_alt()
  jump_label/x86: Remove unused JUMP_LABEL_NOP_SIZE
  jump_label, x86: Allow short NOPs
  objtool: Provide stats for jump_labels
  objtool: Rewrite jump_label instructions
  objtool: Decode jump_entry::key addend
  jump_label, x86: Emit short JMP
  jump_label: Free jump_entry::key bit1 for build use
  jump_label, x86: Add variable length patching support
  jump_label, x86: Introduce jump_entry_size()
  jump_label, x86: Improve error when we fail expected text
  jump_label, x86: Factor out the __jump_table generation
  jump_label, x86: Strip ASM jump_label support
  x86, objtool: Dont exclude arch/x86/realmode/
  objtool: Rewrite hashtable sizing
2021-06-28 11:35:55 -07:00
Linus Torvalds
d04f7de0a5 Merge tag 'x86_sev_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:

 - Differentiate the type of exception the #VC handler raises depending
   on code executed in the guest and handle the case where failure to
   get the RIP would result in a #GP, as it should, instead of in a #PF

 - Disable interrupts while the per-CPU GHCB is held

 - Split the #VC handler depending on where the #VC exception has
   happened and therefore provide for precise context tracking like the
   rest of the exception handlers deal with noinstr regions now

 - Add defines for the GHCB version 2 protocol so that further shared
   development with KVM can happen without merge conflicts

 - The usual small cleanups

* tag 'x86_sev_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Use "SEV: " prefix for messages from sev.c
  x86/sev: Add defines for GHCB version 2 MSR protocol requests
  x86/sev: Split up runtime #VC handler for correct state tracking
  x86/sev: Make sure IRQs are disabled while GHCB is active
  x86/sev: Propagate #GP if getting linear instruction address failed
  x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
  x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
  x86/sev: Fix error message in runtime #VC handler
2021-06-28 11:29:12 -07:00
Linus Torvalds
2594b713c1 Merge tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:

 - New AMD models support

 - Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too

 - Use the special RAPL CPUID bit to detect the functionality on AMD and
   Hygon instead of doing family matching.

 - Add support for new Intel microcode deprecating TSX on some models
   and do not enable kernel workarounds for those CPUs when TSX
   transactions always abort, as a result of that microcode update.

* tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsx: Clear CPUID bits when TSX always force aborts
  x86/events/intel: Do not deploy TSX force abort workaround when TSX is deprecated
  x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  perf/x86/rapl: Use CPUID bit on AMD and Hygon parts
  x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems
  x86/amd_nb: Add AMD family 19h model 50h PCI ids
  x86/cpu: Fix core name for Sapphire Rapids
2021-06-28 11:22:40 -07:00
Linus Torvalds
f565b20734 Merge tag 'ras_core_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:

 - Add the required information to the faked APEI-reported mem error so
   that the kernel properly attempts to offline the corresponding page,
   as it does for kernel-detected correctable errors.

 - Fix a typo in AMD's error descriptions.

* tag 'ras_core_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Fix typo "FIfo" -> "Fifo"
  x86/mce: Include a MCi_MISC value in faked mce logs
  x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
2021-06-28 11:19:40 -07:00
Daniel Bristot de Oliveira
f7d9f6370e trace/osnoise: Fix 'no previous prototype' warnings
kernel test robot reported some osnoise functions with "no previous
prototype."

Fix these warnings by making local functions static, and by adding:

 void osnoise_trace_irq_entry(int id);
 void osnoise_trace_irq_exit(int id, const char *desc);

to include/linux/trace.h.

Link: https://lkml.kernel.org/r/e40d3cb4be8bde921f4b40fa6a095cf85ab807bd.1624872608.git.bristot@redhat.com

Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-28 14:12:26 -04:00
Daniel Bristot de Oliveira
bce29ac9ce trace: Add osnoise tracer
In the context of high-performance computing (HPC), the Operating System
Noise (*osnoise*) refers to the interference experienced by an application
due to activities inside the operating system. In the context of Linux,
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
system. Moreover, hardware-related jobs can also cause noise, for example,
via SMIs.

The osnoise tracer leverages the hwlat_detector by running a similar
loop with preemption, SoftIRQs and IRQs enabled, thus allowing all
the sources of *osnoise* during its execution. Using the same approach
of hwlat, osnoise takes note of the entry and exit point of any
source of interferences, increasing a per-cpu interference counter. The
osnoise tracer also saves an interference counter for each source of
interference. The interference counter for NMI, IRQs, SoftIRQs, and
threads is increased anytime the tool observes these interferences' entry
events. When a noise happens without any interference from the operating
system level, the hardware noise counter increases, pointing to a
hardware-related noise. In this way, osnoise can account for any
source of interference. At the end of the period, the osnoise tracer
prints the sum of all noise, the max single noise, the percentage of CPU
available for the thread, and the counters for the noise sources.

Usage

Write the ASCII text "osnoise" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo osnoise > current_tracer

It is possible to follow the trace by reading the trace trace file::

        [root@f32 tracing]# cat trace
        # tracer: osnoise
        #
        #                                _-----=> irqs-off
        #                               / _----=> need-resched
        #                              | / _---=> hardirq/softirq
        #                              || / _--=> preempt-depth                            MAX
        #                              || /                                             SINGLE     Interference counters:
        #                              ||||               RUNTIME      NOISE   % OF CPU  NOISE    +-----------------------------+
        #           TASK-PID      CPU# ||||   TIMESTAMP    IN US       IN US  AVAILABLE  IN US     HW    NMI    IRQ   SIRQ THREAD
        #              | |         |   ||||      |           |             |    |            |      |      |      |      |      |
                   <...>-859     [000] ....    81.637220: 1000000        190  99.98100       9     18      0   1007     18      1
                   <...>-860     [001] ....    81.638154: 1000000        656  99.93440      74     23      0   1006     16      3
                   <...>-861     [002] ....    81.638193: 1000000       5675  99.43250     202      6      0   1013     25     21
                   <...>-862     [003] ....    81.638242: 1000000        125  99.98750      45      1      0   1011     23      0
                   <...>-863     [004] ....    81.638260: 1000000       1721  99.82790     168      7      0   1002     49     41
                   <...>-864     [005] ....    81.638286: 1000000        263  99.97370      57      6      0   1006     26      2
                   <...>-865     [006] ....    81.638302: 1000000        109  99.98910      21      3      0   1006     18      1
                   <...>-866     [007] ....    81.638326: 1000000       7816  99.21840     107      8      0   1016     39     19

In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
tracer prints a message at the end of each period for each CPU that is
running an osnoise/CPU thread. The osnoise specific fields report:

 - The RUNTIME IN USE reports the amount of time in microseconds that
   the osnoise thread kept looping reading the time.
 - The NOISE IN US reports the sum of noise in microseconds observed
   by the osnoise tracer during the associated runtime.
 - The % OF CPU AVAILABLE reports the percentage of CPU available for
   the osnoise thread during the runtime window.
 - The MAX SINGLE NOISE IN US reports the maximum single noise observed
   during the runtime window.
 - The Interference counters display how many each of the respective
   interference happened during the runtime window.

Note that the example above shows a high number of HW noise samples.
The reason being is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.

Tracer options

The tracer has a set of options inside the osnoise directory, they are:

 - osnoise/cpus: CPUs at which a osnoise thread will execute.
 - osnoise/period_us: the period of the osnoise thread.
 - osnoise/runtime_us: how long an osnoise thread will look for noise.
 - osnoise/stop_tracing_us: stop the system tracing if a single noise
   higher than the configured value happens. Writing 0 disables this
   option.
 - osnoise/stop_tracing_total_us: stop the system tracing if total noise
   higher than the configured value happens. Writing 0 disables this
   option.
 - tracing_threshold: the minimum delta between two time() reads to be
   considered as noise, in us. When set to 0, the default value will
   be used, which is currently 5 us.

Additional Tracing

In addition to the tracer, a set of tracepoints were added to
facilitate the identification of the osnoise source.

 - osnoise:sample_threshold: printed anytime a noise is higher than
   the configurable tolerance_ns.
 - osnoise:nmi_noise: noise from NMI, including the duration.
 - osnoise:irq_noise: noise from an IRQ, including the duration.
 - osnoise:softirq_noise: noise from a SoftIRQ, including the
   duration.
 - osnoise:thread_noise: noise from a thread, including the duration.

Note that all the values are *net values*. For example, if while osnoise
is running, another thread preempts the osnoise thread, it will start a
thread_noise duration at the start. Then, an IRQ takes place, preempting
the thread_noise, starting a irq_noise. When the IRQ ends its execution,
it will compute its duration, and this duration will be subtracted from
the thread_noise, in such a way as to avoid the double accounting of the
IRQ execution. This logic is valid for all sources of noise.

Here is one example of the usage of these tracepoints::

       osnoise/8-961     [008] d.h.  5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
       osnoise/8-961     [008] dNh.  5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
     migration/8-54      [008] d...  5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
       osnoise/8-961     [008] ....  5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2

In this example, a noise sample of 8 microseconds was reported in the last
line, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after a
timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.

It is worth noticing that the sum of the duration reported in the
tracepoints is smaller than eight us reported in the sample_threshold.
The reason roots in the overhead of the entry and exit code that happens
before and after any interference execution. This justifies the dual
approach: measuring thread and tracing.

Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com

Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
[
  Made the following functions static:
   trace_irqentry_callback()
   trace_irqexit_callback()
   trace_intel_irqentry_callback()
   trace_intel_irqexit_callback()

  Added to include/trace.h:
   osnoise_arch_register()
   osnoise_arch_unregister()

  Fixed define logic for LATENCY_FS_NOTIFY

  Reported-by: kernel test robot <lkp@intel.com>
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-25 19:57:01 -04:00
Paolo Bonzini
b8917b4ae4 Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Thomas Gleixner
93c2cdc975 x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
The change which made copy_xstate_to_uabi_buf() usable for
[x]fpregs_get() removed the zeroing of the header which means the
header, which is copied to user space later, contains except for the
xfeatures member, random stack content.

Add the memset() back to zero it before usage.

Fixes: eb6f51723f ("x86/fpu: Make copy_xstate_to_kernel() usable for [x]fpregs_get()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/875yy3wb8h.ffs@nanos.tec.linutronix.de
2021-06-24 17:19:51 +02:00
Fabio M. De Francesco
fd2afa70ef x86/resctrl: Fix kernel-doc in internal.h
Add description of undocumented parameters. Issues detected by
scripts/kernel-doc.

Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20210618223206.29539-1-fmdefrancesco@gmail.com
2021-06-24 10:23:57 +02:00
Fabio M. De Francesco
f9b871c89a x86/resctrl: Fix kernel-doc in pseudo_lock.c
Add undocumented parameters detected by scripts/kernel-doc.

Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20210616181530.4094-1-fmdefrancesco@gmail.com
2021-06-24 10:21:05 +02:00