Commit Graph

18764 Commits

Author SHA1 Message Date
Vernon Lovejoy
2e4be0d011 x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
The commit e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
tried to align the stack pointer in show_trace_log_lvl(), otherwise the
"stack < stack_info.end" check can't guarantee that the last read does
not go past the end of the stack.

However, we have the same problem with the initial value of the stack
pointer, it can also be unaligned. So without this patch this trivial
kernel module

	#include <linux/module.h>

	static int init(void)
	{
		asm volatile("sub    $0x4,%rsp");
		dump_stack();
		asm volatile("add    $0x4,%rsp");

		return -EAGAIN;
	}

	module_init(init);
	MODULE_LICENSE("GPL");

crashes the kernel.

Fixes: e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-16 06:31:04 -07:00
Jiapeng Chong
95f0e3a209 x86/unwind/orc: Use swap() instead of open coding it
Swap is a function interface that provides exchange function. To avoid
code duplication, we can use swap function.

./arch/x86/kernel/unwind_orc.c:235:16-17: WARNING opportunity for swap().

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4641
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230330020014.40489-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-16 06:06:56 -07:00
Yazen Ghannam
e40879b6d7 x86/MCE: Check a hw error's address to determine proper recovery action
Make sure that machine check errors with a usable address are properly
marked as poison.

This is needed for errors that occur on memory which have
MCG_STATUS[RIPV] clear - i.e., the interrupted process cannot be
restarted reliably. One example is data poison consumption through the
instruction fetch units on AMD Zen-based systems.

The MF_MUST_KILL flag is passed to memory_failure() when
MCG_STATUS[RIPV] is not set. So the associated process will still be
killed.  What this does, practically, is get rid of one more check to
kill_current_task with the eventual goal to remove it completely.

Also, make the handling identical to what is done on the notifier path
(uc_decode_notifier() does that address usability check too).

The scenario described above occurs when hardware can precisely identify
the address of poisoned memory, but execution cannot reliably continue
for the interrupted hardware thread.

  [ bp: Massage commit message. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230322005131.174499-1-tony.luck@intel.com
2023-05-16 12:16:22 +02:00
Lukas Bulwahn
7583e8fbdc x86/cpu: Remove X86_FEATURE_NAMES
While discussing to change the visibility of X86_FEATURE_NAMES (see Link)
in order to remove CONFIG_EMBEDDED, Boris suggested to simply make the
X86_FEATURE_NAMES functionality unconditional.

As the need for really tiny kernel images has gone away and kernel images
with !X86_FEATURE_NAMES are hardly tested, remove this config and the whole
ifdeffery in the source code.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20230509084007.24373-1-lukas.bulwahn@gmail.com/
Link: https://lore.kernel.org/r/20230510065713.10996-3-lukas.bulwahn@gmail.com
2023-05-15 20:03:08 +02:00
Thomas Gleixner
0c7ffa32db x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
Implement the validation function which tells the core code whether
parallel bringup is possible.

The only condition for now is that the kernel does not run in an encrypted
guest as these will trap the RDMSR via #VC, which cannot be handled at that
point in early startup.

There was an earlier variant for AMD-SEV which used the GHBC protocol for
retrieving the APIC ID via CPUID, but there is no guarantee that the
initial APIC ID in CPUID is the same as the real APIC ID. There is no
enforcement from the secure firmware and the hypervisor can assign APIC IDs
as it sees fit as long as the ACPI/MADT table is consistent with that
assignment.

Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling
AMD-SEV guests for parallel startup needs some more thought.

Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside
the scope of this change.

Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of
CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart.

[ mikelley: Reported the announce_cpu() fallout

Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.467571745@linutronix.de
2023-05-15 13:45:05 +02:00
David Woodhouse
7e75178a09 x86/smpboot: Support parallel startup of secondary CPUs
In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.

In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPUs per CPU offset in
order to find the idle task stack and other per CPU data.

To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.

Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.

This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignements. The only requirement is that the
APIC IDs are consistent with the APCI/MADT table.

For the boot CPU or in case parallel bringup is disabled the control bits
are empty and the CPU number is directly available in bit 0-23 of
smpboot_control.

[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in
  x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID,
  	split the bitlock part out ]

Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.411554373@linutronix.de
2023-05-15 13:45:04 +02:00
Thomas Gleixner
f6f1ae9128 x86/smpboot: Implement a bit spinlock to protect the realmode stack
Parallel AP bringup requires that the APs can run fully parallel through
the early startup code including the real mode trampoline.

To prepare for this implement a bit-spinlock to serialize access to the
real mode stack so that parallel upcoming APs are not going to corrupt each
others stack while going through the real mode startup code.

Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.355425551@linutronix.de
2023-05-15 13:45:03 +02:00
Thomas Gleixner
bea629d57d x86/apic: Save the APIC virtual base address
For parallel CPU brinugp it's required to read the APIC ID in the low level
startup code. The virtual APIC base address is a constant because its a
fix-mapped address. Exposing that constant which is composed via macros to
assembly code is non-trivial due to header inclusion hell.

Aside of that it's constant only because of the vsyscall ABI
requirement. Once vsyscall is out of the picture the fixmap can be placed
at runtime.

Avoid header hell, stay flexible and store the address in a variable which
can be exposed to the low level startup code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.299231005@linutronix.de
2023-05-15 13:45:03 +02:00
Thomas Gleixner
f54d4434c2 x86/apic: Provide cpu_primary_thread mask
Make the primary thread tracking CPU mask based in preparation for simpler
handling of parallel bootup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.186599880@linutronix.de
2023-05-15 13:45:02 +02:00
Thomas Gleixner
8b5a0f957c x86/smpboot: Enable split CPU startup
The x86 CPU bringup state currently does AP wake-up, wait for AP to
respond and then release it for full bringup.

It is safe to be split into a wake-up and and a separate wait+release
state.

Provide the required functions and enable the split CPU bringup, which
prepares for parallel bringup, where the bringup of the non-boot CPUs takes
two iterations: One to prepare and wake all APs and the second to wait and
release them. Depending on timing this can eliminate the wait time
completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.133453992@linutronix.de
2023-05-15 13:45:01 +02:00
Thomas Gleixner
2711b8e2b7 x86/smpboot: Switch to hotplug core state synchronization
The new AP state tracking and synchronization mechanism in the CPU hotplug
core code allows to remove quite some x86 specific code:

  1) The AP alive synchronization based on cpumasks

  2) The decision whether an AP can be brought up again

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.529657366@linutronix.de
2023-05-15 13:44:56 +02:00
Thomas Gleixner
e464640cf7 x86/smpboot: Remove wait for cpu_online()
Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.316417181@linutronix.de
2023-05-15 13:44:54 +02:00
Thomas Gleixner
c8b7fb09d1 x86/smpboot: Remove cpu_callin_mask
Now that TSC synchronization is SMP function call based there is no reason
to wait for the AP to be set in smp_callin_mask. The control CPU waits for
the AP to set itself in the online mask anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.206394064@linutronix.de
2023-05-15 13:44:53 +02:00
Thomas Gleixner
9d349d47f0 x86/smpboot: Make TSC synchronization function call based
Spin-waiting on the control CPU until the AP reaches the TSC
synchronization is just a waste especially in the case that there is no
synchronization required.

As the synchronization has to run with interrupts disabled the control CPU
part can just be done from a SMP function call. The upcoming AP issues that
call async only in the case that synchronization is required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.148255496@linutronix.de
2023-05-15 13:44:53 +02:00
Thomas Gleixner
d4f28f07c2 x86/smpboot: Move synchronization masks to SMP boot code
The usage is in smpboot.c and not in the CPU initialization code.

The XEN_PV usage of cpu_callout_mask is obsolete as cpu_init() not longer
waits and cacheinfo has its own CPU mask now, so cpu_callout_mask can be
made static too.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.091511483@linutronix.de
2023-05-15 13:44:52 +02:00
Thomas Gleixner
a32226fa3b x86/cpu/cacheinfo: Remove cpu_callout_mask dependency
cpu_callout_mask is used for the stop machine based MTRR/PAT init.

In preparation of moving the BP/AP synchronization to the core hotplug
code, use a private CPU mask for cacheinfo and manage it in the
starting/dying hotplug state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.035041005@linutronix.de
2023-05-15 13:44:52 +02:00
Thomas Gleixner
e94cd1503b x86/smpboot: Get rid of cpu_init_secondary()
The synchronization of the AP with the control CPU is a SMP boot problem
and has nothing to do with cpu_init().

Open code cpu_init_secondary() in start_secondary() and move
wait_for_master_cpu() into the SMP boot code.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.981999763@linutronix.de
2023-05-15 13:44:51 +02:00
David Woodhouse
2b3be65d2e x86/smpboot: Split up native_cpu_up() into separate phases and document them
There are four logical parts to what native_cpu_up() does on the BSP (or
on the controlling CPU for a later hotplug):

 1) Wake the AP by sending the INIT/SIPI/SIPI sequence.

 2) Wait for the AP to make it as far as wait_for_master_cpu() which
    sets that CPU's bit in cpu_initialized_mask, then sets the bit in
    cpu_callout_mask to let the AP proceed through cpu_init().

 3) Wait for the AP to finish cpu_init() and get as far as the
    smp_callin() call, which sets that CPU's bit in cpu_callin_mask.

 4) Perform the TSC synchronization and wait for the AP to actually
    mark itself online in cpu_online_mask.

In preparation to allow these phases to operate in parallel on multiple
APs, split them out into separate functions and document the interactions
a little more clearly in both the BP and AP code paths.

No functional change intended.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.928917242@linutronix.de
2023-05-15 13:44:51 +02:00
Thomas Gleixner
c7f15dd3f0 x86/smpboot: Remove unnecessary barrier()
Peter stumbled over the barrier() after the invocation of smp_callin() in
start_secondary():

  "...this barrier() and it's comment seem weird vs smp_callin(). That
   function ends with an atomic bitop (it has to, at the very least it must
   not be weaker than store-release) but also has an explicit wmb() to order
   setup vs CPU_STARTING.

   There is no way the smp_processor_id() referred to in this comment can land
   before cpu_init() even without the barrier()."

The barrier() along with the comment was added in 2003 with commit
d8f19f2cac70 ("[PATCH] x86-64 merge") in the history tree. One of those
well documented combo patches of that time which changes world and some
more. The context back then was:

	/*
	 * Dont put anything before smp_callin(), SMP
	 * booting is too fragile that we want to limit the
	 * things done here to the most necessary things.
	 */
	cpu_init();
	smp_callin();

+	/* otherwise gcc will move up smp_processor_id before the cpu_init */
+ 	barrier();

	Dprintk("cpu %d: waiting for commence\n", smp_processor_id());

Even back in 2003 the compiler was not allowed to reorder that
smp_processor_id() invocation before the cpu_init() function call.
Especially not as smp_processor_id() resolved to:

  asm volatile("movl %%gs:%c1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory");

There is no trace of this change in any mailing list archive including the
back then official x86_64 list discuss@x86-64.org, which would explain the
problem this change solved.

The debug prints are gone by now and the the only smp_processor_id()
invocation today is farther down in start_secondary() after locking
vector_lock which itself prevents reordering.

Even if the compiler would be allowed to reorder this, the code would still
be correct as GSBASE is set up early in the assembly code and is valid when
the CPU reaches start_secondary(), while the code at the time when this
barrier was added did the GSBASE setup in cpu_init().

As the barrier has zero value, remove it.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.875713771@linutronix.de
2023-05-15 13:44:50 +02:00
Thomas Gleixner
cded367976 x86/smpboot: Restrict soft_restart_cpu() to SEV
Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.822234014@linutronix.de
2023-05-15 13:44:50 +02:00
Thomas Gleixner
5475abbde7 x86/smpboot: Remove the CPU0 hotplug kludge
This was introduced with commit e1c467e690 ("x86, hotplug: Wake up CPU0
via NMI instead of INIT, SIPI, SIPI") to eventually support physical
hotplug of CPU0:

 "We'll change this code in the future to wake up hard offlined CPU0 if
  real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.768845190@linutronix.de
2023-05-15 13:44:49 +02:00
Thomas Gleixner
e59e74dc48 x86/topology: Remove CPU0 hotplug option
This was introduced together with commit e1c467e690 ("x86, hotplug: Wake
up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support
physical hotplug of CPU0:

 "We'll change this code in the future to wake up hard offlined CPU0 if
  real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de
2023-05-15 13:44:49 +02:00
Thomas Gleixner
666e1156b2 x86/smpboot: Rename start_cpu0() to soft_restart_cpu()
This is used in the SEV play_dead() implementation to re-online CPUs. But
that has nothing to do with CPU0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.662319599@linutronix.de
2023-05-15 13:44:48 +02:00
Thomas Gleixner
134a12827b x86/smpboot: Avoid pointless delay calibration if TSC is synchronized
When TSC is synchronized across sockets then there is no reason to
calibrate the delay for the first CPU which comes up on a socket.

Just reuse the existing calibration value.

This removes 100ms pointlessly wasted time from CPU hotplug per socket.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.608773568@linutronix.de
2023-05-15 13:44:48 +02:00
Thomas Gleixner
ba831b7b1a cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init
No point in keeping them around.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de
2023-05-15 13:44:47 +02:00
Thomas Gleixner
5107e3ebb8 x86/smpboot: Cleanup topology_phys_to_logical_pkg()/die()
Make topology_phys_to_logical_pkg_die() static as it's only used in
smpboot.c and fixup the kernel-doc warnings for both functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.493750666@linutronix.de
2023-05-15 13:44:47 +02:00
Borislav Petkov (AMD)
d42a2a8912 x86/alternatives: Fix section mismatch warnings
Fix stuff like:

  WARNING: modpost: vmlinux.o: section mismatch in reference: \
  __optimize_nops (section: .text) -> debug_alternative (section: .init.data)

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230513160146.16039-1-bp@alien8.de
2023-05-13 18:04:42 +02:00
Borislav Petkov (AMD)
d2408e043e x86/alternative: Optimize returns patching
Instead of decoding each instruction in the return sites range only to
realize that that return site is a jump to the default return thunk
which is needed - X86_FEATURE_RETHUNK is enabled - lift that check
before the loop and get rid of that loop overhead.

Add comments about what gets patched, while at it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230512120952.7924-1-bp@alien8.de
2023-05-12 17:53:18 +02:00
Peter Zijlstra
b6c881b248 x86/alternative: Complicate optimize_nops() some more
Because:

  SMP alternatives: ffffffff810026dc: [2:44) optimized NOPs: eb 2a eb 28 cc cc
    cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    cc cc cc cc cc cc cc cc cc cc cc cc cc

is quite daft, make things more complicated and have the NOP runlength
detection eat the preceding JMP if they both end at the same target.

  SMP alternatives: ffffffff810026dc: [0:44) optimized NOPs: eb 2a cc cc cc cc
    cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    cc cc cc cc cc cc cc cc cc cc cc cc cc

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.433132442@infradead.org
2023-05-11 17:34:20 +02:00
Peter Zijlstra
6c480f2221 x86/alternative: Rewrite optimize_nops() some
Address two issues:

 - it no longer hard requires single byte NOP runs - now it accepts any
   NOP and NOPL encoded instruction (but not the more complicated 32bit
   NOPs).

 - it writes a single 'instruction' replacement.

Specifically, ORC unwinder relies on the tail NOP of an alternative to
be a single instruction. In particular, it relies on the inner bytes not
being executed.

Once the max supported NOP length has been reached (currently 8, could easily
be extended to 11 on x86_64), switch to JMP.d8 and INT3 padding to
achieve the same result.

Objtool uses this guarantee in the analysis of alternative/overlapping
CFI state for the ORC unwinder data. Every instruction edge gets a CFI
state and the more instructions the larger the chance of conflicts.

  [ bp:
  - Add a comment over add_nop() to explain why it does it this way
  - Make add_nops() PARAVIRT only as it is used solely there now ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.373412974@infradead.org
2023-05-11 17:33:36 +02:00
Peter Zijlstra
270a69c448 x86/alternative: Support relocations in alternatives
A little while ago someone (Kirill) ran into the whole 'alternatives don't
do relocations nonsense' again and I got annoyed enough to actually look
at the code.

Since the whole alternative machinery already fully decodes the
instructions it is simple enough to adjust immediates and displacement
when needed. Specifically, the immediates for IP modifying instructions
(JMP, CALL, Jcc) and the displacement for RIP-relative instructions.

  [ bp: Massage comment some more and get rid of third loop in
    apply_relocation(). ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.313857925@infradead.org
2023-05-10 14:47:08 +02:00
Peter Zijlstra
6becb5026b x86/alternative: Make debug-alternative selective
Using debug-alternative generates a *LOT* of output, extend it a bit
to select which of the many rewrites it reports on.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.253636689@infradead.org
2023-05-10 13:48:46 +02:00
Juergen Gross
f6b980646b x86/mtrr: Remove physical address size calculation
The physical address width calculation in mtrr_bp_init() can easily be
replaced with using the already available value x86_phys_bits from
struct cpuinfo_x86.

The same information source can be used in mtrr/cleanup.c, removing the
need to pass that value on to mtrr_cleanup().

In print_mtrr_state() use x86_phys_bits instead of recalculating it
from size_or_mask.

Move setting of size_or_mask and size_and_mask into a dedicated new
function in mtrr/generic.c, enabling to make those 2 variables static,
as they are used in generic.c only now.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-2-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-05-09 14:13:30 +02:00
Nathan Fontenot
e281d5cad1 x86/microcode/amd: Remove unneeded pointer arithmetic
Remove unneeded pointer increment arithmetic, the pointer is
set at the beginning of the loop.

Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230502174232.73880-1-nathan.fontenot@amd.com
2023-05-08 14:38:38 +02:00
Borislav Petkov (AMD)
37a19366e1 x86/microcode/AMD: Get rid of __find_equiv_id()
Merge it into its only call site.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230227160352.7260-1-bp@alien8.de
2023-05-08 14:24:11 +02:00
Borislav Petkov (AMD)
f710ac5442 x86/sev: Get rid of special sev_es_enable_key
A SEV-ES guest is active on AMD when CC_ATTR_GUEST_STATE_ENCRYPT is set.
I.e., MSR_AMD64_SEV, bit 1, SEV_ES_Enabled. So no need for a special
static key.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230328201712.25852-3-bp@alien8.de
2023-05-08 11:49:29 +02:00
Mario Limonciello
23a5b8bb02 x86/amd_nb: Add PCI ID for family 19h model 78h
Commit

  310e782a99 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")

switched to using amd_smn_read() which relies upon the misc PCI ID used
by DF function 3 being included in a table.  The ID for model 78h is
missing in that table, so amd_smn_read() doesn't work.

Add the missing ID into amd_nb, restoring s2idle on this system.

  [ bp: Simplify commit message. ]

Fixes: 310e782a99 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # pci_ids.h
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230427053338.16653-2-mario.limonciello@amd.com
2023-05-08 11:25:19 +02:00
Chen Yu
044f0e27de x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid processors
Intel Meteor Lake hybrid processors have cores in two separate dies. The
cores in one of the dies have higher maximum frequency. Use the SD_ASYM_
PACKING flag to give higher priority to the die with CPUs of higher maximum
frequency.

Suggested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230406203148.19182-13-ricardo.neri-calderon@linux.intel.com
2023-05-08 10:58:38 +02:00
Ricardo Neri
046a5a95c3 x86/sched/itmt: Give all SMT siblings of a core the same priority
X86 does not have the SD_ASYM_PACKING flag in the SMT domain. The scheduler
knows how to handle SMT and non-SMT cores of different priority. There is
no reason for SMT siblings of a core to have different priorities.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Len Brown <len.brown@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406203148.19182-12-ricardo.neri-calderon@linux.intel.com
2023-05-08 10:58:38 +02:00
Ricardo Neri
995998ebde x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
There is no difference between any of the SMT siblings of a physical core.
Do not do asym_packing load balancing at this level.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230406203148.19182-11-ricardo.neri-calderon@linux.intel.com
2023-05-08 10:58:37 +02:00
Linus Torvalds
58390c8ce1 Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:

 - Convert to platform remove callback returning void

 - Extend changing default domain to normal group

 - Intel VT-d updates:
     - Remove VT-d virtual command interface and IOASID
     - Allow the VT-d driver to support non-PRI IOPF
     - Remove PASID supervisor request support
     - Various small and misc cleanups

 - ARM SMMU updates:
     - Device-tree binding updates:
         * Allow Qualcomm GPU SMMUs to accept relevant clock properties
         * Document Qualcomm 8550 SoC as implementing an MMU-500
         * Favour new "qcom,smmu-500" binding for Adreno SMMUs

     - Fix S2CR quirk detection on non-architectural Qualcomm SMMU
       implementations

     - Acknowledge SMMUv3 PRI queue overflow when consuming events

     - Document (in a comment) why ATS is disabled for bypass streams

 - AMD IOMMU updates:
     - 5-level page-table support
     - NUMA awareness for memory allocations

 - Unisoc driver: Support for reattaching an existing domain

 - Rockchip driver: Add missing set_platform_dma_ops callback

 - Mediatek driver: Adjust the dma-ranges

 - Various other small fixes and cleanups

* tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits)
  iommu: Remove iommu_group_get_by_id()
  iommu: Make iommu_release_device() static
  iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope()
  iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn)
  iommu/vt-d: Remove BUG_ON in map/unmap()
  iommu/vt-d: Remove BUG_ON when domain->pgd is NULL
  iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation
  iommu/vt-d: Remove BUG_ON on checking valid pfn range
  iommu/vt-d: Make size of operands same in bitwise operations
  iommu/vt-d: Remove PASID supervisor request support
  iommu/vt-d: Use non-privileged mode for all PASIDs
  iommu/vt-d: Remove extern from function prototypes
  iommu/vt-d: Do not use GFP_ATOMIC when not needed
  iommu/vt-d: Remove unnecessary checks in iopf disabling path
  iommu/vt-d: Move PRI handling to IOPF feature path
  iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path
  iommu/vt-d: Move iopf code from SVA to IOPF enabling path
  iommu/vt-d: Allow SVA with device-specific IOPF
  dmaengine: idxd: Add enable/disable device IOPF feature
  arm64: dts: mt8186: Add dma-ranges for the parent "soc" node
  ...
2023-04-30 13:00:38 -07:00
Linus Torvalds
2aff7c706c Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds
22b8cc3e78 Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
 "Add support for the new Linear Address Masking CPU feature.

  This is similar to ARM's Top Byte Ignore and allows userspace to store
  metadata in some bits of pointers without masking it out before use"

* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
  x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
  selftests/x86/lam: Add test cases for LAM vs thread creation
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  mm: Expose untagging mask in /proc/$PID/status
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead for systems without LAM
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  mm: Introduce untagged_addr_remote()
  x86/mm: Handle LAM on context switch
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86: Allow atomic MM_CONTEXT flags setting
  x86/mm: Rework address range check in get_user() and put_user()
2023-04-28 09:43:49 -07:00
Linus Torvalds
4980c176a7 Merge tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resctrl update from Dave Hansen:
 "Reduce redundant counter reads with resctrl refactoring"

* tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Avoid redundant counter read in __mon_event_count()
2023-04-28 09:30:51 -07:00
Linus Torvalds
682f7bbad2 Merge tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:

 - Unify duplicated __pa() and __va() definitions

 - Simplify sysctl tables registration

 - Remove unused symbols

 - Correct function name in comment

* tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Centralize __pa()/__va() definitions
  x86: Simplify one-level sysctl registration for itmt_kern_table
  x86: Simplify one-level sysctl registration for abi_table2
  x86/platform/intel-mid: Remove unused definitions from intel-mid.h
  x86/uaccess: Remove memcpy_page_flushcache()
  x86/entry: Change stale function name in comment to error_return()
2023-04-28 09:22:30 -07:00
Linus Torvalds
33afd4b763 Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Mainly singleton patches all over the place.

  Series of note are:

   - updates to scripts/gdb from Glenn Washburn

   - kexec cleanups from Bjorn Helgaas"

* tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits)
  mailmap: add entries for Paul Mackerras
  libgcc: add forward declarations for generic library routines
  mailmap: add entry for Oleksandr
  ocfs2: reduce ioctl stack usage
  fs/proc: add Kthread flag to /proc/$pid/status
  ia64: fix an addr to taddr in huge_pte_offset()
  checkpatch: introduce proper bindings license check
  epoll: rename global epmutex
  scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()
  scripts/gdb: create linux/vfs.py for VFS related GDB helpers
  uapi/linux/const.h: prefer ISO-friendly __typeof__
  delayacct: track delays from IRQ/SOFTIRQ
  scripts/gdb: timerlist: convert int chunks to str
  scripts/gdb: print interrupts
  scripts/gdb: raise error with reduced debugging information
  scripts/gdb: add a Radix Tree Parser
  lib/rbtree: use '+' instead of '|' for setting color.
  proc/stat: remove arch_idle_time()
  checkpatch: check for misuse of the link tags
  checkpatch: allow Closes tags with links
  ...
2023-04-27 19:57:00 -07:00
Linus Torvalds
da46b58ff8 Merge tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - PCI passthrough for Hyper-V confidential VMs (Michael Kelley)

 - Hyper-V VTL mode support (Saurabh Sengar)

 - Move panic report initialization code earlier (Long Li)

 - Various improvements and bug fixes (Dexuan Cui and Michael Kelley)

* tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
  PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
  Drivers: hv: move panic report code from vmbus to hv early init code
  x86/hyperv: VTL support for Hyper-V
  Drivers: hv: Kconfig: Add HYPERV_VTL_MODE
  x86/hyperv: Make hv_get_nmi_reason public
  x86/hyperv: Add VTL specific structs and hypercalls
  x86/init: Make get/set_rtc_noop() public
  x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushes
  x86/hyperv: Add callback filter to cpumask_to_vpset()
  Drivers: hv: vmbus: Remove the per-CPU post_msg_page
  clocksource: hyper-v: make sure Invariant-TSC is used if it is available
  PCI: hv: Enable PCI pass-thru devices in Confidential VMs
  Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
  hv_netvsc: Remove second mapping of send and recv buffers
  Drivers: hv: vmbus: Remove second way of mapping ring buffers
  Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
  swiotlb: Remove bounce buffer remapping for Hyper-V
  Driver: VMBus: Add Devicetree support
  dt-bindings: bus: Add Hyper-V VMBus
  Drivers: hv: vmbus: Convert acpi_device to more generic platform_device
  ...
2023-04-27 17:17:12 -07:00
Linus Torvalds
b6a7828502 Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
 "The summary of the changes for this pull requests is:

   - Song Liu's new struct module_memory replacement

   - Nick Alcock's MODULE_LICENSE() removal for non-modules

   - My cleanups and enhancements to reduce the areas where we vmalloc
     module memory for duplicates, and the respective debug code which
     proves the remaining vmalloc pressure comes from userspace.

  Most of the changes have been in linux-next for quite some time except
  the minor fixes I made to check if a module was already loaded prior
  to allocating the final module memory with vmalloc and the respective
  debug code it introduces to help clarify the issue. Although the
  functional change is small it is rather safe as it can only *help*
  reduce vmalloc space for duplicates and is confirmed to fix a bootup
  issue with over 400 CPUs with KASAN enabled. I don't expect stable
  kernels to pick up that fix as the cleanups would have also had to
  have been picked up. Folks on larger CPU systems with modules will
  want to just upgrade if vmalloc space has been an issue on bootup.

  Given the size of this request, here's some more elaborate details:

  The functional change change in this pull request is the very first
  patch from Song Liu which replaces the 'struct module_layout' with a
  new 'struct module_memory'. The old data structure tried to put
  together all types of supported module memory types in one data
  structure, the new one abstracts the differences in memory types in a
  module to allow each one to provide their own set of details. This
  paves the way in the future so we can deal with them in a cleaner way.
  If you look at changes they also provide a nice cleanup of how we
  handle these different memory areas in a module. This change has been
  in linux-next since before the merge window opened for v6.3 so to
  provide more than a full kernel cycle of testing. It's a good thing as
  quite a bit of fixes have been found for it.

  Jason Baron then made dynamic debug a first class citizen module user
  by using module notifier callbacks to allocate / remove module
  specific dynamic debug information.

  Nick Alcock has done quite a bit of work cross-tree to remove module
  license tags from things which cannot possibly be module at my request
  so to:

   a) help him with his longer term tooling goals which require a
      deterministic evaluation if a piece a symbol code could ever be
      part of a module or not. But quite recently it is has been made
      clear that tooling is not the only one that would benefit.
      Disambiguating symbols also helps efforts such as live patching,
      kprobes and BPF, but for other reasons and R&D on this area is
      active with no clear solution in sight.

   b) help us inch closer to the now generally accepted long term goal
      of automating all the MODULE_LICENSE() tags from SPDX license tags

  In so far as a) is concerned, although module license tags are a no-op
  for non-modules, tools which would want create a mapping of possible
  modules can only rely on the module license tag after the commit
  8b41fc4454 ("kbuild: create modules.builtin without
  Makefile.modbuiltin or tristate.conf").

  Nick has been working on this *for years* and AFAICT I was the only
  one to suggest two alternatives to this approach for tooling. The
  complexity in one of my suggested approaches lies in that we'd need a
  possible-obj-m and a could-be-module which would check if the object
  being built is part of any kconfig build which could ever lead to it
  being part of a module, and if so define a new define
  -DPOSSIBLE_MODULE [0].

  A more obvious yet theoretical approach I've suggested would be to
  have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
  well but that means getting kconfig symbol names mapping to modules
  always, and I don't think that's the case today. I am not aware of
  Nick or anyone exploring either of these options. Quite recently Josh
  Poimboeuf has pointed out that live patching, kprobes and BPF would
  benefit from resolving some part of the disambiguation as well but for
  other reasons. The function granularity KASLR (fgkaslr) patches were
  mentioned but Joe Lawrence has clarified this effort has been dropped
  with no clear solution in sight [1].

  In the meantime removing module license tags from code which could
  never be modules is welcomed for both objectives mentioned above. Some
  developers have also welcomed these changes as it has helped clarify
  when a module was never possible and they forgot to clean this up, and
  so you'll see quite a bit of Nick's patches in other pull requests for
  this merge window. I just picked up the stragglers after rc3. LWN has
  good coverage on the motivation behind this work [2] and the typical
  cross-tree issues he ran into along the way. The only concrete blocker
  issue he ran into was that we should not remove the MODULE_LICENSE()
  tags from files which have no SPDX tags yet, even if they can never be
  modules. Nick ended up giving up on his efforts due to having to do
  this vetting and backlash he ran into from folks who really did *not
  understand* the core of the issue nor were providing any alternative /
  guidance. I've gone through his changes and dropped the patches which
  dropped the module license tags where an SPDX license tag was missing,
  it only consisted of 11 drivers. To see if a pull request deals with a
  file which lacks SPDX tags you can just use:

    ./scripts/spdxcheck.py -f \
	$(git diff --name-only commid-id | xargs echo)

  You'll see a core module file in this pull request for the above, but
  that's not related to his changes. WE just need to add the SPDX
  license tag for the kernel/module/kmod.c file in the future but it
  demonstrates the effectiveness of the script.

  Most of Nick's changes were spread out through different trees, and I
  just picked up the slack after rc3 for the last kernel was out. Those
  changes have been in linux-next for over two weeks.

  The cleanups, debug code I added and final fix I added for modules
  were motivated by David Hildenbrand's report of boot failing on a
  systems with over 400 CPUs when KASAN was enabled due to running out
  of virtual memory space. Although the functional change only consists
  of 3 lines in the patch "module: avoid allocation if module is already
  present and ready", proving that this was the best we can do on the
  modules side took quite a bit of effort and new debug code.

  The initial cleanups I did on the modules side of things has been in
  linux-next since around rc3 of the last kernel, the actual final fix
  for and debug code however have only been in linux-next for about a
  week or so but I think it is worth getting that code in for this merge
  window as it does help fix / prove / evaluate the issues reported with
  larger number of CPUs. Userspace is not yet fixed as it is taking a
  bit of time for folks to understand the crux of the issue and find a
  proper resolution. Worst come to worst, I have a kludge-of-concept [3]
  of how to make kernel_read*() calls for modules unique / converge
  them, but I'm currently inclined to just see if userspace can fix this
  instead"

Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]

* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
  module: add debugging auto-load duplicate module support
  module: stats: fix invalid_mod_bytes typo
  module: remove use of uninitialized variable len
  module: fix building stats for 32-bit targets
  module: stats: include uapi/linux/module.h
  module: avoid allocation if module is already present and ready
  module: add debug stats to help identify memory pressure
  module: extract patient module check into helper
  modules/kmod: replace implementation with a semaphore
  Change DEFINE_SEMAPHORE() to take a number argument
  module: fix kmemleak annotations for non init ELF sections
  module: Ignore L0 and rename is_arm_mapping_symbol()
  module: Move is_arm_mapping_symbol() to module_symbol.h
  module: Sync code of is_arm_mapping_symbol()
  scripts/gdb: use mem instead of core_layout to get the module address
  interconnect: remove module-related code
  interconnect: remove MODULE_LICENSE in non-modules
  zswap: remove MODULE_LICENSE in non-modules
  zpool: remove MODULE_LICENSE in non-modules
  x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
  ...
2023-04-27 16:36:55 -07:00
Linus Torvalds
556eb8b791 Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
df45da57cb Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "ACPI:

   - Improve error reporting when failing to manage SDEI on AGDI device
     removal

  Assembly routines:

   - Improve register constraints so that the compiler can make use of
     the zero register instead of moving an immediate #0 into a GPR

   - Allow the compiler to allocate the registers used for CAS
     instructions

  CPU features and system registers:

   - Cleanups to the way in which CPU features are identified from the
     ID register fields

   - Extend system register definition generation to handle Enum types
     when defining shared register fields

   - Generate definitions for new _EL2 registers and add new fields for
     ID_AA64PFR1_EL1

   - Allow SVE to be disabled separately from SME on the kernel
     command-line

  Tracing:

   - Support for "direct calls" in ftrace, which enables BPF tracing for
     arm64

  Kdump:

   - Don't bother unmapping the crashkernel from the linear mapping,
     which then allows us to use huge (block) mappings and reduce TLB
     pressure when a crashkernel is loaded.

  Memory management:

   - Try again to remove data cache invalidation from the coherent DMA
     allocation path

   - Simplify the fixmap code by mapping at page granularity

   - Allow the kfence pool to be allocated early, preventing the rest of
     the linear mapping from being forced to page granularity

  Perf and PMU:

   - Move CPU PMU code out to drivers/perf/ where it can be reused by
     the 32-bit ARM architecture when running on ARMv8 CPUs

   - Fix race between CPU PMU probing and pKVM host de-privilege

   - Add support for Apple M2 CPU PMU

   - Adjust the generic PERF_COUNT_HW_BRANCH_INSTRUCTIONS event
     dynamically, depending on what the CPU actually supports

   - Minor fixes and cleanups to system PMU drivers

  Stack tracing:

   - Use the XPACLRI instruction to strip PAC from pointers, rather than
     rolling our own function in C

   - Remove redundant PAC removal for toolchains that handle this in
     their builtins

   - Make backtracing more resilient in the face of instrumentation

  Miscellaneous:

   - Fix single-step with KGDB

   - Remove harmless warning when 'nokaslr' is passed on the kernel
     command-line

   - Minor fixes and cleanups across the board"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
  KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege
  arm64: kexec: include reboot.h
  arm64: delete dead code in this_cpu_set_vectors()
  arm64/cpufeature: Use helper macro to specify ID register for capabilites
  drivers/perf: hisi: add NULL check for name
  drivers/perf: hisi: Remove redundant initialized of pmu->name
  arm64/cpufeature: Consistently use symbolic constants for min_field_value
  arm64/cpufeature: Pull out helper for CPUID register definitions
  arm64/sysreg: Convert HFGITR_EL2 to automatic generation
  ACPI: AGDI: Improve error reporting for problems during .remove()
  arm64: kernel: Fix kernel warning when nokaslr is passed to commandline
  perf/arm-cmn: Fix port detection for CMN-700
  arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step
  arm64: move PAC masks to <asm/pointer_auth.h>
  arm64: use XPACLRI to strip PAC
  arm64: avoid redundant PAC stripping in __builtin_return_address()
  arm64/sme: Fix some comments of ARM SME
  arm64/signal: Alloc tpidr2 sigframe after checking system_supports_tpidr2()
  arm64/signal: Use system_supports_tpidr2() to check TPIDR2
  arm64/idreg: Don't disable SME when disabling SVE
  ...
2023-04-25 12:39:01 -07:00