Commit Graph

14390 Commits

Author SHA1 Message Date
Jann Horn
3fef5c2f4d x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit()
commit f12d11c5c1 upstream.

Reset the KASAN shadow state of the task stack before rewinding RSP.
Without this, a kernel oops will leave parts of the stack poisoned, and
code running under do_exit() can trip over such poisoned regions and cause
nonsensical false-positive KASAN reports about stack-out-of-bounds bugs.

This does not wipe the exception stacks; if an oops happens on an exception
stack, it might result in random KASAN false-positives from other tasks
afterwards. This is probably relatively uninteresting, since if the kernel
oopses on an exception stack, there are most likely bigger things to worry
about. It'd be more interesting if vmapped stacks and KASAN were
compatible, since then handle_stack_overflow() would oops from exception
stack context.

Fixes: 2deb4be280 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: kasan-dev@googlegroups.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:51 +02:00
Andi Kleen
2ff13cec04 x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
commit cc51e5428e upstream.

On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.

Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.

But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.

Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.

Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:51 +02:00
Andi Kleen
084c0d5b4c x86/spectre: Add missing family 6 check to microcode check
commit 1ab534e85c upstream.

The check for Spectre microcodes does not check for family 6, only the
model numbers.

Add a family 6 check to avoid ambiguity with other families.

Fixes: a5b2966364 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:51 +02:00
Rian Hunter
1e227ae17f x86/process: Re-export start_thread()
commit dc76803e57 upstream.

The consolidation of the start_thread() functions removed the export
unintentionally. This breaks binfmt handlers built as a module.

Add it back.

Fixes: e634d8fc79 ("x86-64: merge the standard and compat start_thread() functions")
Signed-off-by: Rian Hunter <rian@alum.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180819230854.7275-1-rian@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:48 +02:00
Vlastimil Babka
0ea75fa0f6 x86/speculation/l1tf: Suggest what to do on systems with too much RAM
commit 6a012288d6 upstream.

Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.

Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:48 +02:00
Paulo Zanoni
6bdd19331f x86/gpu: reserve ICL's graphics stolen memory
commit db0c8d8b03 upstream.

ICL changes the registers and addresses to 64 bits.

I also briefly looked at implementing an u64 version of the PCI config
read functions, but I concluded this wouldn't be trivial, so it's not
worth doing it for a single user that can't have any racing problems
while reading the register in two separate operations.

v2:
 - Scrub the development (non-public) changelog (Joonas).
 - Remove the i915.ko bits so this can be easily backported in order
   to properly avoid stolen memory even on machines without i915.ko
   (Joonas).
 - CC stable for the reasons above.

Issue: VIZ-9250
CC: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Fixes: 412310019a ("drm/i915/icl: Add initial Icelake definitions.")
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180504203252.28048-1-paulo.r.zanoni@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:29:47 +02:00
M. Vefa Bicakci
0ae4fd9024 xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
commit 405c018a25 upstream.

Commit d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.

One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:

  enlighten_pv.c::xen_start_kernel
    mmu_pv.c::xen_reserve_special_pages
      page.h::__pa
        physaddr.c::__phys_addr
          physaddr.h::phys_addr_valid

phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.

Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # for v4.17 and up
Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17 21:03:48 +02:00
Dou Liyang
35cf328313 x86/platform/UV: Mark memblock related init code and data correctly
commit 24cfd8ca1d upstream.

parse_mem_block_size() and mem_block_size are only used during init. Mark
them accordingly.

Fixes: d7609f4210 ("x86/platform/UV: Add kernel parameter to set memory block size")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Link: https://lkml.kernel.org/r/20180730075947.23023-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17 21:03:47 +02:00
Guenter Roeck
a8ac113d44 x86/l1tf: Fix build error seen if CONFIG_KVM_INTEL is disabled
commit 1eb46908b3 upstream.

allmodconfig+CONFIG_INTEL_KVM=n results in the following build error.

  ERROR: "l1tf_vmx_mitigation" [arch/x86/kvm/kvm.ko] undefined!

Fixes: 5b76a3cff0 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17 21:03:47 +02:00
Vlastimil Babka
3b39dc7463 x86/smp: fix non-SMP broken build due to redefinition of apic_id_is_primary_thread
commit d0055f351e upstream.

The function has an inline "return false;" definition with CONFIG_SMP=n
but the "real" definition is also visible leading to "redefinition of
‘apic_id_is_primary_thread’" compiler error.

Guard it with #ifdef CONFIG_SMP

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:34 +02:00
Josh Poimboeuf
abf914eefa x86/microcode: Allow late microcode loading with SMT disabled
commit 07d981ad4c upstream.

The kernel unnecessarily prevents late microcode loading when SMT is
disabled.  It should be safe to allow it if all the primary threads are
online.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:34 +02:00
Thomas Gleixner
4d3579f7f9 cpu/hotplug: Fix SMT supported evaluation
commit bc2d8d262c upstream.

Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.

Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.

SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:33 +02:00
Paolo Bonzini
ee82c13f6c x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
commit 8e0b2b9166 upstream.

Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:33 +02:00
Paolo Bonzini
3ba71a6ffc x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
commit ea156d192f upstream.

Three changes to the content of the sysfs file:

 - If EPT is disabled, L1TF cannot be exploited even across threads on the
   same core, and SMT is irrelevant.

 - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
   instead of "vulnerable, SMT vulnerable"

 - Reorder the two parts so that the main vulnerability state comes first
   and the detail on SMT is second.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:33 +02:00
Nicolai Stange
3131918d53 x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
commit ffcba43ff6 upstream.

The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.

Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:32 +02:00
Nicolai Stange
5ab84cf0c2 x86: Don't include linux/irq.h from asm/hardirq.h
commit 447ae31667 upstream.

The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().

Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like

  asm/smp.h
    asm/apic.h
      asm/hardirq.h
        linux/irq.h
          linux/topology.h
            linux/smp.h
              asm/smp.h

or

  linux/gfp.h
    linux/mmzone.h
      asm/mmzone.h
        asm/mmzone_64.h
          asm/smp.h
            asm/apic.h
              asm/hardirq.h
                linux/irq.h
                  linux/irqdesc.h
                    linux/kobject.h
                      linux/sysfs.h
                        linux/kernfs.h
                          linux/idr.h
                            linux/gfp.h

and others.

This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.

A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.

However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.

Remove the linux/irq.h #include from x86' asm/hardirq.h.

Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.

Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.

Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:32 +02:00
Jiri Kosina
82abbe0ea7 x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
commit d90a7a0ec8 upstream.

Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.

The possible values are:

  full
	Provides all available mitigations for the L1TF vulnerability. Disables
	SMT and enables all mitigations in the hypervisors. SMT control via
	/sys/devices/system/cpu/smt/control is still possible after boot.
	Hypervisors will issue a warning when the first VM is started in
	a potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  full,force
	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
	command line option. sysfs control of SMT and the hypervisor flush
	control is disabled.

  flush
	Leaves SMT enabled and enables the conditional hypervisor mitigation.
	Hypervisors will issue a warning when the first VM is started in a
	potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  flush,nosmt
	Disables SMT and enables the conditional hypervisor mitigation. SMT
	control via /sys/devices/system/cpu/smt/control is still possible
	after boot. If SMT is reenabled or flushing disabled at runtime
	hypervisors will issue a warning.

  flush,nowarn
	Same as 'flush', but hypervisors will not warn when
	a VM is started in a potentially insecure configuration.

  off
	Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'.

Let KVM adhere to these semantics, which means:

  - 'lt1f=full,force'	: Performe L1D flushes. No runtime control
    			  possible.

  - 'l1tf=full'
  - 'l1tf-flush'
  - 'l1tf=flush,nosmt'	: Perform L1D flushes and warn on VM start if
			  SMT has been runtime enabled or L1D flushing
			  has been run-time enabled

  - 'l1tf=flush,nowarn'	: Perform L1D flushes and no warnings are emitted.

  - 'l1tf=off'		: L1D flushes are not performed and no warnings
			  are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.

This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.

Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:30 +02:00
Thomas Gleixner
347ff08ab7 cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
commit fee0aede6f upstream.

The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.

That was fine so far as this was only required to make the output of the
control file correct and to prevent writes in that case.

With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:30 +02:00
Thomas Gleixner
fe30a30f59 x86/kvm: Allow runtime control of L1D flush
commit 895ae47f99 upstream.

All mitigation modes can be switched at run time with a static key now:

 - Use sysfs_streq() instead of strcmp() to handle the trailing new line
   from sysfs writes correctly.
 - Make the static key management handle multiple invocations properly.
 - Set the module parameter file to RW

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:30 +02:00
Thomas Gleixner
3c68ad96f2 x86/l1tf: Handle EPT disabled state proper
commit a7b9020b06 upstream.

If Extended Page Tables (EPT) are disabled or not supported, no L1D
flushing is required. The setup function can just avoid setting up the L1D
flush for the EPT=n case.

Invoke it after the hardware setup has be done and enable_ept has the
correct state and expose the EPT disabled state in the mitigation status as
well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:29 +02:00
Thomas Gleixner
8fc95fe3fe x86/litf: Introduce vmx status variable
commit 72c6d2db64 upstream.

Store the effective mitigation of VMX in a status variable and use it to
report the VMX state in the l1tf sysfs file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:29 +02:00
Thomas Gleixner
fbe359706d Revert "x86/apic: Ignore secondary threads if nosmt=force"
commit 506a66f374 upstream.

Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.

The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:

    Because the logical processors within a physical package are tightly
    coupled with respect to shared hardware resources, both logical
    processors are notified of machine check errors that occur within a
    given physical processor. If machine-check exceptions are enabled when
    a fatal error is reported, all the logical processors within a physical
    package are dispatched to the machine-check exception handler. If
    machine-check exceptions are disabled, the logical processors enter the
    shutdown state and assert the IERR# signal. When enabling machine-check
    exceptions, the MCE flag in control register CR4 should be set for each
    logical processor.

Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.

This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:

MCE is enabled on the boot CPU:

[    0.244017] mce: CPU supports 22 MCE banks

The corresponding sibling #72 boots:

[    1.008005] .... node  #0, CPUs:    #72

That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.

It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.

Adjust the nosmt kernel parameter documentation as well.

Reverts: 2207def700 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:27 +02:00
Borislav Petkov
ccd135cdd9 x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
commit 7ce2f0393e upstream.

The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.

Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:27 +02:00
Thomas Gleixner
5b44df38ed x86/apic: Ignore secondary threads if nosmt=force
commit 2207def700 upstream.

nosmt on the kernel command line merely prevents the onlining of the
secondary SMT siblings.

nosmt=force makes the APIC detection code ignore the secondary SMT siblings
completely, so they even do not show up as possible CPUs. That reduces the
amount of memory allocations for per cpu variables and saves other
resources from being allocated too large.

This is not fully equivalent to disabling SMT in the BIOS because the low
level SMT enabling in the BIOS can result in partitioning of resources
between the siblings, which is not undone by just ignoring them. Some CPUs
can use the full resources when their sibling is not onlined, but this is
depending on the CPU family and model and it's not well documented whether
this applies to all partitioned resources. That means depending on the
workload disabling SMT in the BIOS might result in better performance.

Linus analysis of the Intel manual:

  The intel optimization manual is not very clear on what the partitioning
  rules are.

  I find:

    "In general, the buffers for staging instructions between major pipe
     stages  are partitioned. These buffers include µop queues after the
     execution trace cache, the queues after the register rename stage, the
     reorder buffer which stages instructions for retirement, and the load
     and store buffers.

     In the case of load and store buffers, partitioning also provided an
     easier implementation to maintain memory ordering for each logical
     processor and detect memory ordering violations"

  but some of that partitioning may be relaxed if the HT thread is "not
  active":

    "In Intel microarchitecture code name Sandy Bridge, the micro-op queue
     is statically partitioned to provide 28 entries for each logical
     processor,  irrespective of software executing in single thread or
     multiple threads. If one logical processor is not active in Intel
     microarchitecture code name Ivy Bridge, then a single thread executing
     on that processor  core can use the 56 entries in the micro-op queue"

  but I do not know what "not active" means, and how dynamic it is. Some of
  that partitioning may be entirely static and depend on the early BIOS
  disabling of HT, and even if we park the cores, the resources will just be
  wasted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:27 +02:00
Thomas Gleixner
20d82bac62 x86/cpu/AMD: Evaluate smp_num_siblings early
commit 1e1d7e25fd upstream.

To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:27 +02:00
Borislav Petkov
f2444b280d x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
commit 119bff8a9c upstream.

Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU.  The three
functions called there now do not depend on that leaf anymore so the check
can go.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
dcf82399d6 x86/cpu/intel: Evaluate smp_num_siblings early
commit 1910ad5624 upstream.

Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
c3155990e2 x86/cpu/topology: Provide detect_extended_topology_early()
commit 95f3d39ccf upstream.

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
b4be123b58 x86/cpu/common: Provide detect_ht_early()
commit 545401f444 upstream.

To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
6886c1b815 x86/cpu/AMD: Remove the pointless detect_ht() call
commit 44ca36de56 upstream.

Real 32bit AMD CPUs do not have SMT and the only value of the call was to
reach the magic printout which got removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
49e0975a05 x86/cpu: Remove the pointless CPU printout
commit 55e6d279ab upstream.

The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.

Remove it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:26 +02:00
Thomas Gleixner
2883a1f89b x86/topology: Provide topology_smt_supported()
commit f048c399e0 upstream.

Provide information whether SMT is supoorted by the CPUs. Preparatory patch
for SMT control mechanism.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:25 +02:00
Thomas Gleixner
aa5de56185 x86/smp: Provide topology_is_primary_thread()
commit 6a4d2657e0 upstream.

If the CPU is supporting SMT then the primary thread can be found by
checking the lower APIC ID bits for zero. smp_num_siblings is used to build
the mask for the APIC ID bits which need to be taken into account.

This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different
than the initial APIC ID in CPUID. But according to AMD the lower bits have
to be consistent. Intel gave a tentative confirmation as well.

Preparatory patch to support disabling SMT at boot/runtime.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:25 +02:00
Konrad Rzeszutek Wilk
89d8157a39 x86/bugs: Move the l1tf function and define pr_fmt properly
commit 56563f53d3 upstream.

The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".

Move the function to be past SSBD and also define the pr_fmt.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:25 +02:00
Andi Kleen
dcd1b1099b x86/speculation/l1tf: Add sysfs reporting for l1tf
commit 17dbca1193 upstream.

L1TF core kernel workarounds are cheap and normally always enabled, However
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.

- Extend the existing checks for Meltdowns to determine if the system is
  vulnerable. All CPUs which are not vulnerable to Meltdown are also not
  vulnerable to L1TF

- Check for 32bit non PAE and emit a warning as there is no practical way
  for mitigation due to the limited physical address bits

- If the system has more than MAX_PA/2 physical memory the invert page
  workarounds don't protect the system against the L1TF attack anymore,
  because an inverted physical address will also point to valid
  memory. Print a warning in this case and report that the system is
  vulnerable.

Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.

[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:25 +02:00
Andi Kleen
adb3336455 x86/speculation/l1tf: Make sure the first page is always reserved
commit 10a70416e1 upstream.

The L1TF workaround doesn't make any attempt to mitigate speculate accesses
to the first physical page for zeroed PTEs. Normally it only contains some
data from the early real mode BIOS.

It's not entirely clear that the first page is reserved in all
configurations, so add an extra reservation call to make sure it is really
reserved. In most configurations (e.g.  with the standard reservations)
it's likely a nop.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:24 +02:00
Masami Hiramatsu
16fe09de54 kprobes/x86: Fix %p uses in error messages
commit 0ea063306e upstream.

Remove all %p uses in error messages in kprobes/x86.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tobin C . Harding <me@tobin.cc>
Cc: Will Deacon <will.deacon@arm.com>
Cc: acme@kernel.org
Cc: akpm@linux-foundation.org
Cc: brueckner@linux.vnet.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: rostedt@goodmis.org
Cc: schwidefsky@de.ibm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:24 +02:00
Jiri Kosina
f8a0aeefc2 x86/speculation: Protect against userspace-userspace spectreRSB
commit fdf82a7856 upstream.

The article "Spectre Returns! Speculation Attacks using the Return Stack
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
making use solely of the RSB contents even on CPUs that don't fallback to
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:24 +02:00
Peter Zijlstra
b13b271933 x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
commit 5800dc5c19 upstream.

Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.

The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.

Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.

Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.

PRE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

POST:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
   0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.

NATIVE:

(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
   0xffffffff817be970 <+0>:     push   %rbp
   0xffffffff817be971 <+1>:     mov    %rdi,%rbp
   0xffffffff817be974 <+4>:     push   %rbx
   0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
   0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
   0xffffffff817be981 <+17>:    mov    %rbx,%rdi
   0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
   0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
   0xffffffff817be98f <+31>:    test   %eax,%eax
   0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
   0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
   0xffffffff817be99d <+45>:    mov    %rbx,%rdi
   0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
   0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
   0xffffffff817be9a7 <+55>:    pop    %rbx
   0xffffffff817be9a8 <+56>:    pop    %rbp
   0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
   0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
   0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
   0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
   0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
   0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.


Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-15 17:37:24 +02:00
Len Brown
d9e6dbcf28 x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
All SKX with stepping higher than 4 support the TSC_DEADLINE,
no matter the microcode version.

Without this patch, upcoming SKX steppings will not be able to use
their TSC_DEADLINE timer.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: <stable@kernel.org> # v4.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 616dd5872e ("x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping")
Link: http://lkml.kernel.org/r/d0c7129e509660be9ec6b233284b8d42d90659e8.1532207856.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24 10:05:13 +02:00
Linus Torvalds
ef81e63e17 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "A single fix for a MCE-polling regression, which prevented the
  disabling of polling"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE: Remove min interval polling limitation
2018-07-21 17:25:49 -07:00
Linus Torvalds
43227e098c Merge branch 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Ingo Molnar:
 "An APM fix, and a BTS hardware-tracing fix related to PTI changes"

* 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apm: Don't access __preempt_count with zeroed fs
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
2018-07-21 17:23:58 -07:00
Linus Torvalds
47f7dc4b84 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "Miscellaneous bugfixes, plus a small patchlet related to Spectre v2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvmclock: fix TSC calibration for nested guests
  KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
  KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
  KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
  x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
  x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
  kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
  x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
  KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
2018-07-18 11:08:44 -07:00
Peng Hao
e10f780503 kvmclock: fix TSC calibration for nested guests
Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.

However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-18 11:43:17 +02:00
Dewet Thibaut
fbdb328c6b x86/MCE: Remove min interval polling limitation
commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min
interval limitation when setting the check interval for polled MCEs.
However, the logic is that 0 disables polling for corrected MCEs, see
Documentation/x86/x86_64/machinecheck. The limitation prevents disabling.

Remove this limitation and allow the value 0 to disable polling again.

Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes")
Signed-off-by: Dewet Thibaut <thibaut.dewet@nokia.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com
2018-07-17 17:56:25 +02:00
Ville Syrjälä
6f6060a5c9 x86/apm: Don't access __preempt_count with zeroed fs
APM_DO_POP_SEGS does not restore fs/gs which were zeroed by
APM_DO_ZERO_SEGS. Trying to access __preempt_count with
zeroed fs doesn't really work.

Move the ibrs call outside the APM_DO_SAVE_SEGS/APM_DO_RESTORE_SEGS
invocations so that fs is actually restored before calling
preempt_enable().

Fixes the following sort of oopses:
[    0.313581] general protection fault: 0000 [#1] PREEMPT SMP
[    0.313803] Modules linked in:
[    0.314040] CPU: 0 PID: 268 Comm: kapmd Not tainted 4.16.0-rc1-triton-bisect-00090-gdd84441a7971 #19
[    0.316161] EIP: __apm_bios_call_simple+0xc8/0x170
[    0.316161] EFLAGS: 00210016 CPU: 0
[    0.316161] EAX: 00000102 EBX: 00000000 ECX: 00000102 EDX: 00000000
[    0.316161] ESI: 0000530e EDI: dea95f64 EBP: dea95f18 ESP: dea95ef0
[    0.316161]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    0.316161] CR0: 80050033 CR2: 00000000 CR3: 015d3000 CR4: 000006d0
[    0.316161] Call Trace:
[    0.316161]  ? cpumask_weight.constprop.15+0x20/0x20
[    0.316161]  on_cpu0+0x44/0x70
[    0.316161]  apm+0x54e/0x720
[    0.316161]  ? __switch_to_asm+0x26/0x40
[    0.316161]  ? __schedule+0x17d/0x590
[    0.316161]  kthread+0xc0/0xf0
[    0.316161]  ? proc_apm_show+0x150/0x150
[    0.316161]  ? kthread_create_worker_on_cpu+0x20/0x20
[    0.316161]  ret_from_fork+0x2e/0x38
[    0.316161] Code: da 8e c2 8e e2 8e ea 57 55 2e ff 1d e0 bb 5d b1 0f 92 c3 5d 5f 07 1f 89 47 0c 90 8d b4 26 00 00 00 00 90 8d b4 26 00 00 00 00 90 <64> ff 0d 84 16 5c b1 74 7f 8b 45 dc 8e e0 8b 45 d8 8e e8 8b 45
[    0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 SS:ESP: 0068:dea95ef0
[    0.316161] ---[ end trace 656253db2deaa12c ]---

Fixes: dd84441a79 ("x86/speculation: Use IBRS if available before calling into firmware")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc:  David Woodhouse <dwmw@amazon.co.uk>
Cc:  "H. Peter Anvin" <hpa@zytor.com>
Cc:  x86@kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709133534.5963-1-ville.syrjala@linux.intel.com
2018-07-16 17:59:57 +02:00
Radim Krčmář
94ffba4846 x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
pvti_cpu0_va is the address of shared kvmclock data structure.

pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when
kvmclock vsyscall is disabled, and (3) if kvmclock is not stable.
This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can
work on 32 bit, (2) has little relation to the vsyscall, and (3) does
not need stable kvmclock (although kvmclock won't be used for system
clock if it's not stable, so kvm_ptp is pointless in that case).

Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to
work with it.

This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544.

Fixes: 9f08890ab9 ("x86/pvclock: add setter for pvclock_pvti_cpu0_va")
Cc: stable@vger.kernel.org
Reported-by: Andreas Steinmetz <ast@domdv.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-15 17:44:16 +02:00
Linus Torvalds
23adbe6fb5 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Two small fixes correcting the handling of SSB mitigations on AMD
  processors"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
  x86/bugs: Update when to check for the LS_CFG SSBD mitigation
2018-07-08 13:56:25 -07:00
Jann Horn
15279df6f2 x86/mtrr: Don't copy out-of-bounds data in mtrr_write
Don't access the provided buffer out of bounds - this can cause a kernel
out-of-bounds read when invoked through sys_splice() or other things that
use kernel_write()/__kernel_write().

Fixes: 7f8ec5a4f0 ("x86/mtrr: Convert to use strncpy_from_user() helper")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180706215003.156702-1-jannh@google.com
2018-07-07 18:58:41 +02:00
Nick Desaulniers
d0a8d9378d x86/paravirt: Make native_save_fl() extern inline
native_save_fl() is marked static inline, but by using it as
a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.

paravirt's use of native_save_fl() also requires that no GPRs other than
%rax are clobbered.

Compilers have different heuristics which they use to emit stack guard
code, the emittance of which can break paravirt's callee saved assumption
by clobbering %rcx.

Marking a function definition extern inline means that if this version
cannot be inlined, then the out-of-line version will be preferred. By
having the out-of-line version be implemented in assembly, it cannot be
instrumented with a stack protector, which might violate custom calling
conventions that code like paravirt rely on.

The semantics of extern inline has changed since gnu89. This means that
folks using GCC versions >= 5.1 may see symbol redefinition errors at
link time for subdirs that override KBUILD_CFLAGS (making the C standard
used implicit) regardless of this patch. This has been cleaned up
earlier in the patch set, but is left as a note in the commit message
for future travelers.

Reports:
 https://lkml.org/lkml/2018/5/7/534
 https://github.com/ClangBuiltLinux/linux/issues/16

Discussion:
 https://bugs.llvm.org/show_bug.cgi?id=37512
 https://lkml.org/lkml/2018/5/24/1371

Thanks to the many folks that participated in the discussion.

Debugged-by: Alistair Strachan <astrachan@google.com>
Debugged-by: Matthias Kaehlcke <mka@chromium.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Tom Stellar <tstellar@redhat.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@redhat.com
Cc: akataria@vmware.com
Cc: akpm@linux-foundation.org
Cc: andrea.parri@amarulasolutions.com
Cc: ard.biesheuvel@linaro.org
Cc: aryabinin@virtuozzo.com
Cc: astrachan@google.com
Cc: boris.ostrovsky@oracle.com
Cc: brijesh.singh@amd.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: geert@linux-m68k.org
Cc: ghackmann@google.com
Cc: gregkh@linuxfoundation.org
Cc: jan.kiszka@siemens.com
Cc: jarkko.sakkinen@linux.intel.com
Cc: joe@perches.com
Cc: jpoimboe@redhat.com
Cc: keescook@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: kstewart@linuxfoundation.org
Cc: linux-efi@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: manojgupta@google.com
Cc: mawilcox@microsoft.com
Cc: michal.lkml@markovi.net
Cc: mjg59@google.com
Cc: mka@chromium.org
Cc: pombredanne@nexb.com
Cc: rientjes@google.com
Cc: rostedt@goodmis.org
Cc: thomas.lendacky@amd.com
Cc: tweek@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: yamada.masahiro@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 10:56:27 +02:00