Commit Graph

16182 Commits

Author SHA1 Message Date
Balbir Singh
21f81d9022 powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
commit 1e2a516e89 upstream.

This patch fixes a crash seen while doing a kexec from radix mode to
hash mode. Key 0 is special in hash and used in the RPN by default, we
set the key values to 0 today. In radix mode key 0 is used to control
supervisor<->user access. In hash key 0 is used by default, so the
first instruction after the switch causes a crash on kexec.

Commit 3b10d0095a ("powerpc/mm/radix: Prevent kernel execution of
user space") introduced the setting of IAMR and AMOR values to prevent
execution of user mode instructions from supervisor mode. We need to
clean up these SPR's on kexec.

Fixes: 3b10d0095a ("powerpc/mm/radix: Prevent kernel execution of user space")
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:00:12 +02:00
Kees Cook
fa3a378b22 powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
commit 47ebb09d54 upstream.

Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers.  On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided).

Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:00:12 +02:00
Naveen N. Rao
fc1c180ba6 powerpc/64s: Handle data breakpoints in Radix mode
commit d89ba5353f upstream.

On Power9, trying to use data breakpoints throws the splat shown
below. This is because the check for a data breakpoint in DSISR is in
do_hash_page(), which is not called when in Radix mode.

  Unable to handle kernel paging request for data at address 0xc000000000e19218
  Faulting instruction address: 0xc0000000001155e8
  cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
  pc: c0000000001155e8: find_pid_ns+0x48/0xe0
  lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
  sp: c0000000ef1e7da0
  msr: 9000000000009033
  dar: c000000000e19218
  dsisr: 400000

Move the check to handle_page_fault() so as to catch data breakpoints
in both Hash and Radix MMU modes.

We have to change the check in do_hash_page() against 0xa410 to use
0xa450, so as to include the value of (DSISR_DABRMATCH << 16).

There are two sites that call handle_page_fault() when in Radix, both
already pass DSISR in r4.

Fixes: caca285e5a ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Fix the fall-through case on hash, we need to reload DSISR]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:50 +02:00
Naveen N. Rao
7a6c505366 powerpc/kprobes: Pause function_graph tracing during jprobes handling
commit a9f8553e93 upstream.

This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db03 ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.

Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.

Fixes: 6794c78243 ("powerpc64: port of the function graph tracer")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:50 +02:00
Paul Mackerras
3ff8eda23a KVM: PPC: Book3S HV: Save/restore host values of debug registers
commit 7ceaa6dcd8 upstream.

At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:48 +02:00
Paul Mackerras
324df574b9 KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
commit 4c3bb4ccd0 upstream.

This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl).  We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace.  We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit.  However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory.  On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0.  On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0.  We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:48 +02:00
Paul Mackerras
4ae3f3581f KVM: PPC: Book3S HV: Context-switch EBB registers properly
commit ca8efa1df1 upstream.

This adds code to save the values of three SPRs (special-purpose
registers) used by userspace to control event-based branches (EBBs),
which are essentially interrupts that get delivered directly to
userspace.  These registers are loaded up with guest values when
entering the guest, and their values are saved when exiting the
guest, but we were not saving the host values and restoring them
before going back to userspace.

On POWER8 this would only affect userspace programs which explicitly
request the use of EBBs and also use the KVM_RUN ioctl, since the
only source of EBBs on POWER8 is the PMU, and there is an explicit
enable bit in the PMU registers (and those PMU registers do get
properly context-switched between host and guest).  On POWER9 there
is provision for externally-generated EBBs, and these are not subject
to the control in the PMU registers.

Since these registers only affect userspace, we can save them when
we first come in from userspace and restore them before returning to
userspace, rather than saving/restoring the host values on every
guest entry/exit.  Similarly, we don't need to worry about their
values on offline secondary threads since they execute in the context
of the idle task, which never executes in userspace.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:48 +02:00
Paul Mackerras
a7d29e276e KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
commit 3d3efb68c1 upstream.

POWER9 DD1 has an erratum where writing to the TBU40 register, which
is used to apply an offset to the timebase, can cause the timebase to
lose counts.  This results in the timebase on some CPUs getting out of
sync with other CPUs, which then results in misbehaviour of the
timekeeping code.

To work around the problem, we make KVM ignore the timebase offset for
all guests on POWER9 DD1 machines.  This means that live migration
cannot be supported on POWER9 DD1 machines.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:47 +02:00
Paul Mackerras
4d9c6828d0 KVM: PPC: Book3S HV: Preserve userspace HTM state properly
commit 46a704f840 upstream.

If userspace attempts to call the KVM_RUN ioctl when it has hardware
transactional memory (HTM) enabled, the values that it has put in the
HTM-related SPRs TFHAR, TFIAR and TEXASR will get overwritten by
guest values.  To fix this, we detect this condition and save those
SPR values in the thread struct, and disable HTM for the task.  If
userspace goes to access those SPRs or the HTM facility in future,
a TM-unavailable interrupt will occur and the handler will reload
those SPRs and re-enable HTM.

If userspace has started a transaction and suspended it, we would
currently lose the transactional state in the guest entry path and
would almost certainly get a "TM Bad Thing" interrupt, which would
cause the host to crash.  To avoid this, we detect this case and
return from the KVM_RUN ioctl with an EINVAL error, with the KVM
exit reason set to KVM_EXIT_FAIL_ENTRY.

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:47 +02:00
Paul Mackerras
1e0d799688 KVM: PPC: Book3S HV: Cope with host using large decrementer mode
commit 2f2724630f upstream.

POWER9 introduces a new mode for the decrementer register, called
large decrementer mode, in which the decrementer counter is 56 bits
wide rather than 32, and reads are sign-extended rather than
zero-extended.  For the decrementer, this new mode is optional and
controlled by a bit in the LPCR.  The hypervisor decrementer (HDEC)
is 56 bits wide on POWER9 and has no mode control.

Since KVM code reads and writes the decrementer and hypervisor
decrementer registers in a few places, it needs to be aware of the
need to treat the decrementer value as a 64-bit quantity, and only do
a 32-bit sign extension when large decrementer mode is not in effect.
Similarly, the HDEC should always be treated as a 64-bit quantity on
POWER9.  We define a new EXTEND_HDEC macro to encapsulate the feature
test for POWER9 and the sign extension.

To enable the sign extension to be removed in large decrementer mode,
we test the LPCR_LD bit in the host LPCR image stored in the struct
kvm for the guest.  If is set then large decrementer mode is enabled
and the sign extension should be skipped.

This is partly based on an earlier patch by Oliver O'Halloran.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:47 +02:00
Ravi Bangoria
a2063a2054 powerpc/perf: Fix oops when kthread execs user process
commit bf05fc25f2 upstream.

When a kthread calls call_usermodehelper() the steps are:
  1. allocate current->mm
  2. load_elf_binary()
  3. populate current->thread.regs

While doing this, interrupts are not disabled. If there is a perf
interrupt in the middle of this process (i.e. step 1 has completed
but not yet reached to step 3) and if perf tries to read userspace
regs, kernel oops with following log:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc0000000000da0fc
  ...
  Call Trace:
  perf_output_sample_regs+0x6c/0xd0
  perf_output_sample+0x4e4/0x830
  perf_event_output_forward+0x64/0x90
  __perf_event_overflow+0x8c/0x1e0
  record_and_restart+0x220/0x5c0
  perf_event_interrupt+0x2d8/0x4d0
  performance_monitor_exception+0x54/0x70
  performance_monitor_common+0x158/0x160
  --- interrupt: f01 at avtab_search_node+0x150/0x1a0
      LR = avtab_search_node+0x100/0x1a0
  ...
  load_elf_binary+0x6e8/0x15a0
  search_binary_handler+0xe8/0x290
  do_execveat_common.isra.14+0x5f4/0x840
  call_usermodehelper_exec_async+0x170/0x210
  ret_from_kernel_thread+0x5c/0x7c

Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
pt_regs are not set.

Fixes: ed4a4ef85c ("powerpc/perf: Add support for sampling interrupt register state")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:02:45 +02:00
Hugh Dickins
27f9070614 mm: larger stack guard gap, between vmas
commit 1be7107fbe upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24 07:06:22 +02:00
Oliver O'Halloran
0393c1c8a6 powerpc/mm: Add physical address to Linux page table dump
commit aaa2295292 upstream.

The current page table dumper scans the Linux page tables and coalesces mappings
with adjacent virtual addresses and similar PTE flags. This behaviour is
somewhat broken when you consider the IOREMAP space where entirely unrelated
mappings will appear to be virtually contiguous. This patch modifies the range
coalescing so that only ranges that are both physically and virtually contiguous
are combined. This patch also adds to the dump output the physical address at
the start of each range.

Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Print the physicall address with 0x like the other addresses]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24 07:06:18 +02:00
Breno Leitao
93dfb4e6a7 powerpc/kernel: Initialize load_tm on task creation
commit 7f22ced437 upstream.

Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.

This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
Breno Leitao
154b657008 powerpc/kernel: Fix FP and vector register restoration
commit 1195892c09 upstream.

Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).

These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.

Fixes: 70fe3d980f ("powerpc: Restore FPU/VEC/VSX if previously used")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
Michael Bringmann
c3e2874653 powerpc/hotplug-mem: Fix missing endian conversion of aa_index
commit dc421b200f upstream.

When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.

Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
Michael Ellerman
840407e3da powerpc/numa: Fix percpu allocations to be NUMA aware
commit ba4a648f12 upstream.

In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
Christophe Leroy
2bcd25cbc6 powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
commit 6f553912ee upstream.

of_mm_gpiochip_add_data() generates an oops for NULL pointer dereference.

of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data, therefore ->save_regs() cannot use gpiochip_get_data()

Fixes: 937daafca7 ("powerpc: simple-gpio: use gpiochip data pointer")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
Jeremy Kerr
baa4d4112e powerpc/spufs: Fix hash faults for kernel regions
commit d75e4919cc upstream.

Commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.

However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.

This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.

Fixes: ac29c64089 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
Michael Neuling
919c7173e0 powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
commit d957fb4d17 upstream.

Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)

This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.

We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.

Fixes: 17a3dd2f5f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
Michael Neuling
75295195f7 powerpc/tm: Fix FP and VMX register corruption
commit f48e91e87e upstream.

In commit dc3106690b ("powerpc: tm: Always use fp_state and vr_state
to store live registers"), a section of code was removed that copied
the current state to checkpointed state. That code should not have been
removed.

When an FP (Floating Point) unavailable is taken inside a transaction,
we need to abort the transaction. This is because at the time of the
tbegin, the FP state is bogus so the state stored in the checkpointed
registers is incorrect. To fix this, we treclaim (to get the
checkpointed GPRs) and then copy the thread_struct FP live state into
the checkpointed state. We then trecheckpoint so that the FP state is
correctly restored into the CPU.

The copying of the FP registers from live to checkpointed is what was
missing.

This simplifies the logic slightly from the original patch.
tm_reclaim_thread() will now always write the checkpointed FP
state. Either the checkpointed FP state will be written as part of
the actual treclaim (in tm.S), or it'll be a copy of the live
state. Which one we use is based on MSR[FP] from userspace.

Similarly for VMX.

Fixes: dc3106690b ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: cyrilbur@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
Michael Ellerman
f9fde78726 powerpc/mm: Fix crash in page table dump with huge pages
commit bfb9956ab4 upstream.

The page table dump code doesn't know about huge pages, so currently
it crashes (or walks random memory, usually leading to a crash), if it
finds a huge page. On Book3S we only see huge pages in the Linux page
tables when we're using the P9 Radix MMU.

Teaching the code to properly handle huge pages is a bit more involved,
so for now just prevent the crash.

Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
LiuHailong
1d19e3f139 powerpc/64e: Fix hang when debugging programs with relocated kernel
commit fd615f69a1 upstream.

Debug interrupts can be taken during interrupt entry, since interrupt
entry does not automatically turn them off.  The kernel will check
whether the faulting instruction is between [interrupt_base_book3e,
__end_interrupts], and if so clear MSR[DE] and return.

However, when the kernel is built with CONFIG_RELOCATABLE, it can't use
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) and
LOAD_REG_IMMEDIATE(r15,__end_interrupts), as they ignore relocation.
Thus, if the kernel is actually running at a different address than it
was built at, the address comparison will fail, and the exception entry
code will hang at kernel_dbg_exc.

r2(toc) is also not usable here, as r2 still holds data from the
interrupted context, so LOAD_REG_ADDR() doesn't work either.  So we use
the *name@got* to get the EV of two labels directly.

Test programs test.c shows as follows:
int main(int argc, char *argv[])
{
	if (access("/proc/sys/kernel/perf_event_paranoid", F_OK) == -1)
		printf("Kernel doesn't have perf_event support\n");
}

Steps to reproduce the bug, for example:
 1) ./gdb ./test
 2) (gdb) b access
 3) (gdb) r
 4) (gdb) s

Signed-off-by: Liu Hailong <liu.hailong6@zte.com.cn>
Signed-off-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Reviewed-by: Liu Song <liu.song11@zte.com.cn>
Reviewed-by: Huang Jian <huang.jian@zte.com.cn>
[scottwood: cleaned up commit message, and specified bad behavior
 as a hang rather than an oops to correspond to mainline kernel behavior]
Fixes: 1cb6e06492 ("powerpc/book3e: support CONFIG_RELOCATABLE")
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
Alistair Popple
31f96d860e powerpc/powernv: Fix TCE kill on NVLink2
commit 6b3d12a948 upstream.

Commit 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on
NVLink2") forced all TCE kills to go via the OPAL call for
NVLink2. However the PHB3 implementation of TCE kill was still being
called directly from some functions which in some circumstances caused
a machine check.

This patch adds an equivalent IODA2 version of the function which uses
the correct invalidation method depending on PHB model and changes all
external callers to use it instead.

Fixes: 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
Alexey Kardashevskiy
8781143e13 powerpc/iommu: Do not call PageTransHuge() on tail pages
commit e889e96e98 upstream.

The CMA pages migration code does not support compound pages at
the moment so it performs few tests before proceeding to actual page
migration.

One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
it is designed to be called on head pages only. Since we also test for
PageCompound(), and it contains PageTail() and PageHead(), we can
simplify the check by leaving just PageCompound() and therefore avoid
possible VM_BUG_ON_PAGE.

Fixes: 2e5bbb5461 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
Tyrel Datwyler
eb0807b288 powerpc/sysfs: Fix reference leak of cpu device_nodes present at boot
commit e76ca27790 upstream.

For CPUs present at boot each logical CPU acquires a reference to the
associated device node of the core. This happens in register_cpu() which
is called by topology_init(). The result of this is that we end up with
a reference held by each thread of the core. However, these references
are never freed if the CPU core is DLPAR removed.

This patch fixes the reference leaks by acquiring and releasing the references
in the CPU hotplug callbacks un/register_cpu_online(). With this patch symmetric
reference counting is observed with both CPUs present at boot, and those DLPAR
added after boot.

Fixes: f86e4718f2 ("driver/core: cpu: initialize of_node in cpu's device struture")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
Tyrel Datwyler
d731227327 powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
commit 68baf692c4 upstream.

Historically struct device_node references were tracked using a kref embedded as
a struct field. Commit 75b57ecf9d ("of: Make device nodes kobjects so they
show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
the device tree could by more simply exposed to userspace using sysfs.

Commit 0829f6d1f6 ("of: device_node kobject lifecycle fixes") (Mar 2014)
followed up these changes to better control the kobject lifecycle and in
particular the referecne counting via of_node_get(), of_node_put(), and
of_node_init().

A result of this second commit was that it introduced an of_node_put() call when
a dynamic node is detached, in of_node_remove(), that removes the initial kobj
reference created by of_node_init().

Traditionally as the original dynamic device node user the pseries code had
assumed responsibilty for releasing this final reference in its platform
specific DLPAR detach code.

This patch fixes a refcount underflow introduced by commit 0829f6d1f6, and
recently exposed by the upstreaming of the recount API.

Messages like the following are no longer seen in the kernel log with this
patch following DLPAR remove operations of cpus and pci devices.

  rpadlpar_io: slot PHB 72 removed
  refcount_t: underflow; use-after-free.
  ------------[ cut here ]------------
  WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110

Fixes: 0829f6d1f6 ("of: device_node kobject lifecycle fixes")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
[mpe: Make change log commit references more verbose]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
Mahesh Salgaonkar
11e382b3b8 powerpc/book3s/mce: Move add_taint() later in virtual mode
commit d93b0ac01a upstream.

machine_check_early() gets called in real mode. The very first time when
add_taint() is called, it prints a warning which ends up calling opal
call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
very first machine check while we are in opal we are doomed. OPAL_CALL
overwrites the PACASAVEDMSR in r13 and in this case when we are done with
MCE handling the original opal call will use this new MSR on it's way
back to opal_return. This usually leads to unexpected behaviour or the
kernel to panic. Instead move the add_taint() call later in the virtual
mode where it is safe to call.

This is broken with current FW level. We got lucky so far for not getting
very first MCE hit while in OPAL. But easily reproducible on Mambo.

Fixes: 27ea2c420c ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
Russell Currey
e548d552c0 powerpc/eeh: Avoid use after free in eeh_handle_special_event()
commit daeba2956f upstream.

eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE.  This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() found that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use after free in eeh_handle_special_event() as it attempts to
clear the "recovering" state on the PE after eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70d ("powerpc/eeh: EEH core to handle special event")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
David Gibson
63f44c113b powerpc/mm: Ensure IRQs are off in switch_mm()
commit 9765ad134a upstream.

powerpc expects IRQs to already be (soft) disabled when switch_mm() is
called, as made clear in the commit message of 9c1e105238 ("powerpc: Allow
perf_counters to access user memory at interrupt time").

Aside from any race conditions that might exist between switch_mm() and an IRQ,
there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
followed at some point by an IRQ enable then interrupts will remain disabled
until we return to userspace.

It is true that when switch_mm() is called from the scheduler IRQs are off, but
not when it's called by use_mm(). Looking closer we see that last year in commit
f98db6013c ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
this was made more explicit by the addition of switch_mm_irqs_off() which is now
called by the scheduler, vs switch_mm() which is used by use_mm().

Arguably it is a bug in use_mm() to call switch_mm() in a different context than
it expects, but fixing that will take time.

This was discovered recently when vhost started throwing warnings such as:

  BUG: sleeping function called from invalid context at kernel/mutex.c:578
  in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
  no locks held by vhost-10760/10768.
  irq event stamp: 10
  hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
  hardirqs last disabled at (10): switch_slb+0x2e4/0x490
  softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
  softirqs last disabled at (0):  (null)
  Call Trace:
    show_stack+0x88/0x390 (unreliable)
    dump_stack+0x30/0x44
    __might_sleep+0x1c4/0x2d0
    mutex_lock_nested+0x74/0x5c0
    cgroup_attach_task_all+0x5c/0x180
    vhost_attach_cgroups_work+0x58/0x80 [vhost]
    vhost_worker+0x24c/0x3d0 [vhost]
    kthread+0xec/0x100
    ret_from_kernel_thread+0x5c/0xd4

Prior to commit 04b96e5528 ("vhost: lockless enqueuing") (Aug 2016) the
vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
which had the effect of reenabling IRQs. Since that commit removed the locking
in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
off causing the warnings.

This patch addresses the problem by making the powerpc code mirror the x86 code,
ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
defining switch_mm_irqs_off().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[mpe: Flesh out/rewrite change log, add stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
Ankit Kumar
f25c78c879 pstore: Fix flags to enable dumps on powerpc
commit 041939c1ec upstream.

After commit c950fd6f20 kernel registers pstore write based on flag set.
Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for
powerpc architecture. On panic, kernel doesn't write message to
/fs/pstore/dmesg*(Entry doesn't gets created at all).

This patch enables pstore write for powerpc architecture by setting
PSTORE_FLAGS_DMESG flag.

Fixes: c950fd6f20 ("pstore: Split pstore fragile flags")
Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
Linus Torvalds
92b4fc7563 Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Just two fixes.

  The first fixes kprobing a stdu, and is marked for stable as it's been
  broken for ~ever. In hindsight this could have gone in next.

  The other is a fix for a change we merged this cycle, where if we take
  a certain exception when the kernel is running relocated (currently
  only used for kdump), we checkstop the box.

  Thanks to Ravi Bangoria"

* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
  powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
2017-04-21 09:34:45 -07:00
Michael Ellerman
be5c5e843c powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
Prior to commit 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi
interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode()
was just a bl hmi_exception_realmode, which the linker would turn into a bl to
the local entry point of hmi_exception_realmode. This was broken when
CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of
the kernel text that is copied down to 0x0.

But in fixing that, we added a new bug on little endian kernels. Because the
branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry
point of hmi_exception_realmode(). The global entry point must be called with
r12 containing the address of hmi_exception_realmode(), because it uses that
value to calculate the TOC value (r2).

This may manifest as a checkstop, because we take a junk value from r12 which
came from HSRR1, add a small constant to it and then use that as the TOC
pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above
RAM and somewhere in MMIO space.

Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the
label we're branching to. This means r12 will be setup correctly on LE, fixing
this bug, and r12 is also volatile across function calls on BE so it's a good
choice anyway.

Fixes: 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18 20:19:52 +10:00
Ravi Bangoria
9e1ba4f27f powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel
OOPS:

  Bad kernel stack pointer cd93c840 at c000000000009868
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  ...
  GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840
  ...
  NIP [c000000000009868] resume_kernel+0x2c/0x58
  LR [c000000000006208] program_check_common+0x108/0x180

On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does
not emulate actual store in emulate_step() because it may corrupt the exception
frame. So the kernel does the actual store operation in exception return code
i.e. resume_kernel().

resume_kernel() loads the saved stack pointer from memory using lwz, which only
loads the low 32-bits of the address, causing the kernel crash.

Fix this by loading the 64-bit value instead.

Fixes: be96f63375 ("powerpc: Split out instruction analysis part of emulate_step()")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
[mpe: Change log massage, add stable tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18 20:19:21 +10:00
Linus Torvalds
894ca30cf6 Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.11:

  Headed to stable:

   - disable HFSCR[TM] if TM is not supported, fixes a potential host
     kernel crash triggered by a hostile guest, but only in
     configurations that no one uses

   - don't try to fix up misaligned load-with-reservation instructions

   - fix flush_(d|i)cache_range() called from modules on little endian
     kernels

   - add missing global TLB invalidate if cxl is active

   - fix missing preempt_disable() in crc32c-vpmsum

  And a fix for selftests build changes that went in this release:

   - selftests/powerpc: Fix standalone powerpc build

  Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
  Paul Mackerras"

* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
  powerpc/mm: Add missing global TLB invalidate if cxl is active
  powerpc/64: Fix flush_(d|i)cache_range() called from modules
  powerpc: Don't try to fix up misaligned load-with-reservation instructions
  powerpc: Disable HFSCR[TM] if TM is not supported
  selftests/powerpc: Fix standalone powerpc build
2017-04-08 11:06:12 -07:00
Michael Ellerman
4749228f02 powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
In crc32c_vpmsum() we call enable_kernel_altivec() without first
disabling preemption, which is not allowed:

  WARNING: CPU: 9 PID: 2949 at ../arch/powerpc/kernel/process.c:277 enable_kernel_altivec+0x100/0x120
  Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto ...
  CPU: 9 PID: 2949 Comm: docker Not tainted 4.11.0-rc5-compiler_gcc-6.3.1-00033-g308ac7563944 #381
  ...
  NIP [c00000000001e320] enable_kernel_altivec+0x100/0x120
  LR [d000000003df0910] crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum]
  Call Trace:
    0xc138fd09 (unreliable)
    crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum]
    crc32c_vpmsum_update+0x3c/0x60 [crc32c_vpmsum]
    crypto_shash_update+0x88/0x1c0
    crc32c+0x64/0x90 [libcrc32c]
    dm_bm_checksum+0x48/0x80 [dm_persistent_data]
    sb_check+0x84/0x120 [dm_thin_pool]
    dm_bm_validate_buffer.isra.0+0xc0/0x1b0 [dm_persistent_data]
    dm_bm_read_lock+0x80/0xf0 [dm_persistent_data]
    __create_persistent_data_objects+0x16c/0x810 [dm_thin_pool]
    dm_pool_metadata_open+0xb0/0x1a0 [dm_thin_pool]
    pool_ctr+0x4cc/0xb60 [dm_thin_pool]
    dm_table_add_target+0x16c/0x3c0
    table_load+0x184/0x400
    ctl_ioctl+0x2f0/0x560
    dm_ctl_ioctl+0x38/0x50
    do_vfs_ioctl+0xd8/0x920
    SyS_ioctl+0x68/0xc0
    system_call+0x38/0xfc

It used to be sufficient just to call pagefault_disable(), because that
also disabled preemption. But the two were decoupled in commit 8222dbe21e
("sched/preempt, mm/fault: Decouple preemption from the page fault
logic") in mid 2015.

So add the missing preempt_disable/enable(). We should also call
disable_kernel_fp(), although it does nothing by default, there is a
debug switch to make it active and all enables should be paired with
disables.

Fixes: 6dd7a82cc5 ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-07 21:12:58 +10:00
Dan Carpenter
abd80dcbc4 KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
kzalloc() won't actually fail because sizeof(*resize) is small, but
static checkers complain.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-06 15:50:43 +10:00
Frederic Barrat
88b1bf7268 powerpc/mm: Add missing global TLB invalidate if cxl is active
Commit 4c6d9acce1 ("powerpc/mm: Add hooks for cxl") converted local
TLB invalidates to global if the cxl driver is active. This is necessary
because the CAPP snoops invalidations to forward them to the PSL on the
cxl adapter. However one path was forgotten. native_flush_hash_range()
still does local TLB invalidates, as found out the hard way recently.

This patch fixes it by following the same logic as previously: if the
cxl driver is active, the local TLB invalidates are 'upgraded' to
global.

Fixes: 4c6d9acce1 ("powerpc/mm: Add hooks for cxl")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-05 22:13:37 +10:00
Oliver O'Halloran
8f5f525d5b powerpc/64: Fix flush_(d|i)cache_range() called from modules
When the kernel is compiled to use 64bit ABIv2 the _GLOBAL() macro does
not include a global entry point. A function's global entry point is
used when the function is called from a different TOC context and in the
kernel this typically means a call from a module into the vmlinux (or
vice-versa).

There are a few exported asm functions declared with _GLOBAL() and
calling them from a module will likely crash the kernel since any TOC
relative load will yield garbage.

flush_icache_range() and flush_dcache_range() are both exported to
modules, and use the TOC, so must use _GLOBAL_TOC().

Fixes: 721aeaa9fd ("powerpc: Build little endian ppc64 kernel with ABIv2")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-05 21:40:21 +10:00
Paul Mackerras
48fe9e9488 powerpc: Don't try to fix up misaligned load-with-reservation instructions
In the past, there was only one load-with-reservation instruction,
lwarx, and if a program attempted a lwarx on a misaligned address, it
would take an alignment interrupt and the kernel handler would emulate
it as though it was lwzx, which was not really correct, but benign since
it is loading the right amount of data, and the lwarx should be paired
with a stwcx. to the same address, which would also cause an alignment
interrupt which would result in a SIGBUS being delivered to the process.

We now have 5 different sizes of load-with-reservation instruction. Of
those, lharx and ldarx cause an immediate SIGBUS by luck since their
entries in aligninfo[] overlap instructions which were not fixed up, but
lqarx overlaps with lhz and will be emulated as such. lbarx can never
generate an alignment interrupt since it only operates on 1 byte.

To straighten this out and fix the lqarx case, this adds code to detect
the l[hwdq]arx instructions and return without fixing them up, resulting
in a SIGBUS being delivered to the process.

Cc: stable@vger.kernel.org
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04 23:16:57 +10:00
Benjamin Herrenschmidt
7ed23e1bae powerpc: Disable HFSCR[TM] if TM is not supported
On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.

So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.

Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.

In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.

Fixes: 2a3563b023 ("powerpc: Setup in HFSCR for POWER8")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-28 19:52:37 +11:00
Paul Mackerras
fc36a90326 Revert "powerpc/64: Disable use of radix under a hypervisor"
This reverts commit 3f91a89d42.

Now that we do have the machinery for using the radix MMU under a
hypervisor, the extra check and comment introduced in 3f91a89d42 are
no longer correct.  The result is that when booted under a hypervisor
that only allows use of radix, we clear the MMU_FTR_TYPE_RADIX and
then set it again, and print a warning about ignoring the
disable_radix command line option, even though the command line does
not include "disable_radix".

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21 16:50:28 +11:00
Nicholas Piggin
6d98ce0be5 powerpc/64s: Fix idle wakeup potential to clobber registers
We concluded there may be a window where the idle wakeup code could get
to pnv_wakeup_tb_loss() (which clobbers non-volatile GPRs), but the
hardware may set SRR1[46:47] to 01b (no state loss) which would result
in the wakeup code failing to restore non-volatile GPRs.

I was not able to trigger this condition with trivial tests on real
hardware or simulator, but the ISA (at least 2.07) seems to allow for
it, and Gautham says that it can happen if there is an exception pending
when the sleep/winkle instruction is executed.

Fixes: 1706567117 ("powerpc/kvm: make hypervisor state restore a function")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-20 20:35:12 +11:00
Linus Torvalds
a07a6e4121 Merge tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc fixes from Michael Ellerman:
 "A couple of minor powerpc fixes for 4.11:

   - wire up statx() syscall

   - don't print a warning on memory hotplug when HPT resizing isn't
     available

  Thanks to: David Gibson, Chandan Rajendra"

* tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Don't give a warning when HPT resizing isn't available
  powerpc: Wire up statx() syscall
2017-03-19 18:49:28 -07:00
Michael Ellerman
8971e1c79d powerpc/pseries: Don't give a warning when HPT resizing isn't available
As of commit 438cc81a41 ("powerpc/pseries: Automatically resize HPT
for memory hot add/remove"), when running on the pseries platform, we
always attempt to use the PAPR extension to resize the hashed page
table (HPT) when we add or remove memory.

This is fine, but when the extension is not available we'll give a
harmless, but scary warning. Instead check if the firmware supports HPT
resizing before populating the mmu_hash_ops.resize_hpt pointer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-17 16:10:58 +11:00
Chandan Rajendra
f717629c7f powerpc: Wire up statx() syscall
Test runs on a ppc64 BE guest succeeded. linux/samples/statx/test-statx
program was executed on the following file types,

1. Regular file
2. Directory
3. device file
4. symlink
5. Named pipe

The test run also included invoking test-statx with the runtime options
provided in the main() function of test-statx.c

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-16 20:45:53 +11:00
Linus Torvalds
defc7d7522 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:

 - self-test failure of crc32c on powerpc

 - regressions of ecb(aes) when used with xts/lrw in s5p-sss

 - a number of bugs in the omap RNG driver

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: s5p-sss - Fix spinlock recursion on LRW(AES)
  hwrng: omap - Do not access INTMASK_REG on EIP76
  hwrng: omap - use devm_clk_get() instead of of_clk_get()
  hwrng: omap - write registers after enabling the clock
  crypto: s5p-sss - Fix completing crypto request in IRQ handler
  crypto: powerpc - Fix initialisation of crc32c context
2017-03-15 09:26:04 -07:00
Linus Torvalds
fb5fe0fd62 Merge tag 'powerpc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull some more powerpc fixes from Michael Ellerman:
 "The main item is the addition of the Power9 Machine Check handler.
  This was delayed to make sure some details were correct, and is as
  minimal as possible.

  The rest is small fixes, two for the Power9 PMU, two dealing with
  obscure toolchain problems, two for the PowerNV IOMMU code (used by
  VFIO), and one to fix a crash on 32-bit machines with macio devices
  due to missing dma_ops.

  Thanks to:
    Alexey Kardashevskiy, Cyril Bur, Larry Finger, Madhavan Srinivasan,
    Nicholas Piggin"

* tag 'powerpc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: POWER9 machine check handler
  powerpc/64s: allow machine check handler to set severity and initiator
  powerpc/64s: fix handling of non-synchronous machine checks
  powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops
  powerpc/powernv/ioda2: Update iommu table base on ownership change
  powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
  selftests/powerpc: Replace stxvx and lxvx with stxvd2x/lxvd2x
  powerpc/perf: Handle sdar_mode for marked event in power9
  powerpc/perf: Fix perf_get_data_addr() for power9 DD1
  powerpc/boot: Fix zImage TOC alignment
2017-03-13 19:48:22 -07:00
Linus Torvalds
5a45a5a881 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - a fix for the kexec/purgatory regression which was introduced in the
   merge window via an innocent sparse fix. We could have reverted that
   commit, but on deeper inspection it turned out that the whole
   machinery is neither documented nor robust. So a proper cleanup was
   done instead

 - the fix for the TLB flush issue which was discovered recently

 - a simple typo fix for a reboot quirk

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tlb: Fix tlb flushing when lguest clears PGE
  kexec, x86/purgatory: Unbreak it and clean it up
  x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
2017-03-12 14:18:49 -07:00
Thomas Gleixner
40c50c1fec kexec, x86/purgatory: Unbreak it and clean it up
The purgatory code defines global variables which are referenced via a
symbol lookup in the kexec code (core and arch).

A recent commit addressing sparse warnings made these static and thereby
broke kexec_file.

Why did this happen? Simply because the whole machinery is undocumented and
lacks any form of forward declarations. The variable names are unspecific
and lack a prefix, so adding forward declarations creates shadow variables
in the core code. Aside of that the code relies on magic constants and
duplicate struct definitions with no way to ensure that these things stay
in sync. The section placement of the purgatory variables happened by
chance and not by design.

Unbreak kexec and cleanup the mess:

 - Add proper forward declarations and document the usage
 - Use common struct definition
 - Use the proper common defines instead of magic constants
 - Add a purgatory_ prefix to have a proper name space
 - Use ARRAY_SIZE() instead of a homebrewn reimplementation
 - Add proper sections to the purgatory variables [ From Mike ]

Fixes: 72042a8c7b ("x86/purgatory: Make functions and variables static")
Reported-by: Mike Galbraith <<efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10 20:55:09 +01:00