Commit Graph

388 Commits

Author SHA1 Message Date
Kan Liang
e1f0f55f27 perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
[ Upstream commit c10a8de0d3 ]

KabyLake and CoffeeLake CPUs have the same client uncore events as SkyLake.

Add the PCI IDs for the KabyLake Y, U, S processor lines and CoffeeLake U,
H, S processor lines.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20181019170419.378-1-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01 09:42:53 +01:00
Natarajan, Janakarajan
784f839589 perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
[ Upstream commit d7cbbe49a9 ]

In Family 17h, some L3 Cache Performance events require the ThreadMask
and SliceMask to be set. For other events, these fields do not affect
the count either way.

Set ThreadMask and SliceMask to 0xFF and 0xF respectively.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04 14:52:40 +01:00
Kan Liang
40568f21f2 perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
[ Upstream commit 9d92cfeaf5 ]

The counters on M3UPI Link 0 and Link 3 don't count properly, and writing
0 to these counters may causes system crash on some machines.

The PCI BDF addresses of the M3UPI in the current code are incorrect.

The correct addresses should be:

  D18:F1	0x204D
  D18:F2	0x204E
  D18:F5	0x204D

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1537538826-55489-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04 14:52:40 +01:00
Jacek Tomaka
1e9054e75d perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs
[ Upstream commit 16160c1946 ]

Problem: perf did not show branch predicted/mispredicted bit in brstack.

Output of perf -F brstack for profile collected

Before:

 0x4fdbcd/0x4fdc03/-/-/-/0
 0x45f4c1/0x4fdba0/-/-/-/0
 0x45f544/0x45f4bb/-/-/-/0
 0x45f555/0x45f53c/-/-/-/0
 0x7f66901cc24b/0x45f555/-/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0

After:

 0x4fdbcd/0x4fdc03/P/-/-/0
 0x45f4c1/0x4fdba0/P/-/-/0
 0x45f544/0x45f4bb/P/-/-/0
 0x45f555/0x45f53c/P/-/-/0
 0x7f66901cc24b/0x45f555/P/-/-/0
 0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
 0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
 0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0

Cause:

As mentioned in Software Development Manual vol 3, 17.4.8.1,
IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
its format. Despite that, registers containing FROM address of the branch,
do have MISPREDICT bit but because of the format indicated in
IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.

Solution:

Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10 08:54:25 +02:00
Kan Liang
d5963fae7f perf/x86/intel/lbr: Fix incomplete LBR call stack
[ Upstream commit 0592e57b24 ]

LBR has a limited stack size. If a task has a deeper call stack than
LBR's stack size, only the overflowed part is reported. A complete call
stack may not be reconstructed by perf tool.

Current code doesn't access all LBR registers. It only read the ones
below the TOS. The LBR registers above the TOS will be discarded
unconditionally.

When a CALL is captured, the TOS is incremented by 1 , modulo max LBR
stack size. The LBR HW only records the call stack information to the
register which the TOS points to. It will not touch other LBR
registers. So the registers above the TOS probably still store the valid
call stack information for an overflowed call stack, which need to be
reported.

To retrieve complete call stack information, we need to start from TOS,
read all LBR registers until an invalid entry is detected.
0s can be used to detect the invalid entry, because:

 - When a RET is captured, the HW zeros the LBR register which TOS points
   to, then decreases the TOS.
 - The LBR registers are reset to 0 when adding a new LBR event or
   scheduling an existing LBR event.
 - A taken branch at IP 0 is not expected

The context switch code is also modified to save/restore all valid LBR
registers. Furthermore, the LBR registers, which don't have valid call
stack information, need to be reset in restore, because they may be
polluted while swapped out.

Here is a small test program, tchain_deep.
Its call stack is deeper than 32.

 noinline void f33(void)
 {
        int i;

        for (i = 0; i < 10000000;) {
                if (i%2)
                        i++;
                else
                        i++;
        }
 }

 noinline void f32(void)
 {
        f33();
 }

 noinline void f31(void)
 {
        f32();
 }

 ... ...

 noinline void f1(void)
 {
        f2();
 }

 int main()
 {
        f1();
 }

Here is the test result on SKX. The max stack size of SKX is 32.

Without the patch:

 $ perf record -e cycles --call-graph lbr -- ./tchain_deep
 $ perf report --stdio
 #
 # Children      Self  Command      Shared Object     Symbol
 # ........  ........  ...........  ................  .................
 #
   100.00%    99.99%  tchain_deep    tchain_deep       [.] f33
            |
             --99.99%--f30
                       f31
                       f32
                       f33

With the patch:

 $ perf record -e cycles --call-graph lbr -- ./tchain_deep
 $ perf report --stdio
 # Children      Self  Command      Shared Object     Symbol
 # ........  ........  ...........  ................  ..................
 #
    99.99%     0.00%  tchain_deep    tchain_deep       [.] f1
            |
            ---f1
               f2
               f3
               f4
               f5
               f6
               f7
               f8
               f9
               f10
               f11
               f12
               f13
               f14
               f15
               f16
               f17
               f18
               f19
               f20
               f21
               f22
               f23
               f24
               f25
               f26
               f27
               f28
               f29
               f30
               f31
               f32
               f33

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: https://lore.kernel.org/lkml/1528213126-4312-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03 17:00:52 -07:00
Andy Lutomirski
53f01e2004 x86/nmi: Fix NMI uaccess race against CR3 switching
commit 4012e77a90 upstream.

A NMI can hit in the middle of context switching or in the middle of
switch_mm_irqs_off().  In either case, CR3 might not match current->mm,
which could cause copy_from_user_nmi() and friends to read the wrong
memory.

Fix it by adding a new nmi_uaccess_okay() helper and checking it in
copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:26:39 +02:00
Thomas Gleixner
00f795e12b perf/x86/amd/ibs: Don't access non-started event
[ Upstream commit d2753e6b48 ]

Paul Menzel reported the following bug:

> Enabling the undefined behavior sanitizer and building GNU/Linux 4.18-rc5+
> (with some unrelated commits) with GCC 8.1.0 from Debian Sid/unstable, the
> warning below is shown.
>
> > [    2.111913]
> > ================================================================================
> > [    2.111917] UBSAN: Undefined behaviour in arch/x86/events/amd/ibs.c:582:24
> > [    2.111919] member access within null pointer of type 'struct perf_event'
> > [    2.111926] CPU: 0 PID: 144 Comm: udevadm Not tainted 4.18.0-rc5-00316-g4864b68cedf2 #104
> > [    2.111928] Hardware name: ASROCK E350M1/E350M1, BIOS TIMELESS 01/01/1970
> > [    2.111930] Call Trace:
> > [    2.111943]  dump_stack+0x55/0x89
> > [    2.111949]  ubsan_epilogue+0xb/0x33
> > [    2.111953]  handle_null_ptr_deref+0x7f/0x90
> > [    2.111958]  __ubsan_handle_type_mismatch_v1+0x55/0x60
> > [    2.111964]  perf_ibs_handle_irq+0x596/0x620

The code dereferences event before checking the STARTED bit. Patch
below should cure the issue.

The warning should not trigger, if I analyzed the thing correctly.
(And Paul's testing confirms this.)

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Menzel <pmenzel+linux-x86@molgen.mpg.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807200958390.1580@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:26:28 +02:00
Kan Liang
596a9bfe81 perf/x86/intel/uncore: Correct fixed counter index check for NHM
[ Upstream commit d71f11c076 ]

For Nehalem and Westmere, there is only one fixed counter for W-Box.
There is no index which is bigger than UNCORE_PMC_IDX_FIXED.
It is not correct to use >= to check fixed counter.
The code quality issue will bring problem when new counter index is
introduced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:50:26 +02:00
Kan Liang
71b1bf6e97 perf/x86/intel/uncore: Correct fixed counter index check in generic code
[ Upstream commit 4749f81964 ]

There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only
exception is client IMC uncore, which has been specially handled.
For generic code, it is not correct to use >= to check fixed counter.
The code quality issue will bring problem when a new counter index is
introduced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:50:26 +02:00
Hugh Dickins
aa49e48232 x86/events/intel/ds: Fix bts_interrupt_threshold alignment
commit 2c991e408d upstream.

Markus reported that BTS is sporadically missing the tail of the trace
in the perf_event data buffer: [decode error (1): instruction overflow]
shown in GDB; and bisected it to the conversion of debug_store to PTI.

A little "optimization" crept into alloc_bts_buffer(), which mistakenly
placed bts_interrupt_threshold away from the 24-byte record boundary.
Intel SDM Vol 3B 17.4.9 says "This address must point to an offset from
the BTS buffer base that is a multiple of the BTS record size."

Revert "max" from a byte count to a record count, to calculate the
bts_interrupt_threshold correctly: which turns out to fix problem seen.

Fixes: c1961a4631 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-and-tested-by: Markus T Metzger <markus.t.metzger@intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org # v4.14+
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.1807141248290.1614@eggly.anvils
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25 11:25:07 +02:00
Kan Liang
3564366d55 perf/x86/intel/uncore: Add event constraint for BDX PCU
commit bb9fbe1b57 upstream.

Event select bit 7 'Use Occupancy' in PCU Box is not available for
counter 0 on BDX

Add a constraint to fix it.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Link: https://lkml.kernel.org/r/1510668400-301000-1-git-send-email-kan.liang@intel.com
Cc: "Jin, Yao" <yao.jin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:25:01 +02:00
Kan Liang
af22d1b770 perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1
[ Upstream commit 4e949e9b9d ]

The SMM freeze feature was introduced since PerfMon V2. But the current
code unconditionally enables the feature for all platforms. It can
generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for
the machine with PerfMon V1.

To disable the feature for PerfMon V1, perf needs to
- Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to
  intel_pmu, which is only applied to PerfMon V2 and later.
- Check the PerfMon version before flipping the SMM bit when starting CPU

Fixes: 6089327f54 ("perf/x86: Add sysfs entry to freeze counters on SMI")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: eranian@google.com
Cc: acme@redhat.com
Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21 04:02:48 +09:00
Kan Liang
b9e852513f perf/x86/intel: Fix event update for auto-reload
[ Upstream commit d31fc13fdc ]

There is a bug when reading event->count with large PEBS enabled.

Here is an example:

  # ./read_count
  0x71f0
  0x122c0
  0x1000000001c54
  0x100000001257d
  0x200000000bdc5

In fixed period mode, the auto-reload mechanism could be enabled for
PEBS events, but the calculation of event->count does not take the
auto-reload values into account.

Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().

This bug was introduced with the auto-reload mechanism enabled since
commit:

  851559e35f ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")

Introduce intel_pmu_save_and_restart_reload() to calculate the
event->count only for auto-reload.

Since the counter increments a negative counter value and overflows on
the sign switch, giving the interval:

        [-period, 0]

the difference between two consequtive reads is:

 A) value2 - value1;
    when no overflows have happened in between,
 B) (0 - value1) + (value2 - (-period));
    when one overflow happened in between,
 C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
    when @n overflows happened in between.

Here A) is the obvious difference, B) is the extension to the discrete
interval, where the first term is to the top of the interval and the
second term is from the bottom of the next interval and C) the extension
to multiple intervals, where the middle term is the whole intervals
covered.

The equation for all cases is:

    value2 - value1 + n * period

Previously the event->count is updated right before the sample output.
But for case A, there is no PEBS record ready. It needs to be specially
handled.

Remove the auto-reload code from x86_perf_event_set_period() since
we'll not longer call that function in this case.

Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Fixes: 851559e35f ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:35 +02:00
Kan Liang
359769ca6d perf/x86/intel: Fix large period handling on Broadwell CPUs
[ Upstream commit f605cfca8c ]

Large fixed period values could be truncated on Broadwell, for example:

  perf record -e cycles -c 10000000000

Here the fixed period is 0x2540BE400, but the period which finally applied is
0x540BE400 - which is wrong.

The reason is that x86_pmu::limit_period() uses an u32 parameter, so the
high 32 bits of 'period' get truncated.

This bug was introduced in:

  commit 294fe0f52a ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")

It's safe to use u64 instead of u32:

 - Although the 'left' is s64, the value of 'left' must be positive when
   calling limit_period().

 - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch
   the higher 32 bits.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 294fe0f52a ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com
[ Rewrote unacceptably bad changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:35 +02:00
Kan Liang
017f2ee206 perf/x86/intel: Properly save/restore the PMU state in the NMI handler
[ Upstream commit 82d71ed027 ]

The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated
accordingly.

This is fine in current usage because no-one checks it - but fix it
for future code: for example, the drain_pebs() will be modified to
fix an auto-reload bug.

Properly save/restore the old PMU state.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: kernel test robot <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:34 +02:00
Stephane Eranian
06956ca1aa perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
[ Upstream commit 71eb9ee959 ]

this patch fix a bug in how the pebs->real_ip is handled in the PEBS
handler. real_ip only exists in Haswell and later processor. It is
actually the eventing IP, i.e., where the event occurred. As opposed
to the pebs->ip which is the PEBS interrupt IP which is always off
by one.

The problem is that the real_ip just like the IP needs to be fixed up
because PEBS does not record all the machine state registers, and
in particular the code segement (cs). This is why we have the set_linear_ip()
function. The problem was that set_linear_ip() was only used on the pebs->ip
and not the pebs->real_ip.

We have profiles which ran into invalid callstacks because of this.
Here is an example:

 .....  0: ffffffffffffff80 recent entry, marker kernel v
 .....  1: 000000000040044d <= user address in kernel space!
 .....  2: fffffffffffffe00 marker enter user v
 .....  3: 000000000040044d
 .....  4: 00000000004004b6 oldest entry

Debugging output in get_perf_callchain():

 [  857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0

The problem is that the kernel entry in 1: points to a user level
address. How can that be?

The reason is that with PEBS sampling the instruction that caused the event
to occur and the instruction where the CPU was when the interrupt was posted
may be far apart. And sometime during that time window, the privilege level may
change. This happens, for instance, when the PEBS sample is taken close to a
kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level
instruction. But by the time the PMU interrupt fired, the processor had already
entered kernel space. This is why the debug output shows a user address with
user_mode() false.

The problem comes from PEBS not recording the code segment (cs) register.
The register is used in x86_64 to determine if executing in kernel vs user
space. This is okay because the kernel has a software workaround called
set_linear_ip(). But the issue in setup_pebs_sample_data() is that
set_linear_ip() is never called on the real_ip value when it is available
(Haswell and later) and precise_ip > 1.

This patch fixes this problem and eliminates the callchain discrepancy.

The patch restructures the code around set_linear_ip() to minimize the number
of times the IP has to be set.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:20 +02:00
Peter Zijlstra
82e91e07e6 perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
commit 46b1b57722 upstream.

> arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap)
> arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map'
> arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map'
> arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events'
> arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map'
> arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map'

Userspace controls @attr, sanitize @attr->config before passing it on
to x86_pmu::event_map().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:31 +02:00
Peter Zijlstra
6467123872 perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
commit 06ce6e9b6d upstream.

> arch/x86/events/msr.c:178 msr_event_init() warn: potential spectre issue 'msr' (local cap)

Userspace controls @attr, sanitize cfg (attr->config) before using it
to index an array.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:31 +02:00
Peter Zijlstra
4e4bb64df8 perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
commit a5f81290ce upstream.

> arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap)

Userspace controls @attr, sanitize cfg (attr->config) before using it
to index an array.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:31 +02:00
Peter Zijlstra
df2c71fb5c perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
commit ef9ee4ad38 upstream.

> arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap)
> arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap)
> arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap)
> arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap)

Userspace controls @config which contains 3 (byte) fields used for a 3
dimensional array deref.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16 10:10:31 +02:00
Kan Liang
294a6268bf perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers
commit 320b0651f3 upstream.

The number of CHAs is miscalculated on multi-domain PCI Skylake server systems,
resulting in an uncore driver initialization error.

Gary Kroening explains:

 "For systems with a single PCI segment, it is sufficient to look for the
  bus number to change in order to determine that all of the CHa's have
  been counted for a single socket.

  However, for multi PCI segment systems, each socket is given a new
  segment and the bus number does NOT change.  So looking only for the
  bus number to change ends up counting all of the CHa's on all sockets
  in the system.  This leads to writing CPU MSRs beyond a valid range and
  causes an error in ivbep_uncore_msr_init_box()."

To fix this bug, query the number of CHAs from the CAPID6 register:
it should read bits 27:0 in the CAPID6 register located at
Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector
of available LLC slices and the CHAs that manage those slices.

Reported-by: Kroening, Gary <gary.kroening@hpe.com>
Tested-by: Kroening, Gary <gary.kroening@hpe.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: abanman@hpe.com
Cc: dimitri.sivanich@hpe.com
Cc: hpa@zytor.com
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:24:48 +02:00
Dan Carpenter
59dbc2a449 perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()
commit e5ea9b54a0 upstream.

We intended to clear the lowest 6 bits but because of a type bug we
clear the high 32 bits as well.  Andi says that periods are rarely more
than U32_MAX so this bug probably doesn't have a huge runtime impact.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 294fe0f52a ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:24:48 +02:00
Kan Liang
d244e5897c perf/x86/intel/uncore: Fix Skylake UPI event format
commit 317660940f upstream.

There is no event extension (bit 21) for SKX UPI, so
use 'event' instead of 'event_ext'.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 18:24:47 +02:00
Thomas Gleixner
6b800ce9ee perf/x86/intel: Plug memory leak in intel_pmu_init()
[ Upstream commit 7ad1437d6a ]

A recent commit introduced an extra merge_attr() call in the skylake
branch, which causes a memory leak.

Store the pointer to the extra allocated memory and free it at the end of
the function.

Fixes: a5df70c354 ("perf/x86: Only show format attributes when supported")
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03 10:24:31 +01:00
Jia Zhang
325cbb04dc x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping
commit b399151cb4 upstream.

x86_mask is a confusing name which is hard to associate with the
processor's stepping.

Additionally, correct an indent issue in lib/cpu.c.

Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:24 +01:00
Xiao Liang
34c1acc2f7 perf/x86/amd/power: Do not load AMD power module on !AMD platforms
commit 40d4071ce2 upstream.

The AMD power module can be loaded on non AMD platforms, but unload fails
with the following Oops:

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: __list_del_entry_valid+0x29/0x90
 Call Trace:
  perf_pmu_unregister+0x25/0xf0
  amd_power_pmu_exit+0x1c/0xd23 [power]
  SyS_delete_module+0x1a8/0x2b0
  ? exit_to_usermode_loop+0x8f/0xb0
  entry_SYSCALL_64_fastpath+0x20/0x83

Return -ENODEV instead of 0 from the module init function if the CPU does
not match.

Fixes: c7ab62bfbe ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
Signed-off-by: Xiao Liang <xiliang@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 14:03:49 +01:00
Peter Zijlstra
6d8b7d3934 x86,perf: Disable intel_bts when PTI
commit 99a9dc98ba upstream.

The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.

This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.

As a quick fix disable this driver when PTI is enabled to prevent
malfunction.

Fixes: 385ce0ea4c ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: greg@kroah.com
Cc: hughd@google.com
Cc: luto@amacapital.net
Cc: Vince Weaver <vince@deater.net>
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:45:30 +01:00
Peter Zijlstra
b4660939af x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
commit 42f3bdc5dd upstream.

Thomas reported the following warning:

 BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
 caller is native_flush_tlb_single+0x57/0xc0
 native_flush_tlb_single+0x57/0xc0
 __set_pte_vaddr+0x2d/0x40
 set_pte_vaddr+0x2f/0x40
 cea_set_pte+0x30/0x40
 ds_update_cea.constprop.4+0x4d/0x70
 reserve_ds_buffers+0x159/0x410
 x86_reserve_hardware+0x150/0x160
 x86_pmu_event_init+0x3e/0x1f0
 perf_try_init_event+0x69/0x80
 perf_event_alloc+0x652/0x740
 SyS_perf_event_open+0x3f6/0xd60
 do_syscall_64+0x5c/0x190

set_pte_vaddr is used to map the ds buffers into the cpu entry area, but
there are two problems with that:

 1) The resulting flush is not supposed to be called in preemptible context

 2) The cpu entry area is supposed to be per CPU, but the debug store
    buffers are mapped for all CPUs so these mappings need to be flushed
    globally.

Add the necessary preemption protection across the mapping code and flush
TLBs globally.

Fixes: c1961a4631 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area")
Reported-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-10 09:31:16 +01:00
Hugh Dickins
8b82023b7f x86/events/intel/ds: Map debug buffers in cpu_entry_area
commit c1961a4631 upstream.

The BTS and PEBS buffers both have their virtual addresses programmed into
the hardware.  This means that any access to them is performed via the page
tables.  The times that the hardware accesses these are entirely dependent
on how the performance monitoring hardware events are set up.  In other
words, there is no way for the kernel to tell when the hardware might
access these buffers.

To avoid perf crashes, place 'debug_store' allocate pages and map them into
the cpu_entry_area.

The PEBS fixup buffer does not need this treatment.

[ tglx: Got rid of the kaiser_add_mapping() complication ]

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:30:59 +01:00
Thomas Gleixner
e0eb34665d x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
commit 10043e02db upstream.

The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
addresses which must be visible in any execution context.

So it is required to make these mappings visible to user space when kernel
page table isolation is active.

Provide enough room for the buffer mappings in the cpu_entry_area so the
buffers are available in the user space visible page tables.

At the point where the kernel side entry area is populated there is no
buffer available yet, but the kernel PMD must be populated. To achieve this
set the entries for these buffers to non present.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:30:59 +01:00
Will Deacon
5383f45db3 locking/barriers: Convert users of lockless_dereference() to READ_ONCE()
commit 3382290ed2 upstream.

[ Note, this is a Git cherry-pick of the following commit:

    506458efaf ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")

  ... for easier x86 PTI code testing and back-porting. ]

READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:26:21 +01:00
Andi Kleen
2d8c24ed93 perf/x86: Enable free running PEBS for REGS_USER/INTR
commit 2fe1bc1f50 upstream.

[ Note, this is a Git cherry-pick of the following commit:

    a47ba4d77e ("perf/x86: Enable free running PEBS for REGS_USER/INTR")

  ... for easier x86 PTI code testing and back-porting. ]

Currently free running PEBS is disabled when user or interrupt
registers are requested. Most of the registers are actually
available in the PEBS record and can be supported.

So we just need to check for the supported registers and then
allow it: it is all except for the segment register.

For user registers this only works when the counter is limited
to ring 3 only, so this also needs to be checked.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:26:20 +01:00
Andi Kleen
c35be33486 perf/x86/intel: Hide TSX events when RTM is not supported
commit 58ba4d5a25 upstream.

0day testing reported a perf test regression on Haswell systems without
RTM. Commit a5df70c35 hides the in_tx/in_tx_cp attributes when RTM is not
available, but the TSX events are still available in sysfs. Due to the
missing attributes the event parser fails on those files.

Don't show the TSX events in sysfs when RTM is not available on
Haswell/Broadwell/Skylake.

Fixes: a5df70c354 (perf/x86: Only show format attributes when supported)
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Tested-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20171109000718.14137-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:40:40 +00:00
Linus Torvalds
ead751507d Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
 "License cleanup: add SPDX license identifiers to some files

  Many source files in the tree are missing licensing information, which
  makes it harder for compliance tools to determine the correct license.

  By default all files without license information are under the default
  license of the kernel, which is GPL version 2.

  Update the files which contain no license information with the
  'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
  binding shorthand, which can be used instead of the full boiler plate
  text.

  This patch is based on work done by Thomas Gleixner and Kate Stewart
  and Philippe Ombredanne.

  How this work was done:

  Patches were generated and checked against linux-4.14-rc6 for a subset
  of the use cases:

   - file had no licensing information it it.

   - file was a */uapi/* one with no licensing information in it,

   - file was a */uapi/* one with existing licensing information,

  Further patches will be generated in subsequent months to fix up cases
  where non-standard license headers were used, and references to
  license had to be inferred by heuristics based on keywords.

  The analysis to determine which SPDX License Identifier to be applied
  to a file was done in a spreadsheet of side by side results from of
  the output of two independent scanners (ScanCode & Windriver)
  producing SPDX tag:value files created by Philippe Ombredanne.
  Philippe prepared the base worksheet, and did an initial spot review
  of a few 1000 files.

  The 4.13 kernel was the starting point of the analysis with 60,537
  files assessed. Kate Stewart did a file by file comparison of the
  scanner results in the spreadsheet to determine which SPDX license
  identifier(s) to be applied to the file. She confirmed any
  determination that was not immediately clear with lawyers working with
  the Linux Foundation.

  Criteria used to select files for SPDX license identifier tagging was:

   - Files considered eligible had to be source code files.

   - Make and config files were included as candidates if they contained
     >5 lines of source

   - File already had some variant of a license header in it (even if <5
     lines).

  All documentation files were explicitly excluded.

  The following heuristics were used to determine which SPDX license
  identifiers to apply.

   - when both scanners couldn't find any license traces, file was
     considered to have no license information in it, and the top level
     COPYING file license applied.

     For non */uapi/* files that summary was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0                                              11139

     and resulted in the first patch in this series.

     If that file was a */uapi/* path one, it was "GPL-2.0 WITH
     Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
     was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0 WITH Linux-syscall-note                        930

     and resulted in the second patch in this series.

   - if a file had some form of licensing information in it, and was one
     of the */uapi/* ones, it was denoted with the Linux-syscall-note if
     any GPL family license was found in the file or had no licensing in
     it (per prior point). Results summary:

       SPDX license identifier                            # files
       ---------------------------------------------------|------
       GPL-2.0 WITH Linux-syscall-note                       270
       GPL-2.0+ WITH Linux-syscall-note                      169
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
       LGPL-2.1+ WITH Linux-syscall-note                      15
       GPL-1.0+ WITH Linux-syscall-note                       14
       ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
       LGPL-2.0+ WITH Linux-syscall-note                       4
       LGPL-2.1 WITH Linux-syscall-note                        3
       ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
       ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

     and that resulted in the third patch in this series.

   - when the two scanners agreed on the detected license(s), that
     became the concluded license(s).

   - when there was disagreement between the two scanners (one detected
     a license but the other didn't, or they both detected different
     licenses) a manual inspection of the file occurred.

   - In most cases a manual inspection of the information in the file
     resulted in a clear resolution of the license that should apply
     (and which scanner probably needed to revisit its heuristics).

   - When it was not immediately clear, the license identifier was
     confirmed with lawyers working with the Linux Foundation.

   - If there was any question as to the appropriate license identifier,
     the file was flagged for further research and to be revisited later
     in time.

  In total, over 70 hours of logged manual review was done on the
  spreadsheet to determine the SPDX license identifiers to apply to the
  source files by Kate, Philippe, Thomas and, in some cases,
  confirmation by lawyers working with the Linux Foundation.

  Kate also obtained a third independent scan of the 4.13 code base from
  FOSSology, and compared selected files where the other two scanners
  disagreed against that SPDX file, to see if there was new insights.
  The Windriver scanner is based on an older version of FOSSology in
  part, so they are related.

  Thomas did random spot checks in about 500 files from the spreadsheets
  for the uapi headers and agreed with SPDX license identifier in the
  files he inspected. For the non-uapi files Thomas did random spot
  checks in about 15000 files.

  In initial set of patches against 4.14-rc6, 3 files were found to have
  copy/paste license identifier errors, and have been fixed to reflect
  the correct identifier.

  Additionally Philippe spent 10 hours this week doing a detailed manual
  inspection and review of the 12,461 patched files from the initial
  patch version early this week with:

   - a full scancode scan run, collecting the matched texts, detected
     license ids and scores

   - reviewing anything where there was a license detected (about 500+
     files) to ensure that the applied SPDX license was correct

   - reviewing anything where there was no detection but the patch
     license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
     applied SPDX license was correct

  This produced a worksheet with 20 files needing minor correction. This
  worksheet was then exported into 3 different .csv files for the
  different types of files to be modified.

  These .csv files were then reviewed by Greg. Thomas wrote a script to
  parse the csv files and add the proper SPDX tag to the file, in the
  format that the file expected. This script was further refined by Greg
  based on the output to detect more types of files automatically and to
  distinguish between header and source .c files (which need different
  comment types.) Finally Greg ran the script using the .csv files to
  generate the patches.

  Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
  Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
  Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  License cleanup: add SPDX license identifier to uapi header files with a license
  License cleanup: add SPDX license identifier to uapi header files with no license
  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02 10:04:46 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Alexander Shishkin
2eece390bf perf/x86/intel/bts: Fix exclusive event reference leak
Commit:

  d2878d642a ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")

... adds a privilege check in the exactly wrong place in the event init path:
after the 'LBR exclusive' reference has been taken, and doesn't release it
in the case of insufficient privileges. After this, nobody in the system
gets to use PT or LBR afterwards.

This patch moves the privilege check to where it should have been in the
first place.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d2878d642a ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
Link: http://lkml.kernel.org/r/20171023123533.16973-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 13:19:27 +02:00
Linus Torvalds
26c923ab19 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Some tooling fixes plus three kernel fixes: a memory leak fix, a
  statistics fix and a crash fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix memory leaks on allocation failures
  perf/core: Fix cgroup time when scheduling descendants
  perf/core: Avoid freeing static PMU contexts when PMU is unregistered
  tools include uapi bpf.h: Sync kernel ABI header with tooling header
  perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMU
  perf script: Add missing separator for "-F ip,brstack" (and brstackoff)
  perf callchain: Compare dsos (as well) for CCKEY_FUNCTION
2017-10-14 15:16:49 -04:00
Colin Ian King
629eb703d3 perf/x86/intel/uncore: Fix memory leaks on allocation failures
Currently if an allocation fails then the error return paths
don't free up any currently allocated pmus[].boxes and pmus causing
a memory leak.  Add an error clean up exit path that frees these
objects.

Detected by CoverityScan, CID#711632 ("Resource Leak")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: 087bfbb032 ("perf/x86: Add generic Intel uncore PMU support")
Link: http://lkml.kernel.org/r/20171009172655.6132-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 12:51:07 +02:00
Linus Torvalds
27efed3e83 Merge branch 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchddog clean-up and fixes from Thomas Gleixner:
 "The watchdog (hard/softlockup detector) code is pretty much broken in
  its current state. The patch series addresses this by removing all
  duct tape and refactoring it into a workable state.

  The reasons why I ask for inclusion that late in the cycle are:

   1) The code causes lockdep splats vs. hotplug locking which get
      reported over and over. Unfortunately there is no easy fix.

   2) The risk of breakage is minimal because it's already broken

   3) As 4.14 is a long term stable kernel, I prefer to have working
      watchdog code in that and the lockdep issues resolved. I wouldn't
      ask you to pull if 4.14 wouldn't be a LTS kernel or if the
      solution would be easy to backport.

   4) The series was around before the merge window opened, but then got
      delayed due to the UP failure caused by the for_each_cpu()
      surprise which we discussed recently.

  Changes vs. V1:

   - Addressed your review points

   - Addressed the warning in the powerpc code which was discovered late

   - Changed two function names which made sense up to a certain point
     in the series. Now they match what they do in the end.

   - Fixed a 'unused variable' warning, which got not detected by the
     intel robot. I triggered it when trying all possible related config
     combinations manually. Randconfig testing seems not random enough.

  The changes have been tested by and reviewed by Don Zickus and tested
  and acked by Micheal Ellerman for powerpc"

* 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  watchdog/core: Put softlockup_threads_initialized under ifdef guard
  watchdog/core: Rename some softlockup_* functions
  powerpc/watchdog: Make use of watchdog_nmi_probe()
  watchdog/core, powerpc: Lock cpus across reconfiguration
  watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
  watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently"
  watchdog/hardlockup/perf: Cure UP damage
  watchdog/hardlockup: Clean up hotplug locking mess
  watchdog/hardlockup/perf: Simplify deferred event destroy
  watchdog/hardlockup/perf: Use new perf CPU enable mechanism
  watchdog/hardlockup/perf: Implement CPU enable replacement
  watchdog/hardlockup/perf: Implement init time detection of perf
  watchdog/hardlockup/perf: Implement init time perf validation
  watchdog/core: Get rid of the racy update loop
  watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
  watchdog/sysctl: Clean up sysctl variable name space
  watchdog/sysctl: Get rid of the #ifdeffery
  watchdog/core: Clean up header mess
  watchdog/core: Further simplify sysctl handling
  watchdog/core: Get rid of the thread teardown/setup dance
  ...
2017-10-06 08:36:41 -07:00
Kan Liang
29b46dfb13 perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
There are 6 IIO/IRP boxes for CBDMA, PCIe0-2, MCP 0 and MCP 1
separately. Correct the num_boxes.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1505149816-12580-1-git-send-email-kan.liang@intel.com
2017-09-25 12:43:56 +02:00
Kan Liang
450a978935 perf/x86/intel/rapl: Add missing CPU IDs
DENVERTON and GEMINI_LAKE support same RAPL counters as Apollo Lake.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: piotr.luc@intel.com
Cc: harry.pan@intel.com
Cc: srinivas.pandruvada@linux.intel.com
Link: http://lkml.kernel.org/r/20170908213449.6224-3-kan.liang@intel.com
2017-09-25 09:36:17 +02:00
Kan Liang
1aaccc40a1 perf/x86/msr: Add missing CPU IDs
Goldmont, Glodmont plus and Xeon Phi have MSR_SMI_COUNT as well.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: piotr.luc@intel.com
Cc: harry.pan@intel.com
Cc: srinivas.pandruvada@linux.intel.com
Link: http://lkml.kernel.org/r/20170908213449.6224-2-kan.liang@intel.com
2017-09-25 09:36:17 +02:00
Kan Liang
b09c146f8f perf/x86/intel/cstate: Add missing CPU IDs
Skylake server uses the same C-state residency events as Sandy Bridge.

Denverton and Gemini lake use the same C-state residency events as
Apollo Lake.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: piotr.luc@intel.com
Cc: harry.pan@intel.com
Cc: srinivas.pandruvada@linux.intel.com
Link: http://lkml.kernel.org/r/20170908213449.6224-1-kan.liang@intel.com
2017-09-25 09:36:17 +02:00
Peter Zijlstra
2406e3b166 perf/x86/intel, watchdog/core: Sanitize PMU HT bug workaround
The lockup_detector_suspend/resume() interface is broken in several ways
especially as it results in recursive locking of the CPU hotplug lock.

Use the new stop/restart interface in the perf NMI watchdog to temporarily
disable and reenable the already active watchdog events. That's enough to
handle it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.247141871@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:03 +02:00
Linus Torvalds
f57091767a Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache quality monitoring update from Thomas Gleixner:
 "This update provides a complete rewrite of the Cache Quality
  Monitoring (CQM) facility.

  The existing CQM support was duct taped into perf with a lot of issues
  and the attempts to fix those turned out to be incomplete and
  horrible.

  After lengthy discussions it was decided to integrate the CQM support
  into the Resource Director Technology (RDT) facility, which is the
  obvious choise as in hardware CQM is part of RDT. This allowed to add
  Memory Bandwidth Monitoring support on top.

  As a result the mechanisms for allocating cache/memory bandwidth and
  the corresponding monitoring mechanisms are integrated into a single
  management facility with a consistent user interface"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/intel_rdt: Turn off most RDT features on Skylake
  x86/intel_rdt: Add command line options for resource director technology
  x86/intel_rdt: Move special case code for Haswell to a quirk function
  x86/intel_rdt: Remove redundant ternary operator on return
  x86/intel_rdt/cqm: Improve limbo list processing
  x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
  x86/intel_rdt: Modify the intel_pqr_state for better performance
  x86/intel_rdt/cqm: Clear the default RMID during hotcpu
  x86/intel_rdt: Show bitmask of shareable resource with other executing units
  x86/intel_rdt/mbm: Handle counter overflow
  x86/intel_rdt/mbm: Add mbm counter initialization
  x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
  x86/intel_rdt/cqm: Add CPU hotplug support
  x86/intel_rdt/cqm: Add sched_in support
  x86/intel_rdt: Introduce rdt_enable_key for scheduling
  x86/intel_rdt/cqm: Add mount,umount support
  x86/intel_rdt/cqm: Add rmdir support
  x86/intel_rdt: Separate the ctrl bits from rmdir
  x86/intel_rdt/cqm: Add mon_data
  x86/intel_rdt: Prepare for RDT monitor data support
  ...
2017-09-04 13:56:37 -07:00
Linus Torvalds
9657752cb5 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add branch type profiling/tracing support. (Jin Yao)

   - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
     physical memory addresses, where the PMU supports it. (Kan Liang)

   - Export some PMU capability details in the new
     /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
     Kleen)

   - Aux data fixes and updates (Will Deacon)

   - kprobes fixes and updates (Masami Hiramatsu)

   - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)

  On the tooling side, here's a (limited!) list of highlights - there
  were many other changes that I could not list, see the shortlog and
  git history for details:

  UI improvements:

   - Implement a visual marker for fused x86 instructions in the
     annotate TUI browser, available now in 'perf report', more work
     needed to have it available as well in 'perf top' (Jin Yao)

     Further explanation from one of Jin's patches:

             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

     That means the cmpl+je is a fused instruction pair and they should
     be considered together.

   - Record the branch type and then show statistics and info about in
     callchain entries (Jin Yao)

     Example from one of Jin's patches:

        # perf record -g -j any,save_type
        # perf report --branch-history --stdio --no-children

        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)

  namespaces support:

   - Add initial support for namespaces, using setns to access files in
     namespaces, grabbing their build-ids, etc. (Krister Johansen)

  perf trace enhancements:

   - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Add initial 'clone' syscall args beautifier in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Beautifiers for the 'cmd' arg of several ioctl types, including:
     sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
     Melo)

   - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
     CTF conversion, allowing CTF trace visualization tools to show
     callchains and to resolve symbols (Geneviève Bastien)

   - Beautify the fcntl syscall, which is an interesting one in the
     sense that infrastructure had to be put in place to change the
     formatters of some arguments according to the value in a previous
     one, i.e. cmd dictates how arg and the syscall return will be
     formatted. (Arnaldo Carvalho de Melo

  perf stat enhancements:

   - Use group read for event groups in 'perf stat', reducing overhead
     when groups are defined in the event specification, i.e. when using
     {} to enclose a list of events, asking them to be read at the same
     time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)

  pipe mode improvements:

   - Process tracing data in 'perf annotate' pipe mode (David
     Carrillo-Cisneros)

   - Add header record types to pipe-mode, now this command:

        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header

     Will show the same as in non-pipe mode, i.e. involving a perf.data
     file (David Carrillo-Cisneros)

  Vendor specific hardware event support updates/enhancements:

   - Update POWER9 vendor events tables (Sukadev Bhattiprolu)

   - Add POWER9 PMU events Sukadev (Bhattiprolu)

   - Support additional POWER8+ PVR in PMU mapfile (Shriya)

   - Add Skylake server uncore JSON vendor events (Andi Kleen)

   - Support exporting Intel PT data to sqlite3 with python perf
     scripts, this is in addition to the postgresql support that was
     already there (Adrian Hunter)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
  perf symbols: Fix plt entry calculation for ARM and AARCH64
  perf probe: Fix kprobe blacklist checking condition
  perf/x86: Fix caps/ for !Intel
  perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
  perf/core, pt, bts: Get rid of itrace_started
  perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
  tools headers: Sync cpu features kernel ABI headers with tooling headers
  perf tools: Pass full path of FEATURES_DUMP
  perf tools: Robustify detection of clang binary
  tools lib: Allow external definition of CC, AR and LD
  perf tools: Allow external definition of flex and bison binary names
  tools build tests: Don't hardcode gcc name
  perf report: Group stat values on global event id
  perf values: Zero value buffers
  perf values: Fix allocation check
  perf values: Fix thread index bug
  perf report: Add dump_read function
  perf record: Set read_format for inherit_stat
  perf c2c: Fix remote HITM detection for Skylake
  perf tools: Fix static build with newer toolchains
  ...
2017-09-04 08:39:02 -07:00
Peter Zijlstra
5da382eb6e perf/x86: Fix caps/ for !Intel
Move the 'max_precise' capability into generic x86 code where it
belongs. This fixes a sysfs splat on !Intel systems where we fail to set
x86_pmu_caps_group.atts.

Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory")
Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:09:25 +02:00
Kan Liang
fc7ce9c74c perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.

Add a new sample type for physical address.

perf already has a facility to collect data virtual address. This patch
introduces a function to convert the virtual address to physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.

 - For kernel direct mapping addresses, virt_to_phys is used to convert
   the virtual addresses to physical address.

 - For user virtual addresses, __get_user_pages_fast is used to walk the
   pages tables for user physical address.

 - This does not work for vmalloc addresses right now. These are not
   resolved, but code to do that could be added.

The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.

For security, the physical address can only be exposed to root or
privileged user.

Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:09:25 +02:00
Alexander Shishkin
8d4e6c4caa perf/core, pt, bts: Get rid of itrace_started
I just noticed that hw.itrace_started and hw.config are aliased to the
same location. Now, the PT driver happens to use both, which works out
fine by sheer luck:

 - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
    program order, although there are no compiler barriers to ensure that,

 - to the perf_log_itrace_start() hw.itrace_start looks set at the same
   time as when it is intended to be set because both stores happen in the
   same path,

 - hw.config is never reset to zero in the PT driver.

Now, the use of hw.config by the PT driver makes more sense (it being a
HW PMU) than messing around with itrace_started, which is an awkward API
to begin with.

This patch replaces hw.itrace_started with an attach_state bit and an
API call for the PMU drivers to use to communicate the condition.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:09:24 +02:00
Dan Carpenter
eaa2f87c6b x86/ldt: Fix off by one in get_segment_base()
ldt->entries[] is allocated in alloc_ldt_struct().  It has
ldt->nr_entries elements and ldt->nr_entries is capped at LDT_ENTRIES.
So if "idx" is == ldt->nr_entries then we're reading beyond the end of
the buffer.  It seems duplicative to have two limit checks when one
would work just as well so I removed the check against LDT_ENTRIES.

The gdt_page.gdt[] array has GDT_ENTRIES entries.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: d07bdfd322 ("perf/x86: Fix USER/KERNEL tagging of samples properly")
Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:55:15 +02:00