Commit Graph

176 Commits

Author SHA1 Message Date
James Morse
13e5769deb x86/resctrl: Make resctrl_mounted checks explicit
The rdt_enable_key is switched when resctrl is mounted, and used to prevent
a second mount of the filesystem. It also enables the architecture's context
switch code.

This requires another architecture to have the same set of static keys, as
resctrl depends on them too. The existing users of these static keys are
implicitly also checking if the filesystem is mounted.

Make the resctrl_mounted checks explicit: resctrl can keep track of whether it
has been mounted once. This doesn't need to be combined with whether the arch
code is context switching the CLOSID.

rdt_mon_enable_key is never used just to test that resctrl is mounted, but does
also have this implication. Add a resctrl_mounted to all uses of
rdt_mon_enable_key.

This will allow the static key changing to be moved behind resctrl_arch_ calls.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-17-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:32 +01:00
James Morse
6eca639d83 x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
When switching tasks, the CLOSID and RMID that the new task should use
are stored in struct task_struct. For x86 the CLOSID known by resctrl,
the value in task_struct, and the value written to the CPU register are
all the same thing.

MPAM's CPU interface has two different PARTIDs - one for data accesses
the other for instruction fetch. Storing resctrl's CLOSID value in
struct task_struct implies the arch code knows whether resctrl is using
CDP.

Move the matching and setting of the struct task_struct properties to
use helpers. This allows arm64 to store the hardware format of the
register, instead of having to convert it each time.

__rdtgroup_move_task()s use of READ_ONCE()/WRITE_ONCE() ensures torn
values aren't seen as another CPU may schedule the task being moved
while the value is being changed. MPAM has an additional corner-case
here as the PMG bits extend the PARTID space.

If the scheduler sees a new-CLOSID but old-RMID, the task will dirty an
RMID that the limbo code is not watching causing an inaccurate count.

x86's RMID are independent values, so the limbo code will still be
watching the old-RMID in this circumstance.

To avoid this, arm64 needs both the CLOSID/RMID WRITE_ONCE()d together.
Both values must be provided together.

Because MPAM's RMID values are not unique, the CLOSID must be provided
when matching the RMID.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-12-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:32 +01:00
James Morse
6eac36bb9e x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used
for different control groups.

This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.

Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.

The CLOSID found is returned to closid_alloc() for the free list
to be updated.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-11-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:32 +01:00
James Morse
5d920b6881 x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding
The resctrl CLOSID allocator uses a single 32bit word to track which
CLOSID are free. The setting and clearing of bits is open coded.

Convert the existing open coded bit manipulations of closid_free_map
to use __set_bit() and friends. These don't need to be atomic as this
list is protected by the mutex.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-10-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
James Morse
c4c0376eef x86/resctrl: Allow RMID allocation to be scoped by CLOSID
MPAMs RMID values are not unique unless the CLOSID is considered as well.

alloc_rmid() expects the RMID to be an independent number.

Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when
allocating. If the CLOSID is not relevant to the index, this ends up comparing
the free RMID with itself, and the first free entry will be used. With MPAM the
CLOSID is included in the index, so this becomes a walk of the free RMID
entries, until one that matches the supplied CLOSID is found.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-8-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
James Morse
6791e0ea30 x86/resctrl: Access per-rmid structures by index
x86 systems identify traffic using the CLOSID and RMID. The CLOSID is
used to lookup the control policy, the RMID is used for monitoring. For
x86 these are independent numbers.
Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is
used to lookup the control policy. The PMG in contrast is a small number
of bits that are used to subdivide PARTID when monitoring. The
cache-occupancy monitors require the PARTID to be specified when
monitoring.

This means MPAM's PMG field is not unique. There are multiple PMG-0, one
per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it
cannot be allocated as an independent number. Bitmaps like rmid_busy_llc
need to be sized by the number of unique entries for this resource.

Treat the combined CLOSID and RMID as an index, and provide architecture
helpers to pack and unpack an index. This makes the MPAM values unique.
The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as
are domain mbm_local[] and mbm_total[].

x86 can ignore the CLOSID field when packing and unpacking an index, and
report as many indexes as RMID.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-7-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
James Morse
40fc735b78 x86/resctrl: Track the closid with the rmid
x86's RMID are independent of the CLOSID. An RMID can be allocated,
used and freed without considering the CLOSID.

MPAM's equivalent feature is PMG, which is not an independent number,
it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of
'RMID' can be allocated for a single CLOSID.
i.e. if there is 1 bit of PMG space, then each CLOSID can have two
monitor groups.

To allow resctrl to disambiguate RMID values for different CLOSID,
everything in resctrl that keeps an RMID value needs to know the CLOSID
too. This will always be ignored on x86.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-6-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
James Morse
311639e951 x86/resctrl: Move RMID allocation out of mkdir_rdt_prepare()
RMIDs are allocated for each monitor or control group directory, because
each of these needs its own RMID. For control groups,
rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID.

MPAM's equivalent of RMID is not an independent number, so can't be
allocated until the CLOSID is known. An RMID allocation for one CLOSID
may fail, whereas another may succeed depending on how many monitor
groups a control group has.

The RMID allocation needs to move to be after the CLOSID has been
allocated.

Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller,
after the mkdir_rdt_prepare() call. This allows the RMID allocator to
know the CLOSID.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-5-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
James Morse
b1de313979 x86/resctrl: Create helper for RMID allocation and mondata dir creation
When monitoring is supported, each monitor and control group is allocated an
RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to allocate
the CLOSID.

MPAM's equivalent of RMID are not an independent number, so can't be allocated
until the CLOSID is known. An RMID allocation for one CLOSID may fail, whereas
another may succeed depending on how many monitor groups a control group has.

The RMID allocation needs to move to be after the CLOSID has been allocated.

Move the RMID allocation and mondata dir creation to a helper.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-4-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-02-16 19:18:31 +01:00
Babu Moger
fc747eebef x86/resctrl: Remove redundant variable in mbm_config_write_domain()
The kernel test robot reported the following warning after commit

  54e35eb861 ("x86/resctrl: Read supported bandwidth sources from CPUID").

even though the issue is present even in the original commit

  92bd5a1390 ("x86/resctrl: Add interface to write mbm_total_bytes_config")

which added this function. The reported warning is:

  $ make C=1 CHECK=scripts/coccicheck arch/x86/kernel/cpu/resctrl/rdtgroup.o
  ...
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1621:5-8: Unneeded variable: "ret". Return "0" on line 1655

Remove the local variable 'ret'.

  [ bp: Massage commit message, make mbm_config_write_domain() void. ]

Fixes: 92bd5a1390 ("x86/resctrl: Add interface to write mbm_total_bytes_config")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401241810.jbd8Ipa1-lkp@intel.com/
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/202401241810.jbd8Ipa1-lkp@intel.com
2024-01-25 00:41:59 +01:00
Babu Moger
54e35eb861 x86/resctrl: Read supported bandwidth sources from CPUID
If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured. The maximum supported
bandwidth bitmask can be read from CPUID:

  CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event Configuration]
  Bits    Description
  31:7    Reserved
   6:0    Identifies the bandwidth sources that can be tracked.

While at it, move the mask checking to mon_config_write() before
iterating over all the domains. Also, print the valid bitmask when the
user tries to configure invalid event configuration value.

The CPUID details are documented in the Processor Programming Reference
(PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the
Link tag.

Fixes: dc2a3e8579 ("x86/resctrl: Add interface to read mbm_total_bytes_config")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/669896fa512c7451319fa5ca2fdb6f7e015b5635.1705359148.git.babu.moger@amd.com
2024-01-23 16:26:42 +01:00
Babu Moger
4cee14bcb1 x86/resctrl: Display RMID of resource group
In x86, hardware uses RMID to identify a monitoring group. When a user
creates a monitor group these details are not visible. These details
can help resctrl debugging.

Add RMID(mon_hw_id) to the monitor groups display in the resctrl interface.
Users can see these details when resctrl is mounted with "-o debug" option.

Add RFTYPE_MON_BASE that complements existing RFTYPE_CTRL_BASE and
represents files belonging to monitoring groups.

Other architectures do not use "RMID". Use the name mon_hw_id to refer
to "RMID" in an effort to keep the naming generic.

For example:
  $cat /sys/fs/resctrl/mon_groups/mon_grp1/mon_hw_id
  3

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-10-babu.moger@amd.com
2023-10-17 14:05:40 +02:00
Babu Moger
918f211b5e x86/resctrl: Add support for the files of MON groups only
Files unique to monitoring groups have the RFTYPE_MON flag. When a new
monitoring group is created the resctrl files with flags RFTYPE_BASE
(files common to all resource groups) and RFTYPE_MON (files unique to
monitoring groups) are created to support interacting with the new
monitoring group.

A resource group can support both monitoring and control, also termed
a CTRL_MON resource group. CTRL_MON groups should get both monitoring
and control resctrl files but that is not the case. Only the
RFTYPE_BASE and RFTYPE_CTRL files are created for CTRL_MON groups.

Ensure that files with the RFTYPE_MON flag are created for CTRL_MON groups.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-9-babu.moger@amd.com
2023-10-17 14:05:24 +02:00
Babu Moger
ca8dad225e x86/resctrl: Display CLOSID for resource group
In x86, hardware uses CLOSID to identify a control group. When a user
creates a control group this information is not visible to the user. It
can help resctrl debugging.

Add CLOSID(ctrl_hw_id) to the control groups display in the resctrl
interface. Users can see this detail when resctrl is mounted with the
"-o debug" option.

Other architectures do not use "CLOSID". Use the names ctrl_hw_id to refer
to "CLOSID" in an effort to keep the naming generic.

For example:
  $cat /sys/fs/resctrl/ctrl_grp1/ctrl_hw_id
  1

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-8-babu.moger@amd.com
2023-10-17 14:05:14 +02:00
Babu Moger
cb07d71f01 x86/resctrl: Introduce "-o debug" mount option
Add "-o debug" option to mount resctrl filesystem in debug mode.  When
in debug mode resctrl displays files that have the new RFTYPE_DEBUG flag
to help resctrl debugging.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-7-babu.moger@amd.com
2023-10-17 13:07:17 +02:00
Babu Moger
d27567a0eb x86/resctrl: Move default group file creation to mount
The default resource group and its files are created during kernel init
time. Upcoming changes will make some resctrl files optional based on
a mount parameter. If optional files are to be added to the default
group based on the mount option, then each new file needs to be created
separately and call kernfs_activate() again.

Create all files of the default resource group during resctrl mount,
destroyed during unmount, to avoid scattering resctrl file addition
across two separate code flows.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-6-babu.moger@amd.com
2023-10-17 12:50:53 +02:00
Babu Moger
df5f3a1dd8 x86/resctrl: Unwind properly from rdt_enable_ctx()
rdt_enable_ctx() enables the features provided during resctrl mount.

Additions to rdt_enable_ctx() are required to also modify error paths
of rdt_enable_ctx() callers to ensure correct unwinding if errors
are encountered after calling rdt_enable_ctx(). This is error prone.

Introduce rdt_disable_ctx() to refactor the error unwinding of
rdt_enable_ctx() to simplify future additions. This also simplifies
cleanup in rdt_kill_sb().

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-5-babu.moger@amd.com
2023-10-17 12:49:02 +02:00
Babu Moger
d41592435c x86/resctrl: Rename rftype flags for consistency
resctrl associates rftype flags with its files so that files can be chosen
based on the resource, whether it is info or base, and if it is control
or monitor type file. These flags use the RF_ as well as RFTYPE_ prefixes.

Change the prefix to RFTYPE_ for all these flags to be consistent.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-4-babu.moger@amd.com
2023-10-17 11:59:14 +02:00
Babu Moger
6846dc1a31 x86/resctrl: Simplify rftype flag definitions
The rftype flags are bitmaps used for adding files under the resctrl
filesystem. Some of these bitmap defines have one extra level of
indirection which is not necessary.

Drop the RF_* defines and simplify the macros.

  [ bp: Massage commit message. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-3-babu.moger@amd.com
2023-10-17 11:51:16 +02:00
Babu Moger
fe2a20ea0b x86/resctrl: Add multiple tasks to the resctrl group at once
The resctrl task assignment for monitor or control group needs to be
done one at a time. For example:

  $mount -t resctrl resctrl /sys/fs/resctrl/
  $mkdir /sys/fs/resctrl/ctrl_grp1
  $echo 123 > /sys/fs/resctrl/ctrl_grp1/tasks
  $echo 456 > /sys/fs/resctrl/ctrl_grp1/tasks
  $echo 789 > /sys/fs/resctrl/ctrl_grp1/tasks

This is not user-friendly when dealing with hundreds of tasks.

Support multiple task assignment in one command with tasks ids separated
by commas. For example:

  $echo 123,456,789 > /sys/fs/resctrl/ctrl_grp1/tasks

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Tan Shaopeng <tan.shaopeng@jp.fujitsu.com>
Link: https://lore.kernel.org/r/20231017002308.134480-2-babu.moger@amd.com
2023-10-17 11:27:50 +02:00
Fenghua Yu
4dba8f10b8 x86/resctrl: Add sparse_masks file in info
Add the interface in resctrl FS to show if sparse cache allocation
bit masks are supported on the platform. Reading the file returns
either a "1" if non-contiguous 1s are supported and "0" otherwise.
The file path is /sys/fs/resctrl/info/{resource}/sparse_masks, where
{resource} can be either "L2" or "L3".

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Peter Newman <peternewman@google.com>
Link: https://lore.kernel.org/r/7300535160beba41fd8aa073749ec1ee29b4621f.1696934091.git.maciej.wieczor-retman@intel.com
2023-10-11 21:51:24 +02:00
Maciej Wieczor-Retman
f05fd4ce99 x86/resctrl: Fix remaining kernel-doc warnings
The kernel test robot reported kernel-doc warnings here:

  arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'of' not described in 'rdt_bit_usage_show'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'seq' not described in 'rdt_bit_usage_show'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'v' not described in 'rdt_bit_usage_show'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1144: warning: Function parameter or member 'type' not described in '__rdtgroup_cbm_overlaps'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1224: warning: Function parameter or member 'rdtgrp' not described in 'rdtgroup_mode_test_exclusive'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'of' not described in 'rdtgroup_mode_write'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'buf' not described in 'rdtgroup_mode_write'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'nbytes' not described in 'rdtgroup_mode_write'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'off' not described in 'rdtgroup_mode_write'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'of' not described in 'rdtgroup_size_show'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 's' not described in 'rdtgroup_size_show'
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'v' not described in 'rdtgroup_size_show'

The first two functions are missing an argument description while the
other three are file callbacks and don't require a kernel-doc comment.

Closes: https://lore.kernel.org/oe-kbuild-all/202310070434.mD8eRNAz-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Newman <peternewman@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20231011064843.246592-1-maciej.wieczor-retman@intel.com
2023-10-11 09:44:41 +02:00
Peter Newman
8da2b938eb x86/resctrl: Implement rename op for mon groups
To change the resources allocated to a large group of tasks, such as an
application container, a container manager must write all of the tasks'
IDs into the tasks file interface of the new control group. This is
challenging when the container's task list is always changing.

In addition, if the container manager is using monitoring groups to
separately track the bandwidth of containers assigned to the same
control group, when moving a container, it must first move the
container's tasks to the default monitoring group of the new control
group before it can move these tasks into the container's replacement
monitoring group under the destination control group. This is
undesirable because it makes bandwidth usage during the move
unattributable to the correct tasks and resets monitoring event counters
and cache usage information for the group.

Implement the rename operation only for resctrlfs monitor groups to
enable users to move a monitoring group from one control group to
another. This effects a change in resources allocated to all the tasks
in the monitoring group while otherwise leaving the monitoring data
intact.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
2023-06-07 12:40:36 +02:00
Peter Newman
c45c06d4ae x86/resctrl: Factor rdtgroup lock for multi-file ops
rdtgroup_kn_lock_live() can only release a kernfs reference for a single
file before waiting on the rdtgroup_mutex, limiting its usefulness for
operations on multiple files, such as rename.

Factor the work needed to respectively break and unbreak active
protection on an individual file into rdtgroup_kn_{get,put}().

No functional change.

Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20230419125015.693566-2-peternewman@google.com
2023-06-07 12:15:18 +02:00
Shawn Wang
2997d94b5d x86/resctrl: Only show tasks' pid in current pid namespace
When writing a task id to the "tasks" file in an rdtgroup,
rdtgroup_tasks_write() treats the pid as a number in the current pid
namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
the list of global pids from the init namespace, which is confusing and
incorrect.

To be more robust, let the "tasks" file only show pids in the current pid
namespace.

Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/all/20230116071246.97717-1-shawnwang@linux.alibaba.com/
2023-05-30 20:57:39 +02:00
Shawn Wang
0424a7dfe9 x86/resctrl: Clear staged_config[] before and after it is used
As a temporary storage, staged_config[] in rdt_domain should be cleared
before and after it is used. The stale value in staged_config[] could
cause an MSR access error.

Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3
Cache (MBA should be disabled if the number of CLOSIDs for MB is less than
16.) :
	mount -t resctrl resctrl -o cdp /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..7}
	umount /sys/fs/resctrl/
	mount -t resctrl resctrl /sys/fs/resctrl
	mkdir /sys/fs/resctrl/p{1..8}

An error occurs when creating resource group named p8:
    unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    Call Trace:
     <IRQ>
     __flush_smp_call_function_queue+0x11d/0x170
     __sysvec_call_function+0x24/0xd0
     sysvec_call_function+0x89/0xc0
     </IRQ>
     <TASK>
     asm_sysvec_call_function+0x16/0x20

When creating a new resource control group, hardware will be configured
by the following process:
    rdtgroup_mkdir()
      rdtgroup_mkdir_ctrl_mon()
        rdtgroup_init_alloc()
          resctrl_arch_update_domains()

resctrl_arch_update_domains() iterates and updates all resctrl_conf_type
whose have_new_ctrl is true. Since staged_config[] holds the same values as
when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA
configurations. When group p8 is created, get_config_index() called in
resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for
CDP_CODE and CDP_DATA, which will be translated to an invalid register -
0xca0 in this scenario.

Fix it by clearing staged_config[] before and after it is used.

[reinette: re-order commit tags]

Fixes: 75408e4350 ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
2023-03-15 15:19:43 -07:00
Linus Torvalds
7fef099702 x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-08 11:48:11 -08:00
Borislav Petkov (AMD)
793207bad7 x86/resctrl: Fix a silly -Wunused-but-set-variable warning
clang correctly complains

  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1456:6: warning: variable \
     'h' set but not used [-Wunused-but-set-variable]
          u32 h;
              ^

but it can't know whether this use is innocuous or really a problem.
There's a reason why those warning switches are behind a W=1 and not
enabled by default - yes, one needs to do:

  make W=1 CC=clang HOSTCC=clang arch/x86/kernel/cpu/resctrl/

with clang 14 in order to trigger it.

I would normally not take a silly fix like that but this one is simple
and doesn't make the code uglier so...

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/202301242015.kbzkVteJ-lkp@intel.com
2023-01-26 11:15:20 +01:00
Babu Moger
4fe61bff5a x86/resctrl: Add interface to write mbm_local_bytes_config
The event configuration for mbm_local_bytes can be changed by the
user by writing to the configuration file
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example, to change the mbm_local_bytes_config to count all the non-temporal
writes on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex
0xc).
Run the command:

  $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

To change the mbm_local_bytes to count only reads to local NUMA domain 1,
the bit 0 needs to be set which 1b (in hex 0x1). Run the command:

  $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-13-babu.moger@amd.com
2023-01-23 17:40:32 +01:00
Babu Moger
92bd5a1390 x86/resctrl: Add interface to write mbm_total_bytes_config
The event configuration for mbm_total_bytes can be changed by the user by
writing to the file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

The event configuration settings are domain specific and affect all the
CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

For example:

To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
Run the command:

  $echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

To change the mbm_total_bytes to count all the slow memory reads on domain 1,
the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command:

  $echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-12-babu.moger@amd.com
2023-01-23 17:40:30 +01:00
Babu Moger
73afb2d3ce x86/resctrl: Add interface to read mbm_local_bytes_config
The event configuration can be viewed by the user by reading the configuration
file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.  The event
configuration settings are domain specific and will affect all the CPUs in the
domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_local_bytes_config is set to 0x15 to count all the local
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
  0=0x15;1=0x15;2=0x15;3=0x15

In this case, the event mbm_local_bytes is configured with 0x15 on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-11-babu.moger@amd.com
2023-01-23 17:40:27 +01:00
Babu Moger
dc2a3e8579 x86/resctrl: Add interface to read mbm_total_bytes_config
The event configuration can be viewed by the user by reading the
configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.  The
event configuration settings are domain specific and will affect all the CPUs in
the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits   Description
  ====  ===========================================================
  6      Dirty Victims from the QOS domain to all types of memory
  5      Reads to slow memory in the non-local NUMA domain
  4      Reads to slow memory in the local NUMA domain
  3      Non-temporal writes to non-local NUMA domain
  2      Non-temporal writes to local NUMA domain
  1      Reads to memory in the non-local NUMA domain
  0      Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
  0=0x7f;1=0x7f;2=0x7f;3=0x7f

In this case, the event mbm_total_bytes is configured with 0x7f on
domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com
2023-01-23 17:40:24 +01:00
Babu Moger
d507f83ced x86/resctrl: Support monitor configuration
Add a new field in struct mon_evt to support Bandwidth Monitoring Event
Configuration (BMEC) and also update the "mon_features" display.

The resctrl file "mon_features" will display the supported events
and files that can be used to configure those events if monitor
configuration is supported.

Before the change:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes

After the change when BMEC is supported:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_total_bytes_config
  mbm_local_bytes
  mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-9-babu.moger@amd.com
2023-01-23 17:40:21 +01:00
Babu Moger
5b6fac3fa4 x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-7-babu.moger@amd.com
2023-01-23 17:38:44 +01:00
Babu Moger
fc3b618c87 x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
on_each_cpu_mask() runs the function on each CPU specified by cpumask,
which may include the local processor.

Replace smp_call_function_many() with on_each_cpu_mask() to simplify
the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20230113152039.770054-2-babu.moger@amd.com
2023-01-23 17:38:04 +01:00
Peter Newman
fe1f071438 x86/resctrl: Fix task CLOSID/RMID update race
When the user moves a running task to a new rdtgroup using the task's
file interface or by deleting its rdtgroup, the resulting change in
CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the
task(s) CPUs.

x86 allows reordering loads with prior stores, so if the task starts
running between a task_curr() check that the CPU hoisted before the
stores in the CLOSID/RMID update then it can start running with the old
CLOSID/RMID until it is switched again because __rdtgroup_move_task()
failed to determine that it needs to be interrupted to obtain the new
CLOSID/RMID.

Refer to the diagram below:

CPU 0                                   CPU 1
-----                                   -----
__rdtgroup_move_task():
  curr <- t1->cpu->rq->curr
                                        __schedule():
                                          rq->curr <- t1
                                        resctrl_sched_in():
                                          t1->{closid,rmid} -> {1,1}
  t1->{closid,rmid} <- {2,2}
  if (curr == t1) // false
   IPI(t1->cpu)

A similar race impacts rdt_move_group_tasks(), which updates tasks in a
deleted rdtgroup.

In both cases, use smp_mb() to order the task_struct::{closid,rmid}
stores before the loads in task_curr().  In particular, in the
rdt_move_group_tasks() case, simply execute an smp_mb() on every
iteration with a matching task.

It is possible to use a single smp_mb() in rdt_move_group_tasks(), but
this would require two passes and a means of remembering which
task_structs were updated in the first loop. However, benchmarking
results below showed too little performance impact in the simple
approach to justify implementing the two-pass approach.

Times below were collected using `perf stat` to measure the time to
remove a group containing a 1600-task, parallel workload.

CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads)

  # mkdir /sys/fs/resctrl/test
  # echo $$ > /sys/fs/resctrl/test/tasks
  # perf bench sched messaging -g 40 -l 100000

task-clock time ranges collected using:

  # perf stat rmdir /sys/fs/resctrl/test

Baseline:                     1.54 - 1.60 ms
smp_mb() every matching task: 1.57 - 1.67 ms

  [ bp: Massage commit message. ]

Fixes: ae28d1aae4 ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR")
Fixes: 0efc89be94 ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com
2023-01-10 19:47:30 +01:00
James Morse
d80975e264 x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-21-james.morse@arm.com
2022-09-23 14:24:16 +02:00
James Morse
ae2328b529 x86/resctrl: Rename and change the units of resctrl_cqm_threshold
resctrl_cqm_threshold is stored in a hardware specific chunk size,
but exposed to user-space as bytes.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the users have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power
of 2, so the existing code introduces a rounding error from resctrl's
theoretical fraction of the cache usage. This behaviour is kept as it
ensures the user visible value matches the value read from hardware
when the rmid will be reallocated.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-20-james.morse@arm.com
2022-09-23 14:23:41 +02:00
James Morse
b58d4eb1f1 x86/resctrl: Remove architecture copy of mbps_val
The resctrl arch code provides a second configuration array mbps_val[]
for the MBA software controller.

Since resctrl switched over to allocating and freeing its own array
when needed, nothing uses the arch code version.

Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-11-james.morse@arm.com
2022-09-22 17:37:16 +02:00
James Morse
6ce1560d35 x86/resctrl: Switch over to the resctrl mbps_val list
Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
parse_ctrlval() function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
call that would update hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.
resctrl_arch_update_domains() using the mbps_val[] array prevents this.

Change parse_bw() to write the configuration value directly to the
mbps_val[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains(), meaning all the
mba_sc specific code in resctrl_arch_update_domains() can be removed.
On the read-side, show_doms() and update_mba_bw() are changed to read
the mbps_val[] array from the domain structure. With this,
resctrl_arch_get_config() no longer needs to consider mba_sc resources.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-10-james.morse@arm.com
2022-09-22 17:34:08 +02:00
James Morse
781096d971 x86/resctrl: Create mba_sc configuration in the rdt_domain
To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val[] from user-space.

This complicates the interface between the architecture specific code and
the filesystem portions of resctrl that will move to /fs/, to allow
multiple architectures to support resctrl.

Make the filesystem parts of resctrl create an array for the mba_sc
values. The software controller can be changed to use this, allowing
the architecture code to only consider the values configured in hardware.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-9-james.morse@arm.com
2022-09-22 17:17:59 +02:00
James Morse
b045c21586 x86/resctrl: Abstract and use supports_mba_mbps()
To determine whether the mba_MBps option to resctrl should be supported,
resctrl tests the boot CPUs' x86_vendor.

This isn't portable, and needs abstracting behind a helper so this check
can be part of the filesystem code that moves to /fs/.

Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
on this system. An 'alloc_capable' test is added so that support for the
controls isn't implied by the 'delay_linear' property, which is always
true for MPAM. Because mbm_update() only update mba_sc if the mbm_local
counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled().
(instead of using is_mbm_enabled(), which checks both).

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-8-james.morse@arm.com
2022-09-22 16:10:11 +02:00
James Morse
1644dfe727 x86/resctrl: Remove set_mba_sc()s control array re-initialisation
set_mba_sc() enables the 'software controller' to regulate the bandwidth
based on the byte counters. This can be managed entirely in the parts
of resctrl that move to /fs/, without any extra support from the
architecture specific code. set_mba_sc() is called by rdt_enable_ctx()
during mount and unmount. It currently resets the arch code's ctrl_val[]
and mbps_val[] arrays.

The ctrl_val[] was already reset when the domain was created, and by
reset_all_ctrls() when the filesystem was last unmounted. Doing the work
in set_mba_sc() is not necessary as the values are already at their
defaults due to the creation of the domain, or were previously reset
during umount(), or are about to reset during umount().

Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the code in
set_mba_sc() that reaches in to the architecture specific structures to
be removed.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-7-james.morse@arm.com
2022-09-22 16:08:20 +02:00
James Morse
798fd4b9ac x86/resctrl: Add domain offline callback for resctrl work
Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
works into a new resctrl_offline_domain() call. These bits are not
specific to the architecture. Grouping them in one function allows
that code to be moved to /fs/ and re-used by another architecture.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-6-james.morse@arm.com
2022-09-22 15:42:40 +02:00
James Morse
3a7232cdf1 x86/resctrl: Add domain online callback for resctrl work
Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
allocate the memory.

Move domain_setup_mon_state(), the monitor subdir creation call and the
mbm/limbo workers into a new resctrl_online_domain() call. These bits
are not specific to the architecture. Grouping them in one function
allows that code to be moved to /fs/ and re-used by another architecture.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-4-james.morse@arm.com
2022-09-22 15:13:27 +02:00
James Morse
bab6ee7368 x86/resctrl: Merge mon_capable and mon_enabled
mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-3-james.morse@arm.com
2022-09-22 14:43:08 +02:00
James Morse
4d269ed485 x86/resctrl: Kill off alloc_enabled
rdt_resources_all[] used to have extra entries for L2CODE/L2DATA.
These were hidden from resctrl by the alloc_enabled value.

Now that the L2/L2CODE/L2DATA resources have been merged together,
alloc_enabled doesn't mean anything, it always has the same value as
alloc_capable which indicates allocation is supported by this resource.

Remove alloc_enabled and its helpers.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20220902154829.30399-2-james.morse@arm.com
2022-09-22 14:34:33 +02:00
Yury Norov
3a5ff1f6dd x86: Replace cpumask_weight() with cpumask_empty() where appropriate
In some cases, x86 code calls cpumask_weight() to check if any bit of a
given cpumask is set.

This can be done more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20220210224933.379149-17-yury.norov@gmail.com
2022-04-10 22:35:38 +02:00
Greg Kroah-Hartman
f2eb478f2f kernfs: move struct kernfs_root out of the public view.
There is no need to have struct kernfs_root be part of kernfs.h for
the whole kernel to see and poke around it.  Move it internal to kernfs
code and provide a helper function, kernfs_root_to_node(), to handle the
one field that kernfs users were directly accessing from the structure.

Cc: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 15:46:34 +01:00
James Morse
111136e69c x86/resctrl: Make resctrl_arch_get_config() return its value
resctrl_arch_get_config() has no return, but does pass a single value
back via one of its arguments.

Return the value instead.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210811163831.14917-1-james.morse@arm.com
2021-08-11 18:42:53 +02:00