Commit Graph

20375 Commits

Author SHA1 Message Date
James Morse
7028840552 x86/resctrl: Allow an architecture to disable pseudo lock
Pseudo-lock relies on knowledge of the micro-architecture to disable
prefetchers etc.

On arm64 these controls are typically secure only, meaning Linux can't access
them. Arm's cache-lockdown feature works in a very different way. Resctrl's
pseudo-lock isn't going to be used on arm64 platforms.

Add a Kconfig symbol that can be selected by the architecture. This enables or
disables building of the pseudo_lock.c file, and replaces the functions with
stubs. An additional IS_ENABLED() check is needed in rdtgroup_mode_write() so
that attempting to enable pseudo-lock reports an "Unknown or unsupported mode"
to user-space via the last_cmd_status file.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-25-james.morse@arm.com
2025-03-12 12:24:25 +01:00
James Morse
7d0ec14c64 x86/resctrl: Add resctrl_arch_ prefix to pseudo lock functions
resctrl's pseudo lock has some copy-to-cache and measurement functions that
are micro-architecture specific.

For example, pseudo_lock_fn() is not at all portable.

Label these 'resctrl_arch_' so they stay under /arch/x86.  To expose these
functions to the filesystem code they need an entry in a header file, and
can't be marked static.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-24-james.morse@arm.com
2025-03-12 12:24:22 +01:00
James Morse
c32a7d7777 x86/resctrl: Move mbm_cfg_mask to struct rdt_resource
The mbm_cfg_mask field lists the bits that user-space can set when configuring
an event. This value is output via the last_cmd_status file.

Once the filesystem parts of resctrl are moved to live in /fs/, the struct
rdt_hw_resource is inaccessible to the filesystem code. Because this value is
output to user-space, it has to be accessible to the filesystem code.

Move it to struct rdt_resource.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-23-james.morse@arm.com
2025-03-12 12:24:09 +01:00
James Morse
37bae17567 x86/resctrl: Move mba_mbps_default_event init to filesystem code
mba_mbps_default_event is initialised based on whether mbm_local or mbm_total
is supported. In the case of both, it is initialised to mbm_local.
mba_mbps_default_event is initialised in core.c's get_rdt_mon_resources(),
while all the readers are in rdtgroup.c.

After this code is split into architecture-specific and filesystem code,
get_rdt_mon_resources() remains part of the architecture code, which would
mean mba_mbps_default_event has to be exposed by the filesystem code.

Move the initialisation to the filesystem's resctrl_mon_resource_init().

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-22-james.morse@arm.com
2025-03-12 12:23:52 +01:00
James Morse
650680d651 x86/resctrl: Change mon_event_config_{read,write}() to be arch helpers
mon_event_config_{read,write}() are called via IPI and access model specific
registers to do their work.

To support another architecture, this needs abstracting.

Rename mon_event_config_{read,write}() to have a "resctrl_arch_" prefix, and
move their struct mon_config_info parameter into <linux/resctrl.h>.  This
allows another architecture to supply an implementation of these.

As struct mon_config_info is now exposed globally, give it a 'resctrl_'
prefix. MPAM systems need access to the domain to do this work, add the
resource and domain to struct resctrl_mon_config_info.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-21-james.morse@arm.com
2025-03-12 12:23:49 +01:00
James Morse
d81826f87a x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC
When BMEC is supported the resctrl event can be configured in a number of
ways. This depends on architecture support. rdt_get_mon_l3_config() modifies
the struct mon_evt and calls resctrl_file_fflags_init() to create the files
that allow the configuration.

Splitting this into separate architecture and filesystem parts would require
the struct mon_evt and resctrl_file_fflags_init() to be exposed.

Instead, add resctrl_arch_is_evt_configurable(), and use this from
resctrl_mon_resource_init() to initialise struct mon_evt and call
resctrl_file_fflags_init().

resctrl_arch_is_evt_configurable() calls rdt_cpu_has() so it doesn't obviously
benefit from being inlined. Putting it in core.c will allow rdt_cpu_has() to
eventually become static.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-20-james.morse@arm.com
2025-03-12 12:23:45 +01:00
James Morse
d012b66a16 x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h
The architecture specific parts of resctrl provide helpers like
is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses to the
rdt_mon_features bitmap.

Exposing a group of helpers between the architecture and filesystem code is
preferable to a single unsigned-long like rdt_mon_features. Helpers can be more
readable and have a well defined behaviour, while allowing architectures to hide
more complex behaviour.

Once the filesystem parts of resctrl are moved, these existing helpers can no
longer live in internal.h. Move them to include/linux/resctrl.h Once these are
exposed to the wider kernel, they should have a 'resctrl_arch_' prefix, to fit
the rest of the arch<->fs interface.

Move and rename the helpers that touch rdt_mon_features directly. is_mbm_event()
and is_mbm_enabled() are only called from rdtgroup.c, so can be moved into that
file.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-19-james.morse@arm.com
2025-03-12 12:23:33 +01:00
James Morse
88464bff03 x86/resctrl: Rewrite and move the for_each_*_rdt_resource() walkers
The for_each_*_rdt_resource() helpers walk the architecture's array of
structures, using the resctrl visible part as an iterator. These became
over-complex when the structures were split into a filesystem and
architecture-specific struct. This approach avoided the need to touch every
call site, and was done before there was a helper to retrieve a resource by
rid.

Once the filesystem parts of resctrl are moved to /fs/, both the arch's
resource array, and the definition of those structures is no longer
accessible. To support resctrl, each architecture would have to provide
equally complex macros.

Rewrite the macro to make use of resctrl_arch_get_resource(), and move these
to include/linux/resctrl.h so existing x86 arch code continues to use them.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-18-james.morse@arm.com
2025-03-12 12:23:30 +01:00
James Morse
4b6bdbf27f x86/resctrl: Move monitor init work to a resctrl init call
rdt_get_mon_l3_config() is called from the arch's resctrl_arch_late_init(),
and initialises both architecture specific fields, such as hw_res->mon_scale
and resctrl filesystem fields by calling dom_data_init().

To separate the filesystem and architecture parts of resctrl, this function
needs splitting up.

Add resctrl_mon_resource_init() to do the filesystem specific work, and call
it from resctrl_init(). This runs later, but is still before the filesystem is
mounted and the rmid_ptrs[] array can be used.

  [ bp: Massage commit message. ]

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-17-james.morse@arm.com
2025-03-12 12:23:21 +01:00
James Morse
011842727f x86/resctrl: Move monitor exit work to a resctrl exit call
rdt_put_mon_l3_config() is called via the architecture's resctrl_arch_exit()
call, and appears to free the rmid_ptrs[] and closid_num_dirty_rmid[] arrays.
In reality this code is marked __exit, and is removed by the linker as resctrl
can't be built as a module.

To separate the filesystem and architecture parts of resctrl, this free()ing
work needs to be triggered by the filesystem, as these structures belong to
the filesystem code.

Rename rdt_put_mon_l3_config() to resctrl_mon_resource_exit() and call it from
resctrl_exit(). The kfree() is currently dependent on r->mon_capable.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-16-james.morse@arm.com
2025-03-12 12:23:13 +01:00
James Morse
9be68b144a x86/resctrl: Add an arch helper to reset one resource
On umount(), resctrl resets each resource back to its default configuration.
It only ever does this for all resources in one go.

reset_all_ctrls() is architecture specific as it works with struct
rdt_hw_resource.

Make reset_all_ctrls() an arch helper that resets one resource.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-15-james.morse@arm.com
2025-03-12 12:23:10 +01:00
James Morse
f16adbaf92 x86/resctrl: Move resctrl types to a separate header
When resctrl is fully factored into core and per-arch code, each arch will
need to use some resctrl common definitions in order to define its own
specializations and helpers.  Following conventional practice, it would be
desirable to put the dependent arch definitions in an <asm/resctrl.h> header
that is included by the common <linux/resctrl.h> header.  However, this can
make it awkward to avoid a circular dependency between <linux/resctrl.h> and
the arch header.

To avoid such dependencies, move the affected common types and constants into
a new header that does not need to depend on <linux/resctrl.h> or on the arch
headers.

The same logic applies to the monitor-configuration defines, move these too.

Some kind of enumeration for events is needed between the filesystem and
architecture code. Take the x86 definition as its convenient for x86.

The definition of enum resctrl_event_id is needed to allow the architecture
code to define resctrl_arch_mon_ctx_alloc() and resctrl_arch_mon_ctx_free().

The definition of enum resctrl_res_level is needed to allow the architecture
code to define resctrl_arch_set_cdp_enabled() and
resctrl_arch_get_cdp_enabled().

The bits for mbm_local_bytes_config et al are ABI, and must be the same on all
architectures. These are documented in Documentation/arch/x86/resctrl.rst

The maintainers entry for these headers was missed when resctrl.h was created.
Add a wildcard entry to match both resctrl.h and resctrl_types.h.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-14-james.morse@arm.com
2025-03-12 12:23:00 +01:00
James Morse
e3d5138cef x86/resctrl: Move rdt_find_domain() to be visible to arch and fs code
rdt_find_domain() finds a domain given a resource and a cache-id.  This is
used by both the architecture code and the filesystem code.

After the filesystem code moves to live in /fs/, this helper is either
duplicated by all architectures, or needs exposing by the filesystem code.

Add the declaration to the global header file. As it's now globally visible,
and has only a handful of callers, swap the 'rdt' for 'resctrl'. Move the
function to live with its caller in ctrlmondata.c as the filesystem code will
not have anything corresponding to core.c.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-13-james.morse@arm.com
2025-03-12 12:22:56 +01:00
James Morse
8079565d17 x86/resctrl: Expose resctrl fs's init function to the rest of the kernel
rdtgroup_init() needs exposing to the rest of the kernel so that arch code can
call it once it lives in core code. As this is one of the few functions
exposed, rename it to have "resctrl" in the name. The same goes for the exit
call.

Rename x86's arch code init functions for RDT to have an arch prefix to make
it clear these are part of the architecture code.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-12-james.morse@arm.com
2025-03-12 12:22:54 +01:00
James Morse
6f06aee356 x86/resctrl: Remove rdtgroup from update_cpu_closid_rmid()
update_cpu_closid_rmid() takes a struct rdtgroup as an argument, which it uses
to update the local CPUs default pqr values. This is a problem once the
resctrl parts move out to /fs/, as the arch code cannot poke around inside
struct rdtgroup.

Rename update_cpu_closid_rmid() as resctrl_arch_sync_cpus_defaults() to be
used as the target of an IPI, and pass the effective CLOSID and RMID in a new
struct.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-11-james.morse@arm.com
2025-03-12 12:22:51 +01:00
James Morse
aebd5354dd x86/resctrl: Add helper for setting CPU default properties
rdtgroup_rmdir_ctrl() and rdtgroup_rmdir_mon() set the per-CPU pqr_state for
CPUs that were part of the rmdir()'d group.

Another architecture might not have a 'pqr_state', its hardware may need the
values in a different format. MPAM's equivalent of RMID values are not unique,
and always need the CLOSID to be provided too.

There is only one caller that modifies a single value, (rdtgroup_rmdir_mon()).
MPAM always needs both CLOSID and RMID for the hardware value as these are
written to the same system register.

As rdtgroup_rmdir_mon() has the CLOSID on hand, only provide a helper to set
both values. These values are read by __resctrl_sched_in(), but may be written
by a different CPU without any locking, add READ/WRTE_ONCE() to avoid torn
values.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-10-james.morse@arm.com
2025-03-12 12:22:48 +01:00
James Morse
dbc58f7eec x86/resctrl: Generate default_ctrl instead of sharing it
The struct rdt_resource default_ctrl is used by both the architecture code for
resetting the hardware controls, and sometimes by the filesystem code as the
default value for the schema, unless the bandwidth software controller is in
use.

Having the default exposed by the architecture code causes unnecessary
duplication for each architecture as the default value must be specified, but
can be derived from other schema properties. Now that the maximum bandwidth is
explicitly described, resctrl can derive the default value from the schema
format and the other resource properties.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-9-james.morse@arm.com
2025-03-12 12:22:44 +01:00
James Morse
634ebb98b9 x86/resctrl: Add max_bw to struct resctrl_membw
__rdt_get_mem_config_amd() and __get_mem_config_intel() both use the
default_ctrl property as a maximum value. This is because the MBA schema works
differently between these platforms. Doing this complicates determining
whether the default_ctrl property belongs to the arch code, or can be derived
from the schema format.

Deriving the maximum or default value from the schema format would avoid the
architecture code having to tell resctrl such obvious things as the maximum
percentage is 100, and the maximum bitmap is all ones.

Maximum bandwidth is always going to vary per platform. Add max_bw as
a special case. This is currently used for the maximum MBA percentage on Intel
platforms, but can be removed from the architecture code if 'percentage'
becomes a schema format resctrl supports directly.

This value isn't needed for other schema formats.

This will allow the default_ctrl to be generated from the schema properties
when it is needed.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-8-james.morse@arm.com
2025-03-12 12:22:41 +01:00
James Morse
43312b8ea1 x86/resctrl: Remove data_width and the tabular format
The resctrl architecture code provides a data_width for the controls of each
resource. This is used to zero pad all control values in the schemata file so
they appear in columns. The same is done with the resource names to complete
the visual effect. e.g.

  | SMBA:0=2048
  |   L3:0=00ff

AMD platforms discover their maximum bandwidth for the MB resource from
firmware, but hard-code the data_width to 4. If the maximum bandwidth requires
more digits - the tabular format is silently broken.  This is also broken when
the mba_MBps mount option is used as the field width isn't updated. If new
schema are added resctrl will need to be able to determine the maximum width.
The benefit of this pretty-printing is questionable.

Instead of handling runtime discovery of the data_width for AMD platforms,
remove the feature. These fields are always zero padded so should be harmless
to remove if the whole field has been treated as a number.  In the above
example, this would now look like this:

  | SMBA:0=2048
  |   L3:0=ff

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-7-james.morse@arm.com
2025-03-12 12:22:36 +01:00
James Morse
bb9343c8f2 x86/resctrl: Use schema type to determine the schema format string
Resctrl's architecture code gets to specify a format string that is
used when printing schema entries. This is expected to be one of two
values that the filesystem code supports.

Setting this format string allows the architecture code to change
the ABI resctrl presents to user-space.

Instead, use the schema format enum to choose which format string to
use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-6-james.morse@arm.com
2025-03-12 12:22:33 +01:00
James Morse
c24f5eab6b x86/resctrl: Use schema type to determine how to parse schema values
Resctrl's architecture code gets to specify a function pointer that is used
when parsing schema entries. This is expected to be one of two helpers from
the filesystem code.

Setting this function pointer allows the architecture code to change the ABI
resctrl presents to user-space, and forces resctrl to expose these helpers.

Instead, add a schema format enum to choose which schema parser to use. This
allows the helpers to be made static and the structs used for passing
arguments moved out of shared headers.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-5-james.morse@arm.com
2025-03-12 12:22:30 +01:00
James Morse
131dab13a8 x86/resctrl: Remove fflags from struct rdt_resource
The resctrl arch code specifies whether a resource controls a cache or memory
using the fflags field. This field is then used by resctrl to determine which
files should be exposed in the filesystem.

Allowing the architecture to pick this value means the RFTYPE_ flags have to
be in a shared header, and allows an architecture to create a combination that
resctrl does not support.

Remove the fflags field, and pick the value based on the resource id.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-4-james.morse@arm.com
2025-03-12 12:22:26 +01:00
James Morse
3c02153113 x86/resctrl: Add a helper to avoid reaching into the arch code resource list
Resctrl occasionally wants to know something about a specific resource, in
these cases it reaches into the arch code's rdt_resources_all[] array.

Once the filesystem parts of resctrl are moved to /fs/, this means it will
need visibility of the architecture specific struct rdt_hw_resource
definition, and the array of all resources.  All architectures would also need
a r_resctrl member in this struct.

Instead, abstract this via a helper to allow architectures to do different
things here. Move the level enum to the resctrl header and add a helper to
retrieve the struct rdt_resource by 'rid'.

resctrl_arch_get_resource() should not return NULL for any value in the enum,
it may instead return a dummy resource that is !alloc_enabled && !mon_enabled.

Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-3-james.morse@arm.com
2025-03-12 12:22:22 +01:00
James Morse
a121798ae6 x86/resctrl: Fix allocation of cleanest CLOSID on platforms with no monitors
Commit

  6eac36bb9e ("x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid")

added logic that causes resctrl to search for the CLOSID with the fewest dirty
cache lines when creating a new control group, if requested by the arch code.
This depends on the values read from the llc_occupancy counters. The logic is
applicable to architectures where the CLOSID effectively forms part of the
monitoring identifier and so do not allow complete freedom to choose an unused
monitoring identifier for a given CLOSID.

This support missed that some platforms may not have these counters.  This
causes a NULL pointer dereference when creating a new control group as the
array was not allocated by dom_data_init().

As this feature isn't necessary on platforms that don't have cache occupancy
monitors, add this to the check that occurs when a new control group is
allocated.

Fixes: 6eac36bb9e ("x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-2-james.morse@arm.com
2025-03-12 12:21:48 +01:00
Greg Kroah-Hartman
993a47bd7b Merge 6.14-rc6 into driver-core-next
We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10 17:37:25 +01:00
Florent Revest
e3e89178a9 x86/microcode/AMD: Fix out-of-bounds on systems with CPU-less NUMA nodes
Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves their
CPU masks and unconditionally accesses per-CPU data for the first CPU of each
mask.

According to Documentation/admin-guide/mm/numaperf.rst:

  "Some memory may share the same node as a CPU, and others are provided as
  memory only nodes."

Therefore, some node CPU masks may be empty and wouldn't have a "first CPU".

On a machine with far memory (and therefore CPU-less NUMA nodes):
- cpumask_of_node(nid) is 0
- cpumask_first(0) is CONFIG_NR_CPUS
- cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an
  index that is 1 out of bounds

This does not have any security implications since flashing microcode is
a privileged operation but I believe this has reliability implications by
potentially corrupting memory while flashing a microcode update.

When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes
a microcode update. I get the following splat:

  UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y
  index 512 is out of range for type 'unsigned long[512]'
  [...]
  Call Trace:
   dump_stack
   __ubsan_handle_out_of_bounds
   load_microcode_amd
   request_microcode_amd
   reload_store
   kernfs_fop_write_iter
   vfs_write
   ksys_write
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

Change the loop to go over only NUMA nodes which have CPUs before determining
whether the first CPU on the respective node needs microcode update.

  [ bp: Massage commit message, fix typo. ]

Fixes: 7ff6edf4fe ("x86/microcode/AMD: Fix mixed steppings support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250310144243.861978-1-revest@chromium.org
2025-03-10 16:02:54 +01:00
Vladis Dronov
65be5c95d0 x86/sgx: Warn explicitly if X86_FEATURE_SGX_LC is not enabled
The kernel requires X86_FEATURE_SGX_LC to be able to create SGX enclaves,
not just X86_FEATURE_SGX.

There is quite a number of hardware which has X86_FEATURE_SGX but not
X86_FEATURE_SGX_LC. A kernel running on such hardware does not create
the /dev/sgx_enclave file and does so silently.

Explicitly warn if X86_FEATURE_SGX_LC is not enabled to properly notify
users that the kernel disabled the SGX driver.

The X86_FEATURE_SGX_LC, a.k.a. SGX Launch Control, is a CPU feature
that enables LE (Launch Enclave) hash MSRs to be writable (with
additional opt-in required in the 'feature control' MSR) when running
enclaves, i.e. using a custom root key rather than the Intel proprietary
key for enclave signing.

I've hit this issue myself and have spent some time researching where
my /dev/sgx_enclave file went on SGX-enabled hardware.

Related links:

  https://github.com/intel/linux-sgx/issues/837
  https://patchwork.kernel.org/project/platform-driver-x86/patch/20180827185507.17087-3-jarkko.sakkinen@linux.intel.com/

[ mingo: Made the error message a bit more verbose, and added other cases
         where the kernel fails to create the /dev/sgx_enclave device node. ]

Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250309172215.21777-2-vdronov@redhat.com
2025-03-10 12:29:18 +01:00
Sebastian Andrzej Siewior
14daa3bca2 x86: Use RCU in all users of __module_address().
__module_address() can be invoked within a RCU section, there is no
requirement to have preemption disabled.

Replace the preempt_disable() section around __module_address() with
RCU.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250108090457.512198-23-bigeasy@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10 11:54:45 +01:00
Borislav Petkov (AMD)
058a6bec37 x86/microcode/AMD: Add some forgotten models to the SHA check
Add some more forgotten models to the SHA check.

Fixes: 50cef76d5c ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
2025-03-08 20:09:37 +01:00
Ingo Molnar
14296d0e85 Merge branch 'linus' into x86/urgent, to pick up dependent patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08 20:09:27 +01:00
Ingo Molnar
f23ecef20a Merge branch 'locking/urgent' into locking/core, to pick up locking fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08 00:54:06 +01:00
Linus Torvalds
042751d353 Merge tag 'x86-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:

 - Fix CPUID leaf 0x2 parsing bugs

 - Sanitize very early boot parameters to avoid crash

 - Fix size overflows in the SGX code

 - Make CALL_NOSPEC use consistent

* tag 'x86-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Sanitize boot params before parsing command line
  x86/sgx: Fix size overflows in sgx_encl_create()
  x86/cpu: Properly parse CPUID leaf 0x2 TLB descriptor 0x63
  x86/cpu: Validate CPUID leaf 0x2 EDX output
  x86/cacheinfo: Validate CPUID leaf 0x2 EDX output
  x86/speculation: Add a conditional CS prefix to CALL_NOSPEC
  x86/speculation: Simplify and make CALL_NOSPEC consistent
2025-03-07 10:05:32 -10:00
Andrew Cooper
14cb5d8306 x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()
Xen doesn't offer MSR_FAM10H_MMIO_CONF_BASE to all guests.  This results
in the following warning:

  unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
  Call Trace:
   xen_read_msr+0x1e/0x30
   amd_get_mmconfig_range+0x2b/0x80
   quirk_amd_mmconfig_area+0x28/0x100
   pnp_fixup_device+0x39/0x50
   __pnp_add_device+0xf/0x150
   pnp_add_device+0x3d/0x100
   pnpacpi_add_device_handler+0x1f9/0x280
   acpi_ns_get_device_callback+0x104/0x1c0
   acpi_ns_walk_namespace+0x1d0/0x260
   acpi_get_devices+0x8a/0xb0
   pnpacpi_init+0x50/0x80
   do_one_initcall+0x46/0x2e0
   kernel_init_freeable+0x1da/0x2f0
   kernel_init+0x16/0x1b0
   ret_from_fork+0x30/0x50
   ret_from_fork_asm+0x1b/0x30

based on quirks for a "PNP0c01" device.  Treating MMCFG as disabled is the
right course of action, so no change is needed there.

This was most likely exposed by fixing the Xen MSR accessors to not be
silently-safe.

Fixes: 3fac3734c4 ("xen/pv: support selecting safe/unsafe msr accesses")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250307002846.3026685-1-andrew.cooper3@citrix.com
2025-03-07 13:28:31 +01:00
Maksim Davydov
c929d08df8 x86/split_lock: Fix the delayed detection logic
If the warning mode with disabled mitigation mode is used, then on each
CPU where the split lock occurred detection will be disabled in order to
make progress and delayed work will be scheduled, which then will enable
detection back.

Now it turns out that all CPUs use one global delayed work structure.
This leads to the fact that if a split lock occurs on several CPUs
at the same time (within 2 jiffies), only one CPU will schedule delayed
work, but the rest will not.

The return value of schedule_delayed_work_on() would have shown this,
but it is not checked in the code.

A diagram that can help to understand the bug reproduction:

 - sld_update_msr() enables/disables SLD on both CPUs on the same core

 - schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT.
   If a work has the 'pending' status, then schedule_delayed_work_on()
   will return an error code and, most importantly, the work will not
   be placed in the workqueue.

Let's say we have a multicore system on which split_lock_mitigate=0 and
a multithreaded application is running that calls splitlock in multiple
threads. Due to the fact that sld_update_msr() affects the entire core
(both CPUs), we will consider 2 CPUs from different cores. Let the 2
threads of this application schedule to CPU0 (core 0) and to CPU 2
(core 1), then:

|                                 ||                                   |
|             CPU 0 (core 0)      ||          CPU 2 (core 1)           |
|_________________________________||___________________________________|
|                                 ||                                   |
| 1) SPLIT LOCK occured           ||                                   |
|                                 ||                                   |
| 2) split_lock_warn()            ||                                   |
|                                 ||                                   |
| 3) sysctl_sld_mitigate == 0     ||                                   |
|    (work = &sl_reenable)        ||                                   |
|                                 ||                                   |
| 4) schedule_delayed_work_on()   ||                                   |
|    (reenable will be called     ||                                   |
|     after 2 jiffies on CPU 0)   ||                                   |
|                                 ||                                   |
| 5) disable SLD for core 0       ||                                   |
|                                 ||                                   |
|    -------------------------    ||                                   |
|                                 ||                                   |
|                                 || 6) SPLIT LOCK occured             |
|                                 ||                                   |
|                                 || 7) split_lock_warn()              |
|                                 ||                                   |
|                                 || 8) sysctl_sld_mitigate == 0       |
|                                 ||    (work = &sl_reenable,          |
|                                 ||     the same address as in 3) )   |
|                                 ||                                   |
|            2 jiffies            || 9) schedule_delayed_work_on()     |
|                                 ||    fials because the work is in   |
|                                 ||    the pending state since 4).    |
|                                 ||    The work wasn't placed to the  |
|                                 ||    workqueue. reenable won't be   |
|                                 ||    called on CPU 2                |
|                                 ||                                   |
|                                 || 10) disable SLD for core 0        |
|                                 ||                                   |
|                                 ||     From now on SLD will          |
|                                 ||     never be reenabled on core 1  |
|                                 ||                                   |
|    -------------------------    ||                                   |
|                                 ||                                   |
|    11) enable SLD for core 0 by ||                                   |
|        __split_lock_reenable    ||                                   |
|                                 ||                                   |

If the application threads can be scheduled to all processor cores,
then over time there will be only one core left, on which SLD will be
enabled and split lock will be able to be detected; and on all other
cores SLD will be disabled all the time.

Most likely, this bug has not been noticed for so long because
sysctl_sld_mitigate default value is 1, and in this case a semaphore
is used that does not allow 2 different cores to have SLD disabled at
the same time, that is, strictly only one work is placed in the
workqueue.

In order to fix the warning mode with disabled mitigation mode,
delayed work has to be per-CPU. Implement it.

Fixes: 727209376f ("x86/split_lock: Add sysctl to control the misery mode")
Signed-off-by: Maksim Davydov <davydov-max@yandex-team.ru>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250115131704.132609-1-davydov-max@yandex-team.ru
2025-03-07 13:01:02 +01:00
Ingo Molnar
82354fce16 Merge branch 'sched/urgent' into sched/core, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-06 22:26:36 +01:00
Rafael J. Wysocki
f96d92fcbb Merge tag 'amd-pstate-v6.15-2025-03-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge amd-pstate updates for 6.15 (3/6/25) from Mario Limonciello:

"A lot of code optimization to avoid cases where call paths will end up
 calling the same writes multiple times and needlessly caching variables.
 To accomplish this some of the writes are now made into an atomically
 written "perf" variable. Locking has been overhauled to ensure it only
 applies to the necessary functions. Tracing has been adjusted to ensure
 trace events only are used right before writing out to the hardware."

NOTE: This is a redo of amd-pstate-v6.15-2025-03-03 with a fixed Fixes tag.

* tag 'amd-pstate-v6.15-2025-03-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: (29 commits)
  cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline()
  cpufreq/amd-pstate: Stop caching EPP
  cpufreq/amd-pstate: Rework CPPC enabling
  cpufreq/amd-pstate: Drop debug statements for policy setting
  cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes
  cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions
  cpufreq/amd-pstate: Cache CPPC request in shared mem case too
  cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
  cpufreq/amd-pstate-ut: Adjust variable scope
  cpufreq/amd-pstate-ut: Run on all of the correct CPUs
  cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums
  cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same
  cpufreq/amd-pstate-ut: Use _free macro to free put policy
  cpufreq/amd-pstate: Drop `cppc_cap1_cached`
  cpufreq/amd-pstate: Overhaul locking
  cpufreq/amd-pstate: Move perf values into a union
  cpufreq/amd-pstate: Drop min and max cached frequencies
  cpufreq/amd-pstate: Show a warning when a CPU fails to setup
  cpufreq/amd-pstate: Invalidate cppc_req_cached during suspend
  cpufreq/amd-pstate: Fix the clamping of perf values
  ...
2025-03-06 21:28:48 +01:00
Mario Limonciello
b4cc466b97 cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks
Bitfield masks are easier to follow and less error prone.

Reviewed-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-03-06 13:01:25 -06:00
Eric Biggers
d021985504 x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs
Background:
===========

Currently kernel-mode FPU is not always usable in softirq context on
x86, since softirqs can nest inside a kernel-mode FPU section in task
context, and nested use of kernel-mode FPU is not supported.

Therefore, x86 SIMD-optimized code that can be called in softirq context
has to sometimes fall back to non-SIMD code.  There are two options for
the fallback, both of which are pretty terrible:

  (a) Use a scalar fallback.  This can be 10-100x slower than vectorized
      code because it cannot use specialized instructions like AES, SHA,
      or carryless multiplication.

  (b) Execute the request asynchronously using a kworker.  In other
      words, use the "crypto SIMD helper" in crypto/simd.c.

Currently most of the x86 en/decryption code (skcipher and aead
algorithms) uses option (b), since this avoids the slow scalar fallback
and it is easier to wire up.  But option (b) is still really bad for its
own reasons:

  - Punting the request to a kworker is bad for performance too.

  - It forces the algorithm to be marked as asynchronous
    (CRYPTO_ALG_ASYNC), preventing it from being used by crypto API
    users who request a synchronous algorithm.  That's another huge
    performance problem, which is especially unfortunate for users who
    don't even do en/decryption in softirq context.

  - It makes all en/decryption operations take a detour through
    crypto/simd.c.  That involves additional checks and an additional
    indirect call, which slow down en/decryption for *everyone*.

Fortunately, the skcipher and aead APIs are only usable in task and
softirq context in the first place.  Thus, if kernel-mode FPU were to be
reliably usable in softirq context, no fallback would be needed.
Indeed, other architectures such as arm, arm64, and riscv have already
done this.

Changes implemented:
====================

Therefore, this patch updates x86 accordingly to reliably support
kernel-mode FPU in softirqs.

This is done by just disabling softirq processing in kernel-mode FPU
sections (when hardirqs are not already disabled), as that prevents the
nesting that was problematic.

This will delay some softirqs slightly, but only ones that would have
otherwise been nested inside a task context kernel-mode FPU section.
Any such softirqs would have taken the slow fallback path before if they
tried to do any en/decryption.  Now these softirqs will just run at the
end of the task context kernel-mode FPU section (since local_bh_enable()
runs pending softirqs) and will no longer take the slow fallback path.

Alternatives considered:
========================

- Make kernel-mode FPU sections fully preemptible.  This would require
  growing task_struct by another struct fpstate which is more than 2K.

- Make softirqs save/restore the kernel-mode FPU state to a per-CPU
  struct fpstate when nested use is detected.  Somewhat interesting, but
  seems unnecessary when a simpler solution exists.

Performance results:
====================

I did some benchmarks with AES-XTS encryption of 16-byte messages (which is
unrealistically small, but this makes it easier to see the overhead of
kernel-mode FPU...).  The baseline was 384 MB/s.  Removing the use of
crypto/simd.c, which this work makes possible, increases it to 487 MB/s,
a +27% improvement in throughput.

CPU was AMD Ryzen 9 9950X (Zen 5).  No debugging options were enabled.

[ mingo: Prettified the changelog and added performance results. ]

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250304204954.3901-1-ebiggers@kernel.org
2025-03-06 12:44:09 +01:00
Jiri Olsa
fa6192adc3 uprobes/x86: Harden uretprobe syscall trampoline check
Jann reported a possible issue when trampoline_check_ip returns
address near the bottom of the address space that is allowed to
call into the syscall if uretprobes are not set up:

   https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf

Though the mmap minimum address restrictions will typically prevent
creating mappings there, let's make sure uretprobe syscall checks
for that.

Fixes: ff474a78ce ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250212220433.3624297-1-jolsa@kernel.org
2025-03-06 12:22:45 +01:00
Jarkko Sakkinen
0d3e0dfd68 x86/sgx: Fix size overflows in sgx_encl_create()
The total size calculated for EPC can overflow u64 given the added up page
for SECS.  Further, the total size calculated for shmem can overflow even
when the EPC size stays within limits of u64, given that it adds the extra
space for 128 byte PCMD structures (one for each page).

Address this by pre-evaluating the micro-architectural requirement of
SGX: the address space size must be power of two. This is eventually
checked up by ECREATE but the pre-check has the additional benefit of
making sure that there is some space for additional data.

Fixes: 888d249117 ("x86/sgx: Add SGX_IOC_ENCLAVE_CREATE")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250305050006.43896-1-jarkko@kernel.org

Closes: https://lore.kernel.org/linux-sgx/c87e01a0-e7dd-4749-a348-0980d3444f04@stanley.mountain/
2025-03-05 09:51:41 +01:00
Linus Torvalds
bb2281fb05 Merge tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull AMD microcode loading fixes from Borislav Petkov:

 - Load only sha256-signed microcode patch blobs

 - Other good cleanups

* tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Load only SHA256-checksummed patches
  x86/microcode/AMD: Add get_patch_level()
  x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration
  x86/microcode/AMD: Merge early_apply_microcode() into its single callsite
  x86/microcode/AMD: Remove unused save_microcode_in_initrd_amd() declarations
  x86/microcode/AMD: Remove ugly linebreak in __verify_patch_section() signature
2025-03-04 19:05:53 -10:00
Uros Bizjak
6d536cad0d x86/percpu: Fix __per_cpu_hot_end marker
Make __per_cpu_hot_end marker point to the end of the percpu cache
hot data, not to the end of the percpu cache hot section.

This fixes CONFIG_MPENTIUM4 case where X86_L1_CACHE_SHIFT
is set to 7 (128 bytes).

Also update assert message accordingly.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Link: https://lore.kernel.org/r/20250304173455.89361-1-ubizjak@gmail.com

Closes: https://lore.kernel.org/lkml/Z8a-NVJs-pm5W-mG@gmail.com/
2025-03-04 20:30:33 +01:00
Brian Gerst
06aa03056f x86/smp: Move this_cpu_off to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-12-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
f3856cd343 x86/stackprotector: Move __stack_chk_guard to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-11-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
a1e4cc0155 x86/percpu: Move current_task to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-10-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
385f72c83e x86/percpu: Move top_of_stack to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-9-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
c6a0918072 x86/irq: Move irq stacks to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-8-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
c8f1ac2bd7 x86/softirq: Move softirq_pending to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-7-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
839be1619f x86/retbleed: Move call depth to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-6-brgerst@gmail.com
2025-03-04 20:30:33 +01:00
Brian Gerst
01c7bc5198 x86/smp: Move cpu number to percpu hot section
No functional change.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-5-brgerst@gmail.com
2025-03-04 20:30:33 +01:00