commit 9505b98ccd upstream.
pxa_cpufreq_init_voltages() is marked __init but usually inlined into
the non-__init pxa_cpufreq_init() function. When building with clang,
it can stay as a standalone function in a discarded section, and produce
this warning:
WARNING: vmlinux.o(.text+0x616a00): Section mismatch in reference from the function pxa_cpufreq_init() to the function .init.text:pxa_cpufreq_init_voltages()
The function pxa_cpufreq_init() references
the function __init pxa_cpufreq_init_voltages().
This is often because pxa_cpufreq_init lacks a __init
annotation or the annotation of pxa_cpufreq_init_voltages is wrong.
Fixes: 50e77fcd79 ("ARM: pxa: remove __init from cpufreq_driver->init()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0334906c06 upstream.
Commit 5ad7346b4a ("cpufreq: kryo: Add module remove and exit") made
it possible to build the kryo cpufreq driver as a module, but it failed
to release all the resources, i.e. OPP tables, when the module is
unloaded.
This patch fixes it by releasing the OPP tables, by calling
dev_pm_opp_put_supported_hw() for them, from the
qcom_cpufreq_kryo_remove() routine. The array of pointers to the OPP
tables is also allocated dynamically now in qcom_cpufreq_kryo_probe(),
as the pointers will be required while releasing the resources.
Compile tested only.
Cc: 4.18+ <stable@vger.kernel.org> # v4.18+
Fixes: 5ad7346b4a ("cpufreq: kryo: Add module remove and exit")
Reviewed-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 625c85a62c upstream.
The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().
These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element and so they
will both get same address.
But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.
This bug is caught using CFI CLANG builds in android kernel which
catches mismatch in function prototypes for such callbacks.
Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f66196208 ]
cpuinfo_cur_freq gets current CPU frequency as detected by hardware
while scaling_cur_freq last known CPU frequency. Some platforms may not
allow checking the CPU frequency of an offline CPU or the associated
resources may have been released via cpufreq_exit when the CPU gets
offlined, in which case the policy would have been invalidated already.
If we attempt to get current frequency from the hardware, it may result
in hang or crash.
For example on Juno, I see:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[0000000000000188] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 5 PID: 4202 Comm: cat Not tainted 4.20.0-08251-ga0f2c0318a15-dirty #87
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : scmi_cpufreq_get_rate+0x34/0xb0
lr : scmi_cpufreq_get_rate+0x34/0xb0
Call trace:
scmi_cpufreq_get_rate+0x34/0xb0
__cpufreq_get+0x34/0xc0
show_cpuinfo_cur_freq+0x24/0x78
show+0x40/0x60
sysfs_kf_seq_show+0xc0/0x148
kernfs_seq_show+0x44/0x50
seq_read+0xd4/0x480
kernfs_fop_read+0x15c/0x208
__vfs_read+0x60/0x188
vfs_read+0x94/0x150
ksys_read+0x6c/0xd8
__arm64_sys_read+0x24/0x30
el0_svc_common+0x78/0x100
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
---[ end trace 3d1024e58f77f6b2 ]---
So fix the issue by checking if the policy is invalid early in
__cpufreq_get before attempting to get the current frequency.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0e141d1c65 upstream.
The scmi-cpufreq driver calls the arch_set_freq_scale() callback on
frequency changes to provide scale-invariant load-tracking signals to
the scheduler. However, in the slow path, it does so while specifying
the current and max frequencies in different units, hence resulting in a
broken freq_scale factor.
Fix this by passing all frequencies in KHz, as stored in the CPUFreq
frequency table.
Fixes: 99d6bdf338 (cpufreq: add support for CPU DVFS based on SCMI message protocol)
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: 4.17+ <stable@vger.kernel.org> # v4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d98ccfc394 ]
Currently the ti-cpufreq driver blindly registers a 'ti-cpufreq' to force
the driver to probe on any platforms where the driver is built in.
However, this should only happen on platforms that actually can make use
of the driver. There is already functionality in place to match the
SoC compatible so let's factor this out into a separate call and
make sure we find a match before creating the ti-cpufreq platform device.
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51c99dd2c0 ]
We can not call dev_pm_opp_of_cpumask_remove_table() freely anymore
since the latest OPP core updates as that uses reference counting to
free resources. There are cases where no static OPPs are added (using
DT) for a platform and trying to remove the OPP table may end up
decrementing refcount which is already zero and hence generating
warnings.
Lets track if we were able to add static OPPs or not and then only
remove the table based on that. Some reshuffling of code is also done to
do that.
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da5e79bc70 upstream.
If the policy limits change between invocations of cs_dbs_update(),
the requested frequency value stored in dbs_info may not be updated
and the function may use a stale value of it next time. Moreover, if
idle periods are takem into account by cs_dbs_update(), the requested
frequency value stored in dbs_info may be below the min policy limit,
which is incorrect.
To fix these problems, always update the requested frequency value
in dbs_info along with the local copy of it when the previous
requested frequency is beyond the policy limits and avoid decreasing
the requested frequency below the min policy limit when taking
idle periods into account.
Fixes: abb6627910 (cpufreq: conservative: Fix next frequency selection)
Fixes: 00bfe05889 (cpufreq: conservative: Decrease frequency faster for deferred updates)
Reported-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is currently a warning when building the Kryo cpufreq driver into
the kernel image:
WARNING: vmlinux.o(.text+0x8aa424): Section mismatch in reference from
the function qcom_cpufreq_kryo_probe() to the function
.init.text:qcom_cpufreq_kryo_get_msm_id()
The function qcom_cpufreq_kryo_probe() references
the function __init qcom_cpufreq_kryo_get_msm_id().
This is often because qcom_cpufreq_kryo_probe lacks a __init
annotation or the annotation of qcom_cpufreq_kryo_get_msm_id is wrong.
Remove the '__init' annotation from qcom_cpufreq_kryo_get_msm_id
so that there is no more mismatch warning.
Additionally, Nick noticed that the remove function was marked as
'__init' when it should really be marked as '__exit'.
Fixes: 46e2856b8e (cpufreq: Add Kryo CPU scaling driver)
Fixes: 5ad7346b4a (cpufreq: kryo: Add module remove and exit)
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM SoC driver updates from Olof Johansson:
"Some of the larger changes this merge window:
- Removal of drivers for Exynos5440, a Samsung SoC that never saw
widespread use.
- Uniphier support for USB3 and SPI reset handling
- Syste control and SRAM drivers and bindings for Allwinner platforms
- Qualcomm AOSS (Always-on subsystem) reset controller drivers
- Raspberry Pi hwmon driver for voltage
- Mediatek pwrap (pmic) support for MT6797 SoC"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (52 commits)
drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
soc: fsl: cleanup Kconfig menu
soc: fsl: dpio: Convert DPIO documentation to .rst
staging: fsl-mc: Remove remaining files
staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl
staging: fsl-dpaa2: eth: move generic FD defines to DPIO
soc: fsl: qe: gpio: Add qe_gpio_set_multiple
usb: host: exynos: Remove support for Exynos5440
clk: samsung: Remove support for Exynos5440
soc: sunxi: Add the A13, A23 and H3 system control compatibles
reset: uniphier: add reset control support for SPI
cpufreq: exynos: Remove support for Exynos5440
ata: ahci-platform: Remove support for Exynos5440
soc: imx6qp: Use GENPD_FLAG_ALWAYS_ON for PU errata
soc: mediatek: pwrap: add mt6351 driver for mt6797 SoCs
soc: mediatek: pwrap: add pwrap driver for mt6797 SoCs
soc: mediatek: pwrap: fix cipher init setting error
dt-bindings: pwrap: mediatek: add pwrap support for MT6797
reset: uniphier: add USB3 core reset control
dt-bindings: reset: uniphier: add USB3 core reset support
...
Pull more power management updates from Rafael Wysocki:
"These fix the main idle loop and the menu cpuidle governor, clean up
the latter, fix a mistake in the PCI bus type's support for system
suspend and resume, fix the ondemand and conservative cpufreq
governors, address a build issue in the system wakeup framework and
make the ACPI C-states desciptions less confusing.
Specifics:
- Make the idle loop handle stopped scheduler tick correctly (Rafael
Wysocki).
- Prevent the menu cpuidle governor from letting CPUs spend too much
time in shallow idle states when it is invoked with scheduler tick
stopped and clean it up somewhat (Rafael Wysocki).
- Avoid invoking the platform firmware to make the platform enter the
ACPI S3 sleep state with suspended PCIe root ports which may
confuse the firmware and cause it to crash (Rafael Wysocki).
- Fix sysfs-related race in the ondemand and conservative cpufreq
governors which may cause the system to crash if the governor
module is removed during an update of CPU frequency limits (Henry
Willard).
- Select SRCU when building the system wakeup framework to avoid a
build issue in it (zhangyi).
- Make the descriptions of ACPI C-states vendor-neutral to avoid
confusion (Prarit Bhargava)"
* tag 'pm-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: menu: Handle stopped tick more aggressively
sched: idle: Avoid retaining the tick when it has been stopped
PCI / ACPI / PM: Resume all bridges on suspend-to-RAM
cpuidle: menu: Update stale polling override comment
cpufreq: governor: Avoid accessing invalid governor_data
x86/ACPI/cstate: Make APCI C1 FFH MWAIT C-state description vendor-neutral
cpuidle: menu: Fix white space
PM / sleep: wakeup: Fix build error caused by missing SRCU support
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- A fix for a bug in our page table fragment allocator, where a page
table page could be freed and reallocated for something else while
still in use, leading to memory corruption etc. The fix reuses
pt_mm in struct page (x86 only) for a powerpc only refcount.
- Fixes to our pkey support. Several are user-visible changes, but
bring us in to line with x86 behaviour and/or fix outright bugs.
Thanks to Florian Weimer for reporting many of these.
- A series to improve the hvc driver & related OPAL console code,
which have been seen to cause hardlockups at times. The hvc driver
changes in particular have been in linux-next for ~month.
- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
use anywhere other than as a paper weight.
- An optimised memcmp implementation using Power7-or-later VMX
instructions
- Support for barrier_nospec on some NXP CPUs.
- Support for flushing the count cache on context switch on some IBM
CPUs (controlled by firmware), as a Spectre v2 mitigation.
- A series to enhance the information we print on unhandled signals
to bring it into line with other arches, including showing the
offending VMA and dumping the instructions around the fault.
Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
Rao, zhong jiang"
* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
powerpc/mm/book3s/radix: Add mapping statistics
powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
powerpc/mm/hash: Remove unnecessary do { } while(0) loop
powerpc/64s: move machine check SLB flushing to mm/slb.c
powerpc/powernv/idle: Fix build error
powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
powerpc/mm: remove warning about ‘type’ being set
powerpc/32: Include setup.h header file to fix warnings
powerpc: Move `path` variable inside DEBUG_PROM
powerpc/powermac: Make some functions static
powerpc/powermac: Remove variable x that's never read
cxl: remove a dead branch
powerpc/powermac: Add missing include of header pmac.h
powerpc/kexec: Use common error handling code in setup_new_fdt()
powerpc/xmon: Add address lookup for percpu symbols
powerpc/mm: remove huge_pte_offset_and_shift() prototype
powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
powerpc/fadump: handle crash memory ranges array index overflow
...
If cppc_cpufreq.ko is deleted at the same time that tuned-adm is
changing profiles, there is a small chance that a race can occur
between cpufreq_dbs_governor_exit() and cpufreq_dbs_governor_limits()
resulting in a system failure when the latter tries to use
policy->governor_data that has been freed by the former.
This patch uses gov_dbs_data_mutex to synchronize access.
Fixes: e788892ba3 (cpufreq: governor: Get rid of governor events)
Signed-off-by: Henry Willard <henry.willard@oracle.com>
[ rjw: Subject, minor white space adjustment ]
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When HWP is active turbo active ratio is not used, so we should allow
policy max frequency above turbo activation ratio to be set. When HWP is
not active, then any policy max frequency above turbo activation ratio
can result upto max one-core turbo frequency.
This fix helps better thermal control in turbo region when other methods
like "Running Average Power Limit" is not available to use.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Dynamic boosting of HWP performance on IO wake showed significant
improvement to IO workloads. This series was intended for Skylake Xeon
platforms only and feature was enabled by default based on CPU model
number.
But some Xeon platforms reused the Skylake desktop CPU model number. This
caused some undesirable side effects to some graphics workloads. Since
they are heavily IO bound, the increase in CPU performance decreased the
power available for GPU to do its computing and hence decrease in graphics
benchmark performance.
For example on a Skylake desktop, GpuTest benchmark showed average FPS
reduction from 529 to 506.
This change makes sure that HWP boost feature is only enabled for Skylake
server platforms by using ACPI FADT preferred PM Profile. If some desktop
users wants to get benefit of boost, they can still enable boost from
intel_pstate sysfs attribute "hwp_dynamic_boost".
Fixes: 41ab43c9c8 (cpufreq: intel_pstate: enable boost for Skylake Xeon)
Link: https://bugs.freedesktop.org/show_bug.cgi?id=107410
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With lockdep turned on, the following circular lock dependency problem
was reported:
[ 57.470040] ======================================================
[ 57.502900] WARNING: possible circular locking dependency detected
[ 57.535208] 4.18.0-0.rc3.1.el8+7.x86_64+debug #1 Tainted: G
[ 57.577761] ------------------------------------------------------
[ 57.609714] tuned/1505 is trying to acquire lock:
[ 57.633808] 00000000559deec5 (cpu_hotplug_lock.rw_sem){++++}, at: store+0x27/0x120
[ 57.672880]
[ 57.672880] but task is already holding lock:
[ 57.702184] 000000002136ca64 (kn->count#118){++++}, at: kernfs_fop_write+0x1d0/0x410
[ 57.742176]
[ 57.742176] which lock already depends on the new lock.
[ 57.742176]
[ 57.785220]
[ 57.785220] the existing dependency chain (in reverse order) is:
:
[ 58.932512] other info that might help us debug this:
[ 58.932512]
[ 58.973344] Chain exists of:
[ 58.973344] cpu_hotplug_lock.rw_sem --> subsys mutex#5 --> kn->count#118
[ 58.973344]
[ 59.030795] Possible unsafe locking scenario:
[ 59.030795]
[ 59.061248] CPU0 CPU1
[ 59.085377] ---- ----
[ 59.108160] lock(kn->count#118);
[ 59.124935] lock(subsys mutex#5);
[ 59.156330] lock(kn->count#118);
[ 59.186088] lock(cpu_hotplug_lock.rw_sem);
[ 59.208541]
[ 59.208541] *** DEADLOCK ***
In the cpufreq_register_driver() function, the lock sequence is:
cpus_read_lock --> kn->count
For the cpufreq sysfs store method, the lock sequence is:
kn->count --> cpus_read_lock
These sequences are actually safe as they are taking a share lock on
cpu_hotplug_lock. However, the current lockdep code doesn't check for
share locking when detecting circular lock dependency. Fixing that
could be a substantial effort.
Instead, we can work around this problem by using cpus_read_trylock()
in the store method which is much simpler. The chance of not getting
the read lock is very small. If that happens, the userspace application
that writes the sysfs file will get an error.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
systrace used for tracing for Android systems has carried a patch for
many years in the Android tree that traces when the cpufreq limits
change. With the help of this information, systrace can know when the
policy limits change and can visually display the data. Lets add
upstream support for the same.
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Exynos5440 drivers removal
The Exynos5440 (quad-core A15 with GMAC, PCIe, SATA) was targeting
server platforms but it did not make it to the market really. There are
no development boards with it and probably there are no real products
neither. The development for Exynos5440 ended in 2013 and since then
the platform is in maintenance mode.
Removing Exynos5440 makes our life slightly easier: less maintenance,
smaller code, reduced number of quirks, no need to preserve DTB
backward-compatibility.
The Device Tree sources and some of the drivers for Exynos5440 were
already removed. This removes remaining drivers.
* tag 'samsung-drivers-exynos5440-4.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
usb: host: exynos: Remove support for Exynos5440
clk: samsung: Remove support for Exynos5440
cpufreq: exynos: Remove support for Exynos5440
ata: ahci-platform: Remove support for Exynos5440
Signed-off-by: Olof Johansson <olof@lixom.net>
On HWP platforms with Turbo 3.0, the HWP capability max ratio shows the
maximum ratio of that core, which can be different than other cores. If
we show the correct maximum frequency in cpufreq sysfs via
cpuinfo_max_freq and scaling_max_freq then, user can know which cores
can run faster for pinning some high priority tasks.
Currently the max turbo frequency is shown as max frequency, which is
the max of all cores, even if some cores can't reach that frequency
even for single threaded workload.
But it is possible that max ratio in HWP capabilities is set as 0xFF or
some high invalid value (E.g. One KBL NUC). Since the actual performance
can never exceed 1 core turbo frequency from MSR TURBO_RATIO_LIMIT, we
use this as a bound check.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, intel_pstate doesn't register if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case. However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.
Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs. In fact, it is better to
avoid using it at all.
For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and register if it is there. Also prevent the
pcc-cpufreq driver from trying to initialize itself if intel_pstate
has been registered already.
Fixes: fbbcdc0744 (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Andreas Herrmann <aherrmann@suse.com>
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The firmware interface used by the pcc-cpufreq driver is
fundamentally not scalable and using it for dynamic CPU performance
scaling on systems with many CPUs leads to degraded performance.
For this reason, disable dynamic CPU performance scaling on systems
with pcc-cpufreq where the number of CPUs present at the driver init
time is greater than 4. Also make the driver print corresponding
complaints to the kernel log.
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Tested-by: Andreas Herrmann <aherrmann@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
feedback via set of performance counters. To determine the actual
performance level delivered over time, OSPM may read a set of
performance counters from the Reference Performance Counter Register
and the Delivered Performance Counter Register.
OSPM calculates the delivered performance over a given time period by
taking a beginning and ending snapshot of both the reference and
delivered performance counters, and calculating:
delivered_perf = reference_perf X (delta of delivered_perf counter / delta of reference_perf counter).
Implement the above and hook this up to the cpufreq->get method.
Signed-off-by: George Cherian <george.cherian@cavium.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Exynos5440 is not actively developed, there are no development
boards available and probably there are no real products with it.
Remove wide-tree support for Exynos5440.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Rob Herring <robh@kernel.org>
POWER9 does not support global pstate requests for the chip. So remove
the timer logic which slowly ramps down the global pstate in P9
platforms.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[mpe: Drop NULL check before kfree(policy->driver_data)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The cooling device should be part of the i.MX cpufreq driver, but it
cannot be removed for the sake of DT stability. So turn the cooling
device registration into a separate function and perform the
registration only if the CPU OF node does not have the #cooling-cells
property.
Use of_cpufreq_power_cooling_register in imx_thermal code to link the
cooling device to the device tree node provided.
This makes it possible to bind the cpufreq cooling device to a custom
thermal zone via a cooling-maps entry like:
cooling-maps {
map0 {
trip = <&board_alert>;
cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
};
};
Assuming a cpu node exists with label "cpu0" and #cooling-cells
property.
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We should return if get_cpu_device() fails or it leads to a NULL
dereference. Also dev_pm_opp_of_get_opp_desc_node() returns NULL on
error, it never returns error pointers.
Fixes: 46e2856b8e (cpufreq: Add Kryo CPU scaling driver)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When scaling max/min settings are changed, internally they are converted
to a ratio using the max turbo 1 core turbo frequency. This works fine
when 1 core max is same irrespective of the core. But under Turbo 3.0,
this will not be the case. For example:
Core 0: max turbo pstate: 43 (4.3GHz)
Core 1: max turbo pstate: 45 (4.5GHz)
In this case 1 core turbo ratio will be maximum of all, so it will be
45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40) for all cores
,then on core one it will be
= max_state * policy->max / max_freq;
= 43 * (4000000/4500000) = 38 (3.8GHz)
= 38
which is 200MHz less than the desired.
On core2, it will be correctly set to ratio 40 (4GHz). Same holds true
for scaling min frequency limit. So this requires usage of correct turbo
max frequency for core one, which in this case is 4.3GHz. So we need to
adjust per CPU cpu->pstate.turbo_freq using the maximum HWP ratio of that
core.
This change uses the HWP capability of a core to adjust max turbo
frequency. But since Broadwell HWP doesn't use ratios in the HWP
capabilities, we have to use legacy max 1 core turbo ratio. This is not
a problem as the HWP capabilities don't differ among cores in Broadwell.
We need to check for non Broadwell CPU model for applying this change,
though.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add device remove and module exit code to make the driver
functioning as a loadable module.
Fixes: ac28927659 (cpufreq: kryo: allow building as a loadable module)
Signed-off-by: Ilia Lin <ilia.lin@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In event of error returned by the nvmem_cell_read() non-pointer value
may be dereferenced. Fix this with error handling.
Additionally free the allocated speedbin buffer, as per the API.
Fixes: 9ce36edd1a52 (cpufreq: Add Kryo CPU scaling driver)
Signed-off-by: Ilia Lin <ilia.lin@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull more power management updates from Rafael Wysocki:
"These revert a recent PM core change that introduced a regression, fix
the build when the recently added Kryo cpufreq driver is selected, add
support for devices attached to multiple power domains to the generic
power domains (genpd) framework, add support for iowait boosting on
systens with hardware-managed P-states (HWP) enabled to the
intel_pstate driver, modify the behavior of the wakeup_count device
attribute in sysfs, fix a few issues and clean up some ugliness,
mostly in cpufreq (core and drivers) and in the cpupower utility.
Specifics:
- Revert a recent PM core change that attempted to fix an issue
related to device links, but introduced a regression (Rafael
Wysocki)
- Fix build when the recently added cpufreq driver for Kryo
processors is selected by making it possible to build that driver
as a module (Arnd Bergmann)
- Fix the long idle detection mechanism in the out-of-band (ondemand
and conservative) cpufreq governors (Chen Yu)
- Add support for devices in multiple power domains to the generic
power domains (genpd) framework (Ulf Hansson)
- Add support for iowait boosting on systems with hardware-managed
P-states (HWP) enabled to the intel_pstate driver and make it use
that feature on systems with Skylake Xeon processors as it is
reported to improve performance significantly on those systems
(Srinivas Pandruvada)
- Fix and update the acpi_cpufreq, ti-cpufreq and imx6q cpufreq
drivers (Colin Ian King, Suman Anna, Sébastien Szymanski)
- Change the behavior of the wakeup_count device attribute in sysfs
to expose the number of events when the device might have aborted
system suspend in progress (Ravi Chandra Sadineni)
- Fix two minor issues in the cpupower utility (Abhishek Goel, Colin
Ian King)"
* tag 'pm-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "PM / runtime: Fixup reference counting of device link suppliers at probe"
cpufreq: imx6q: check speed grades for i.MX6ULL
cpufreq: governors: Fix long idle detection logic in load calculation
cpufreq: intel_pstate: enable boost for Skylake Xeon
PM / wakeup: Export wakeup_count instead of event_count via sysfs
PM / Domains: Add dev_pm_domain_attach_by_id() to manage multi PM domains
PM / Domains: Add support for multi PM domains per device to genpd
PM / Domains: Split genpd_dev_pm_attach()
PM / Domains: Don't attach devices in genpd with multi PM domains
PM / Domains: dt: Allow power-domain property to be a list of specifiers
cpufreq: intel_pstate: New sysfs entry to control HWP boost
cpufreq: intel_pstate: HWP boost performance on IO wakeup
cpufreq: intel_pstate: Add HWP boost utility and sched util hooks
cpufreq: ti-cpufreq: Use devres managed API in probe()
cpufreq: ti-cpufreq: Fix an incorrect error return value
cpufreq: ACPI: make function acpi_cpufreq_fast_switch() static
cpufreq: kryo: allow building as a loadable module
cpupower : Fix header name to read idle state name
cpupower: fix spelling mistake: "logilename" -> "logfilename"
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
Pull ARM SoC driver updates from Olof Johansson:
"This contains platform-related driver updates for ARM and ARM64.
Highlights:
- ARM SCMI (System Control & Management Interface) driver cleanups
- Hisilicon support for LPC bus w/ ACPI
- Reset driver updates for several platforms: Uniphier,
- Rockchip power domain bindings and hardware descriptions for
several SoCs.
- Tegra memory controller reset improvements"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (59 commits)
ARM: tegra: fix compile-testing PCI host driver
soc: rockchip: power-domain: add power domain support for px30
dt-bindings: power: add binding for px30 power domains
dt-bindings: power: add PX30 SoCs header for power-domain
soc: rockchip: power-domain: add power domain support for rk3228
dt-bindings: power: add binding for rk3228 power domains
dt-bindings: power: add RK3228 SoCs header for power-domain
soc: rockchip: power-domain: add power domain support for rk3128
dt-bindings: power: add binding for rk3128 power domains
dt-bindings: power: add RK3128 SoCs header for power-domain
soc: rockchip: power-domain: add power domain support for rk3036
dt-bindings: power: add binding for rk3036 power domains
dt-bindings: power: add RK3036 SoCs header for power-domain
dt-bindings: memory: tegra: Remove Tegra114 SATA and AFI reset definitions
memory: tegra: Remove Tegra114 SATA and AFI reset definitions
memory: tegra: Register SMMU after MC driver became ready
soc: mediatek: remove unneeded semicolon
soc: mediatek: add a fixed wait for SRAM stable
soc: mediatek: introduce a CAPS flag for scp_domain_data
soc: mediatek: reuse regmap_read_poll_timeout helpers
...
According to current code implementation, detecting the long
idle period is done by checking if the interval between two
adjacent utilization update handlers is long enough. Although
this mechanism can detect if the idle period is long enough
(no utilization hooks invoked during idle period), it might
not cover a corner case: if the task has occupied the CPU
for too long which causes no context switches during that
period, then no utilization handler will be launched until this
high prio task is scheduled out. As a result, the idle_periods
field might be calculated incorrectly because it regards the
100% load as 0% and makes the conservative governor who uses
this field confusing.
Change the detection to compare the idle_time with sampling_rate
directly.
Reported-by: Artem S. Tashkinov <t.artem@mailcity.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>