[ Upstream commit 38b16d6cfe ]
As the potential failure of the allocation, kmemdup() may return NULL.
Then, 'bin_attr_data_vault.private' will be NULL, but
'bin_attr_data_vault.size' is not 0, which is not consistent.
Therefore, it is better to check the return value of kmemdup() to
avoid the confusion.
Fixes: 0ba13c763a ("thermal/int340x_thermal: Export GDDV")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4cf2ddf16e ]
Starting with commit d92ed2c9d3 ("thermal: imx: Use driver's local
data to decide whether to run a measurement") this driver stared using
irq_enabled flag to make decision to power on/off the thermal
core. This triggered a regression, where after reaching critical
temperature, alarm IRQ handler set irq_enabled to false, disabled
thermal core and was not able read temperature and disable cooling
sequence.
In case the cooling device is "CPU/GPU freq", the system will run with
reduce performance until next reboot.
To solve this issue, we need to move all parts implementing hand made
runtime power management and let it handle actual runtime PM framework.
Fixes: d92ed2c9d3 ("thermal: imx: Use driver's local data to decide whether to run a measurement")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Petr Beneš <petr.benes@ysoft.com>
Link: https://lore.kernel.org/r/20211117103426.81813-1-o.rempel@pengutronix.de
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99b63316c3 ]
During the suspend is in process, thermal_zone_device_update bails out
thermal zone re-evaluation for any sensor trip violation without
setting next valid trip to that sensor. It assumes during resume
it will re-evaluate same thermal zone and update trip. But when it is
in suspend temperature goes down and on resume path while updating
thermal zone if temperature is less than previously violated trip,
thermal zone set trip function evaluates the same previous high and
previous low trip as new high and low trip. Since there is no change
in high/low trip, it bails out from thermal zone set trip API without
setting any trip. It leads to a case where sensor high trip or low
trip is disabled forever even though thermal zone has a valid high
or low trip.
During thermal zone device init, reset thermal zone previous high
and low trip. It resolves above mentioned scenario.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 96cfe05051 upstream.
of_parse_thermal_zones() parses the thermal-zones node and registers a
thermal_zone device for each subnode. However, if a thermal zone is
consuming a thermal sensor and that thermal sensor device hasn't probed
yet, an attempt to set trip_point_*_temp for that thermal zone device
can cause a NULL pointer dereference. Fix it.
console:/sys/class/thermal/thermal_zone87 # echo 120000 > trip_point_0_temp
...
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
Call trace:
of_thermal_set_trip_temp+0x40/0xc4
trip_point_temp_store+0xc0/0x1dc
dev_attr_store+0x38/0x88
sysfs_kf_write+0x64/0xc0
kernfs_fop_write_iter+0x108/0x1d0
vfs_write+0x2f4/0x368
ksys_write+0x7c/0xec
__arm64_sys_write+0x20/0x30
el0_svc_common.llvm.7279915941325364641+0xbc/0x1bc
do_el0_svc+0x28/0xa0
el0_svc+0x14/0x24
el0_sync_handler+0x88/0xec
el0_sync+0x1c0/0x200
While at it, fix the possible NULL pointer dereference in other
functions as well: of_thermal_get_temp(), of_thermal_set_emul_temp(),
of_thermal_get_trend().
Suggested-by: David Collins <quic_collinsd@quicinc.com>
Signed-off-by: Subbaraman Narayanamurthy <quic_subbaram@quicinc.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b4bd25667 upstream.
After upgrading to Linux 5.13.3 I noticed my laptop would shutdown due
to overheat (when it should not). It turned out this was due to commit
fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").
What happens is this drivers uses a global variable to keep track of the
tcc offset (tcc_offset_save) and uses it on resume. The issue is this
variable is initialized to 0, but is only set in
tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
set by userspace. If that does not happen, the resume path will set the
offset to 0 (in my case the h/w default being 3, the offset would become
too low after a suspend/resume cycle).
The issue did not arise before commit fe6a6de669, as the function
setting the offset would return if the offset was 0. This is no longer
the case (rightfully).
Fix this by not applying the offset if it wasn't saved before, reverting
back to the old logic. A better approach will come later, but this will
be easier to apply to stable kernels.
The logic to restore the offset after a resume was there long before
commit fe6a6de669, but as a value of 0 was considered invalid I'm
referencing the commit that made the issue possible in the Fixes tag
instead.
Fixes: fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting")
Cc: stable@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210909085613.5577-2-atenart@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1bb30b20b4 ]
After printing the list of thermal governors, then this function prints
a newline character. The problem is that "size" has not been updated
after printing the last governor. This means that it can write one
character (the NUL terminator) beyond the end of the buffer.
Get rid of the "size" variable and just use "PAGE_SIZE - count" directly.
Fixes: 1b4f48494e ("thermal: core: group functions related to governor handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210916131342.GB25094@kili
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e5c9f9a75 ]
Zone device is enabled after thermal_zone_of_sensor_register() completion,
but it's not disabled before senor is unregistered, leaving temperature
polling active. This results in accessing a disabled zone device and
produces a warning about this problem. Stop zone device before
unregistering it in order to fix this "use-after-free" problem.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210616190417.32214-3-digetx@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ae5950db6 ]
With -Wshadow:
drivers/thermal/rcar_gen3_thermal.c: In function ‘rcar_gen3_thermal_probe’:
drivers/thermal/rcar_gen3_thermal.c:310:13: warning: declaration of ‘rcar_gen3_ths_tj_1’ shadows a global declaration [-Wshadow]
310 | const int *rcar_gen3_ths_tj_1 = of_device_get_match_data(dev);
| ^~~~~~~~~~~~~~~~~~
drivers/thermal/rcar_gen3_thermal.c:246:18: note: shadowed declaration is here
246 | static const int rcar_gen3_ths_tj_1 = 126;
| ^~~~~~~~~~~~~~~~~~
To add to the confusion, the local variable has a different type.
Fix the shadowing by renaming the local variable to ths_tj_1.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/9ea7e65d0331daba96f9a7925cb3d12d2170efb1.1623076804.git.geert+renesas@glider.be
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ad8ccc17d ]
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when thermal framework sets
capping.
Fixes: f12e4f66ab ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit eb8500b874 upstream.
After commit 81ad4276b5 ("Thermal: Ignore invalid trip points") all
user_space governor notifications via RW trip point is broken in intel
thermal drivers. This commits marks trip_points with value of 0 during
call to thermal_zone_device_register() as invalid. RW trip points can be
0 as user space will set the correct trip temperature later.
During driver init, x86_package_temp and all int340x drivers sets RW trip
temperature as 0. This results in all these trips marked as invalid by
the thermal core.
To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0.
Cc: <stable@vger.kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34ab17cc6c upstream.
Slab OOB issue is scanned by KASAN in cpu_power_to_freq().
If power is limited below the power of OPP0 in EM table,
it will cause slab out-of-bound issue with negative array
index.
Return the lowest frequency if limited power cannot found
a suitable OPP in EM table to fix this issue.
Backtrace:
[<ffffffd02d2a37f0>] die+0x104/0x5ac
[<ffffffd02d2a5630>] bug_handler+0x64/0xd0
[<ffffffd02d288ce4>] brk_handler+0x160/0x258
[<ffffffd02d281e5c>] do_debug_exception+0x248/0x3f0
[<ffffffd02d284488>] el1_dbg+0x14/0xbc
[<ffffffd02d75d1d4>] __kasan_report+0x1dc/0x1e0
[<ffffffd02d75c2e0>] kasan_report+0x10/0x20
[<ffffffd02d75def8>] __asan_report_load8_noabort+0x18/0x28
[<ffffffd02e6fce5c>] cpufreq_power2state+0x180/0x43c
[<ffffffd02e6ead80>] power_actor_set_power+0x114/0x1d4
[<ffffffd02e6fac24>] allocate_power+0xaec/0xde0
[<ffffffd02e6f9f80>] power_allocator_throttle+0x3ec/0x5a4
[<ffffffd02e6ea888>] handle_thermal_trip+0x160/0x294
[<ffffffd02e6edd08>] thermal_zone_device_check+0xe4/0x154
[<ffffffd02d351cb4>] process_one_work+0x5e4/0xe28
[<ffffffd02d352f44>] worker_thread+0xa4c/0xfac
[<ffffffd02d360124>] kthread+0x33c/0x358
[<ffffffd02d289940>] ret_from_fork+0xc/0x18
Fixes: 371a3bc79c ("thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power")
Signed-off-by: brian-sy yang <brian-sy.yang@mediatek.com>
Signed-off-by: Michael Kao <michael.kao@mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Cc: stable@vger.kernel.org #v5.7
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201229050831.19493-1-michael.kao@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2046a24ae1 ]
There is a possible chance that some cooling device stats buffer
allocation fails due to very high cooling device max state value.
Later cooling device update sysfs can try to access stats data
for the same cooling device. It will lead to NULL pointer
dereference issue.
Add a NULL pointer check before accessing thermal cooling device
stats data. It fixes the following bug
[ 26.812833] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[ 27.122960] Call trace:
[ 27.122963] do_raw_spin_lock+0x18/0xe8
[ 27.122966] _raw_spin_lock+0x24/0x30
[ 27.128157] thermal_cooling_device_stats_update+0x24/0x98
[ 27.128162] cur_state_store+0x88/0xb8
[ 27.128166] dev_attr_store+0x40/0x58
[ 27.128169] sysfs_kf_write+0x50/0x68
[ 27.133358] kernfs_fop_write+0x12c/0x1c8
[ 27.133362] __vfs_write+0x54/0x160
[ 27.152297] vfs_write+0xcc/0x188
[ 27.157132] ksys_write+0x78/0x108
[ 27.162050] ksys_write+0xf8/0x108
[ 27.166968] __arm_smccc_hvc+0x158/0x4b0
[ 27.166973] __arm_smccc_hvc+0x9c/0x4b0
[ 27.186005] el0_svc+0x8/0xc
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1607367181-24589-1-git-send-email-manafm@codeaurora.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a51afb1331 upstream.
freq_qos_update_request() returns 1 if the effective constraint value
has changed, 0 if the effective constraint value has not changed, or a
negative error code on failures.
The frequency constraints for CPUs can be set by different parts of the
kernel. If the maximum frequency constraint set by other parts of the
kernel are set at a lower value than the one corresponding to cooling
state 0, then we will never be able to cool down the system as
freq_qos_update_request() will keep on returning 0 and we will skip
updating cpufreq_state and thermal pressure.
Fix that by doing the updates even in the case where
freq_qos_update_request() returns 0, as we have effectively set the
constraint to a new value even if the consolidated value of the
actual constraint is unchanged because of external factors.
Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
Fixes: f12e4f66ab ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Thara Gopinath<thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/b2b7e84944937390256669df5a48ce5abba0c1ef.1613540713.git.viresh.kumar@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Step wise governor increases the mitigation level when the temperature
goes above a threshold and will decrease the mitigation when the
temperature falls below the threshold. If it were a case, where the
temperature hovers around a threshold, the mitigation will be applied
and removed at every iteration. This reaction to the temperature is
inefficient for performance.
The use of hysteresis temperature could avoid this ping-pong of
mitigation by relaxing the mitigation to happen only when the
temperature goes below this lower hysteresis value.
Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled
notifier causes errors on the DTEMP readout values:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 52
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: in range ADC val: 100
raw 100 translates to 133 Celsius on omap4-sdp, triggering shutdown due to
critical temperature.
When the notifier is disable for OMAP4430 the DTEMP values are stable:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
Fixes: 5093402e5b ("thermal: ti-soc-thermal: Enable addition power management")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201029100335.27665-1-peter.ujfalusi@ti.com
Pull thermal updates from Daniel Lezcano:
- Fix Kconfig typo "acces" -> "access" (Colin Ian King)
- Use dev_error_probe() to simplify the error handling on imx and imx8
platforms (Anson Huang)
- Use dedicated kobj_to_dev() instead of container_of() in the sysfs
core code (Tian Tao)
- Fix coding style by adding braces to a one line conditional statement
on rcar (Geert Uytterhoeven)
- Add DT binding documentation for the r8a774e1 platform and update the
Kconfig description supporting RZ/G2 SoCs (Lad Prabhakar)
- Simplify the return expression of stm_thermal_prepare on the stm32
platform (Qinglang Miao)
- Fix the unit in the function documentation for the idle injection
cooling device (Zhuguang Qing)
- Remove an unecessary mutex_init() in the core code (Qinglang Miao)
- Add support for keep alive events in the core code and the specific
int340x (Srinivas Pandruvada)
- Remove unused thermal zone variable in devfreq and cpufreq cooling
devices (Zhuguang Qing)
- Add the A100's THS controller support (Yangtao Li)
- Add power management on the omap3's bandgap sensor (Adam Ford)
- Fix a missing nlmsg_free in the netlink core error path (Jing
Xiangfeng)
* tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp()
thermal: ti-soc-thermal: Enable addition power management
thermal: sun8i: Add A100's THS controller support
thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate
dt-bindings: thermal: sun8i: Add binding for A100's THS controller
thermal: cooling: Remove unused variable *tz
thermal: int340x: Add keep alive response method
thermal: core: Add new event for sending keep alive notifications
thermal: int340x: Provide notification for OEM variable change
thermal: core: remove unnecessary mutex_init()
thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns
thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config
thermal: stm32: simplify the return expression of stm_thermal_prepare()
dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support
thermal: rcar_thermal: Add missing braces to conditional statement
thermal: Use kobj_to_dev() instead of container_of()
thermal: imx8mm: Use dev_err_probe() to simplify error handling
thermal: imx: Use dev_err_probe() to simplify error handling
drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"