cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.
Since commit 3c66563378 ("Revert "firmware: arm_scmi: Add clock
management to the SCMI power domain""), this causes a hang on Juno when
running the Perf Coresight tests or running this command:
perf record -e cs_etm//u -- ls
This was also missed until the revert commit because pm_runtime_put()
was called with the wrong device until commit 692c9a499b ("coresight:
cti: Correct the parameter for pm_runtime_put")
With lock and scheduler debugging enabled the following is output:
coresight cti_sys0: cti_enable_hw -- dev:cti_sys0 parent: 20020000.cti
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
Call trace:
dump_backtrace+0x134/0x140
show_stack+0x20/0x58
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x180/0x228
__might_sleep+0x50/0x88
__pm_runtime_resume+0xac/0xb0
cti_enable+0x44/0x120
coresight_control_assoc_ectdev+0xc0/0x150
coresight_enable_path+0xb4/0x288
etm_event_start+0x138/0x170
etm_event_add+0x48/0x70
event_sched_in.isra.122+0xb4/0x280
merge_sched_in+0x1fc/0x3d0
visit_groups_merge.constprop.137+0x16c/0x4b0
ctx_sched_in+0x114/0x1f0
perf_event_sched_in+0x60/0x90
ctx_resched+0x68/0xb0
perf_event_exec+0x138/0x508
begin_new_exec+0x52c/0xd40
load_elf_binary+0x6b8/0x17d0
bprm_execve+0x360/0x7f8
do_execveat_common.isra.47+0x218/0x238
__arm64_sys_execve+0x48/0x60
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.4+0xfc/0x120
do_el0_svc+0x34/0xc0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.
Fixes: 835d722ba1 ("coresight: cti: Initial CoreSight CTI Driver")
Cc: stable <stable@kernel.org>
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
[ Fix build warnings ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221025131032.1149459-1-suzuki.poulose@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.
Since commit 3c66563378 ("Revert "firmware: arm_scmi: Add clock
management to the SCMI power domain""), this causes a hang on Juno when
running the Perf Coresight tests or running this command:
perf record -e cs_etm//u -- ls
This was also missed until the revert commit because pm_runtime_put()
was called with the wrong device until commit 692c9a499b ("coresight:
cti: Correct the parameter for pm_runtime_put")
With lock and scheduler debugging enabled the following is output:
coresight cti_sys0: cti_enable_hw -- dev:cti_sys0 parent: 20020000.cti
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
Call trace:
dump_backtrace+0x134/0x140
show_stack+0x20/0x58
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x180/0x228
__might_sleep+0x50/0x88
__pm_runtime_resume+0xac/0xb0
cti_enable+0x44/0x120
coresight_control_assoc_ectdev+0xc0/0x150
coresight_enable_path+0xb4/0x288
etm_event_start+0x138/0x170
etm_event_add+0x48/0x70
event_sched_in.isra.122+0xb4/0x280
merge_sched_in+0x1fc/0x3d0
visit_groups_merge.constprop.137+0x16c/0x4b0
ctx_sched_in+0x114/0x1f0
perf_event_sched_in+0x60/0x90
ctx_resched+0x68/0xb0
perf_event_exec+0x138/0x508
begin_new_exec+0x52c/0xd40
load_elf_binary+0x6b8/0x17d0
bprm_execve+0x360/0x7f8
do_execveat_common.isra.47+0x218/0x238
__arm64_sys_execve+0x48/0x60
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.4+0xfc/0x120
do_el0_svc+0x34/0xc0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174
Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.
Fixes: 835d722ba1 ("coresight: cti: Initial CoreSight CTI Driver")
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
Suggested-by: Suzuki Poulose <Suzuki.Poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221005131452.1506328-1-james.clark@arm.com
With lockdeps enabled, we get the following warning:
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
kworker/u12:1/53 is trying to acquire lock:
ffff80000adce220 (coresight_mutex){+.+.}-{4:4}, at: coresight_set_assoc_ectdev_mutex+0x3c/0x5c
but task is already holding lock:
ffff80000add1f60 (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (ect_mutex){+.+.}-{4:4}:
__mutex_lock_common+0xd8/0xe60
mutex_lock_nested+0x44/0x50
cti_add_assoc_to_csdev+0x4c/0x184
coresight_register+0x2f0/0x314
tmc_probe+0x33c/0x414
-> #0 (coresight_mutex){+.+.}-{4:4}:
__lock_acquire+0x1a20/0x32d0
lock_acquire+0x160/0x308
__mutex_lock_common+0xd8/0xe60
mutex_lock_nested+0x44/0x50
coresight_set_assoc_ectdev_mutex+0x3c/0x5c
cti_update_conn_xrefs+0x6c/0xf8
cti_probe+0x33c/0x394
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(ect_mutex);
lock(coresight_mutex);
lock(ect_mutex);
lock(coresight_mutex);
*** DEADLOCK ***
4 locks held by kworker/u12:1/53:
#0: ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1fc/0x63c
#1: (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x228/0x63c
#2: (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a8
#3: (ect_mutex){+.+.}-{4:4}, at: cti_probe+0x318/0x394
To fix the same, call cti_add_assoc_to_csdev without the holding
coresight_mutex and confine the locking while setting the associated
ect / cti device using coresight_set_assoc_ectdev_mutex().
Fixes: 177af8285b ("coresight: cti: Enable CTI associated with devices")
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20220721130329.3787211-1-sudeep.holla@arm.com
Functions that are annotated __exit are discarded for built-in drivers,
but the .remove callback in a device driver must still be kept around
to allow bind/unbind operations.
There is now a linker warning for the discarded symbol references:
`tmc_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-tmc-core.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-tmc-core.o
`tpiu_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-tpiu.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-tpiu.o
`etb_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-etb10.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-etb10.o
`static_funnel_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-funnel.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-funnel.o
`dynamic_funnel_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-funnel.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-funnel.o
`static_replicator_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-replicator.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-replicator.o
`dynamic_replicator_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-replicator.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-replicator.o
`catu_remove' referenced in section `.data' of drivers/hwtracing/coresight/coresight-catu.o: defined in discarded section `.exit.text' of drivers/hwtracing/coresight/coresight-catu.o
Remove all those annotations.
Fixes: 8b0cf82677 ("coresight: stm: Allow to build coresight-stm as a module")
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20201208182651.1597945-3-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>