To address IO performance commit f9e5b33934
("mmc: host: Improve I/O read/write performance for GL9763E")
limited LPM negotiation to runtime suspend state.
The problem is that it only flips the switch in the runtime PM
resume/suspend logic.
Disable LPM negotiation in gl9763e_add_host.
This helps in two ways:
1. It was found that the LPM switch stays in the same position after
warm reboot. Having it set in init helps with consistency.
2. Disabling LPM during the first runtime resume leaves us susceptible
to the performance issue in the time window between boot and the
first runtime suspend.
Fixes: f9e5b33934 ("mmc: host: Improve I/O read/write performance for GL9763E")
Cc: stable@vger.kernel.org
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Reviewed-by: Sven van Ashbrook <svenva@chromium.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20231114115516.1585361-1-korneld@chromium.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
If a task completion notification (TCN) is received when there is no
outstanding task, the cqhci driver issues a "spurious TCN" warning. This
was observed to happen right after CQE error recovery.
When an error interrupt is received the driver runs recovery logic.
It halts the controller, clears all pending tasks, and then re-enables
it. On some platforms, like Intel Jasper Lake, a stale task completion
event was observed, regardless of the CQHCI_CLEAR_ALL_TASKS bit being set.
This results in either:
a) Spurious TC completion event for an empty slot.
b) Corrupted data being passed up the stack, as a result of premature
completion for a newly added task.
Rather than add a quirk for affected controllers, ensure tasks are cleared
by toggling CQHCI_ENABLE, which would happen anyway if
cqhci_clear_all_tasks() timed out. This is simpler and should be safe and
effective for all controllers.
Fixes: a4080225f5 ("mmc: cqhci: support for command queue enabled host")
Cc: stable@vger.kernel.org
Reported-by: Kornel Dulęba <korneld@chromium.org>
Tested-by: Kornel Dulęba <korneld@chromium.org>
Co-developed-by: Kornel Dulęba <korneld@chromium.org>
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20231103084720.6886-7-adrian.hunter@intel.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge the mmc fixes for v6.6-rc[n] into the next branch, to allow them to
get tested together with the new mmc changes that are targeted for v6.7.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
During board verification, there is a need to test the various supported
eMMC/SD speed modes. However, since the framework chooses the best mode
supported by the card and the host controller's caps, this currently
necessitates changing the devicetree for every iteration.
Allow the various speed mode host capabilities to be modified via
debugfs in order to allow easier hardware verification. The values to
be written are the raw MMC_CAP* values from include/linux/mmc/host.h.
This is rather low-level, and these defines are not guaranteed to be
stable, but it is perhaps good enough for the intended use case.
MMC_CAP_AGGRESSIVE_PM can also be set, in order to be able to
re-initialize the card without having to physically remove and re-insert
it.
/sys/kernel/debug/mmc0# grep timing ios
timing spec: 9 (mmc HS200)
// Turn on MMC_CAP_AGGRESSIVE_PM and re-trigger runtime suspend
/sys/kernel/debug/mmc0# echo $(($(cat caps) | (1 << 7))) > caps
/sys/kernel/debug/mmc0# echo on > /sys/bus/mmc/devices/mmc0\:0001/power/control
/sys/kernel/debug/mmc0# echo auto > /sys/bus/mmc/devices/mmc0\:0001/power/control
// MMC_CAP2_HS200_1_8V_SDR
/sys/kernel/debug/mmc0# echo $(($(cat caps2) & ~(1 << 5))) > caps2
/sys/kernel/debug/mmc0# echo on > /sys/bus/mmc/devices/mmc0\:0001/power/control
/sys/kernel/debug/mmc0# grep timing ios
timing spec: 8 (mmc DDR52)
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/r/20230929-mmc-caps-v2-2-11a4c2d94f15@axis.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The STM32 SDMMC peripheral (at least for the STM32F429, STM32F469 and
STM32F746, which are all the currently supported devices using periphid
0x00880180) requires DMA to be performed in peripheral flow controller
mode. From the STM32F74/5 reference manual, section 35.3.2:
"SDMMC host allows only to use the DMA in peripheral flow controller
mode. DMA stream used to serve SDMMC must be configured in peripheral
flow controller mode"
This patch adds a variant option to control peripheral flow control and
enables it for the STM32 variant.
Signed-off-by: Ben Wolsieffer <Ben.Wolsieffer@hefring.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20230928135644.1489691-1-ben.wolsieffer@hefring.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge the mmc fixes for v6.6-rc[n] into the next branch, to allow them to
get tested together with the new mmc changes that are targeted for v6.7.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The OEMID is an 8-bit binary number rather than 16-bit as the current code
parses for. The OEMID occupies bits [111:104] in the CID register, see the
eMMC spec JESD84-B51 paragraph 7.2.3. It seems that the 16-bit comes from
the legacy MMC specs (v3.31 and before).
Let's fix the parsing by simply move to use 8-bit instead of 16-bit. This
means we ignore the impact on some of those old MMC cards that may be out
there, but on the other hand this shouldn't be a problem as the OEMID seems
not be an important feature for these cards.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230927071500.1791882-1-avri.altman@wdc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge the mmc fixes for v6.6-rc[n] into the next branch, to allow them to
get tested together with the new mmc changes that are targeted for v6.7.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
By dynamically adjusting the host->hsq_depth, based upon the buffer size
being 4k and that we get at least two I/O write requests in flight, we can
improve the throughput a bit. This is typical for a random I/O write
pattern.
More precisely, by dynamically changing the number of requests in flight
from 2 to 5, we can on some platforms observe ~4-5% increase in throughput.
Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
Link: https://lore.kernel.org/r/20230919074707.25517-3-wenchao.chen@unisoc.com
[Ulf: Re-wrote the commitmsg, minor adjustment to the code - all to clarify.]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
To allow dynamical updates of the current number of used in-flight
requests, let's move away from using a hard-coded value to a use a
corresponding variable in the struct mmc_host.
This can be valuable when optimizing for certain I/O request sequences, as
shown by subsequent changes.
Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
Link: https://lore.kernel.org/r/20230919074707.25517-2-wenchao.chen@unisoc.com
[Ulf: Re-wrote the commitmsg to clarify the change]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Userspace has currently no way of checking the internal R1 response error
bits for some commands. This is a problem for some commands, like RPMB for
example. Typically, we may detect that the busy completion has successfully
ended, while in fact the card did not complete the requested operation.
To fix the problem, let's always poll with CMD13 for these commands and
during the polling, let's also aggregate the R1 response bits. Before
completing the ioctl request, let's propagate the R1 response bits too.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Co-developed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230913112921.553019-1-ulf.hansson@linaro.org
To improve the r/w performance of GL9763E, the current driver inhibits LPM
negotiation while the device is active.
This prevents a large number of SoCs from suspending, notably x86 systems
which commonly use S0ix as the suspend mechanism - for example, Intel
Alder Lake and Raptor Lake processors.
Failure description:
1. Userspace initiates s2idle suspend (e.g. via writing to
/sys/power/state)
2. This switches the runtime_pm device state to active, which disables
LPM negotiation, then calls the "regular" suspend callback
3. With LPM negotiation disabled, the bus cannot enter low-power state
4. On a large number of SoCs, if the bus not in a low-power state, S0ix
cannot be entered, which in turn prevents the SoC from entering
suspend.
Fix by re-enabling LPM negotiation in the device's suspend callback.
Suggested-by: Stanislaw Kardach <skardach@google.com>
Fixes: f9e5b33934 ("mmc: host: Improve I/O read/write performance for GL9763E")
Cc: stable@vger.kernel.org
Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20230831160055.v3.1.I7ed1ca09797be2dd76ca914c57d88b32d24dac88@changeid
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported
issues. The last bit of cleanups from Jiri and the tty structure size
reduction came in last week, a bit late but as they were just style
changes and size reductions, I figured they should get into this merge
cycle so that others can work on top of them with no merge conflicts"
* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
tty: shrink the size of struct tty_struct by 40 bytes
tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
tty: n_tty: extract ECHO_OP processing to a separate function
tty: n_tty: unify counts to size_t
tty: n_tty: use u8 for chars and flags
tty: n_tty: simplify chars_in_buffer()
tty: n_tty: remove unsigned char casts from character constants
tty: n_tty: move newline handling to a separate function
tty: n_tty: move canon handling to a separate function
tty: n_tty: use MASK() for masking out size bits
tty: n_tty: make n_tty_data::num_overrun unsigned
tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
tty: n_tty: use 'num' for writes' counts
tty: n_tty: use output character directly
tty: n_tty: make flow of n_tty_receive_buf_common() a bool
Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
Documentation: devices.txt: Fix minors for ttyCPM*
Documentation: devices.txt: Remove ttySIOC*
Documentation: devices.txt: Remove ttyIOC*
serial: 8250_bcm7271: improve bcm7271 8250 port
...
Pull modules updates from Luis Chamberlain:
"Summary of the changes worth highlighting from most interesting to
boring below:
- Christoph Hellwig's symbol_get() fix to Nvidia's efforts to
circumvent the protection he put in place in year 2020 to prevent
proprietary modules from using GPL only symbols, and also ensuring
proprietary modules which export symbols grandfather their taint.
That was done through year 2020 commit 262e6ae708 ("modules:
inherit TAINT_PROPRIETARY_MODULE"). Christoph's new fix is done by
clarifing __symbol_get() was only ever intended to prevent module
reference loops by Linux kernel modules and so making it only find
symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic
used by Nvidia was to use symbol_get() to purposely swift through
proprietary module symbols and completely bypass our traditional
EXPORT_SYMBOL*() annotations and community agreed upon
restrictions.
A small set of preamble patches fix up a few symbols which just
needed adjusting for this on two modules, the rtc ds1685 and the
networking enetc module. Two other modules just needed some build
fixing and removal of use of __symbol_get() as they can't ever be
modular, as was done by Arnd on the ARM pxa module and Christoph
did on the mmc au1xmmc driver.
This is a good reminder to us that symbol_get() is just a hack to
address things which should be fixed through Kconfig at build time
as was done in the later patches, and so ultimately it should just
go.
- Extremely late minor fix for old module layout 055f23b74b
("module: check for exit sections in layout_sections() instead of
module_init_section()") by James Morse for arm64. Note that this
layout thing is old, it is *not* Song Liu's commit ac3b432839
("module: replace module_layout with module_memory"). The issue
however is very odd to run into and so there was no hurry to get
this in fast.
- Although the fix did not go through the modules tree I'd like to
highlight the fix by Peter Zijlstra in commit 5409730962
("x86/static_call: Fix __static_call_fixup()") now merged in your
tree which came out of what was originally suspected to be a
fallout of the the newer module layout changes by Song Liu commit
ac3b432839 ("module: replace module_layout with module_memory")
instead of module_init_section()"). Thanks to the report by
Christian Bricart and the debugging by Song Liu & Peter that turned
to be noted as a kernel regression in place since v5.19 through
commit ee88d363d1 ("x86,static_call: Use alternative RET
encoding").
I highlight this to reflect and clarify that we haven't seen more
fallout from ac3b432839 ("module: replace module_layout with
module_memory").
- RISC-V toolchain got mapping symbol support which prefix symbols
with "$" to help with alignment considerations for disassembly.
This is used to differentiate between incompatible instruction
encodings when disassembling. RISC-V just matches what ARM/AARCH64
did for alignment considerations and Palmer Dabbelt extended
is_mapping_symbol() to accept these symbols for RISC-V. We already
had support for this for all architectures but it also checked for
the second character, the RISC-V check Dabbelt added was just for
the "$". After a bit of testing and fallout on linux-next and based
on feedback from Masahiro Yamada it was decided to simplify the
check and treat the first char "$" as unique for all architectures,
and so we no make is_mapping_symbol() for all archs if the symbol
starts with "$".
The most relevant commit for this for RISC-V on binutils was:
https://sourceware.org/pipermail/binutils/2021-July/117350.html
- A late fix by Andrea Righi (today) to make module zstd
decompression use vmalloc() instead of kmalloc() to account for
large compressed modules. I suspect we'll see similar things for
other decompression algorithms soon.
- samples/hw_breakpoint minor fixes by Rong Tao, Arnd Bergmann and
Chen Jiahao"
* tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module/decompress: use vmalloc() for zstd decompression workspace
kallsyms: Add more debug output for selftest
ARM: module: Use module_init_layout_section() to spot init sections
arm64: module: Use module_init_layout_section() to spot init sections
module: Expose module_init_layout_section()
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff
net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index
mmc: au1xmmc: force non-modular build and remove symbol_get usage
ARM: pxa: remove use of symbol_get()
samples/hw_breakpoint: mark sample_hbp as static
samples/hw_breakpoint: fix building without module unloading
samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'
modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
kernel: params: Remove unnecessary ‘0’ values from err
module: Ignore RISC-V mapping symbols too
First of all, Unisoc's IC provides cmd delay and read delay to ensure
that the host can get the correct data. However, according to SD Spec,
there is no need to do tuning in high speed mode, but with the
development of chip processes, it is more and more difficult to find
a suitable delay to cover all the chips. Therefore, we need SD high
speed mode online tuning.
In addition, we added mmc_sd_switch() and mmc_send_status() to the
header file to allow it to be usable by the driver.
Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
Link: https://lore.kernel.org/r/20230825091743.15613-3-wenchao.chen@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
To support the need for host specific tuning for SD high-speed mode, let's
add two new optional callbacks, ->prepare|execute_sd_hs_tuning() and let's
call them when switching into the SD high-speed mode.
Note that, during the tuning process it's also needed for host drivers to
send commands to the SD card to verify that the tuning process succeeds.
Therefore, let's also share the corresponding functions from the core to
allow this.
Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
Link: https://lore.kernel.org/r/20230825091743.15613-2-wenchao.chen@unisoc.com
[Ulf: Dropped unnecessary function declarations and updated the commit msg]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>