Based on the gpiomem driver, allow mapping of the decoder register
spaces such that userspace can access control/status registers.
This driver is intended for use with a custom ffmpeg backend accelerator
prior to a v4l2 driver being written.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
driver: char: rpivid: Destroy the legacy device on remove
The legacy name support created a new device that was never destroyed.
If the driver was unloaded and reloaded, it failed due to the
device already existing.
Fixes: "75f1d14 driver: char: rpivid - also support legacy name"
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Clean up error handling use of ERR_PTR/IS_ERR
The driver used an unnecessary intermediate void* variable so it
only called ERR_PTR once to convert to the error value.
Switch to converting as the error arises to remove these intermediate
variables.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Add error handling to the legacy device load
The return value from device_create for the legacy device was never
checked or handled. Add the required error handling.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Fix coding style whitespace issues.
Makes checkpatch happier.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpimem: Add SPDX licence header.
Stops checkpatch complaining.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
driver: char: rpivid: Fix access to freed memory
The error path during probe frees the private memory block, and
then promptly dereferences it to log an error message.
Use the base device instead of the pointer to it in the private
structure.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
xHCI caches device and endpoint data after the interface is configured,
so an explicit command needs to be issued for any device driver wanting
to alter the polling interval of an endpoint.
Add usb_fixup_endpoint() to allow drivers to do this. The fixup must be
called after calculating endpoint bandwidth requirements but before any
URBs are submitted.
If polling intervals are shortened, any bandwidth reservations are no
longer valid but in practice polling intervals are only ever relaxed.
Limit the scope to interrupt transfers for now.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
Without the actual power management part any more, there's a lot less
to set up for V3D. We just need to clear the RSTN field for the power
domain, and expose the reset controller for toggling it again.
This is definitely incomplete -- the old ISP and H264 is in the old
bridge, but since we have no consumers of it I've just done the
minimum to get V3D working.
Signed-off-by: Eric Anholt <eric@anholt.net>
This new driver allows contiguous memory blocks to be imported
into the VideoCore VPU memory map, and manages the lifetime of
those objects, only releasing the source dmabuf once the VPU has
confirmed it has finished with it.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Correct DMA configuration.
Now that VCHIQ is setting up the DMA configuration as our
parent device, don't try to configure it during probe.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Use a void* pointer as the handle within the kernel
The driver was using an unsigned int as the handle to the outside world,
and doing a nasty cast to the struct dmabuf when handed it back.
This breaks badly with a 64 bit kernel where the pointer doesn't fit
in an unsigned int.
Switch to using a void* within the kernel. Reality is that it is
a struct dma_buf*, but advertising it as such to other drivers seems
to encourage the use of it as such, and I'm not sure on the implications
of that.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Fix up for 64bit builds
There were a number of logging lines that were using
inappropriate formatting under 64bit kernels.
The kernel_id field passed to/from the VPU was being
abused for storing the struct vc_sm_buffer *.
This breaks with 64bit kernels, so change to using an IDR.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc_sm_cma: Remove erroneous misc_deregister
Code from the misc /dev node was still present in
bcm2835_vc_sm_cma_remove, which caused a NULL deref.
Remove it.
See #2885.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Remove the debugfs directory on remove
Without removing that, reloading the driver fails.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Use devm_ allocs for sm_state.
Use managed allocations for sm_state, removing reliance on
manual management.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Don't fail if debugfs calls fail.
Return codes from debugfs calls should never alter the
flow of the main code.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Ensure mutex and idr are destroyed
map_lock and kernelid_map are created in probe, but not released
in release should the vcsm service not connect (eg running the
cutdown firmware).
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Remove obsolete comment and make function static
Removes obsolete comment about wanting to pass a function
pointer into mmal-vchiq as we now do.
As the function is passed as a function pointer, the function itself
can be static.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Add in allocation for VPU requests.
Module has to change from tristate to bool as all CMA functions
are boolean.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Update TODO.
The driver is already a platform driver, so that can be
deleted from the TODO.
There are no known issues that need to be resolved.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Add in userspace allocation API
Replacing the functionality from the older vc-sm driver,
add in a userspace API that allows allocation of buffers,
and importing of dma-bufs.
The driver hands out dma-buf fds, therefore much of the
handling around lifespan and odd mmaps from the old driver
goes away.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Add cache control ioctls
The old driver allowed for direct cache manipulation and that
was used by various clients. Replicate here.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Alter dev node permissions to 0666
Until the udev rules are updated, open up access to this node by
default.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Drop logging level on messages in vc_sm_release_resource
They weren't errors but were logged as such.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Fixup the alloc code handling of kernel_id
The allocation code had been copied in from an old branch prior
to having added the IDR for 64bit support. It was therefore pushing
a pointer into the kernel_id field instead of an IDR handle, the
lookup therefore failed, and we never released the buffer.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Remove cache manipulation ioctl from ARM64
The cache flushing ioctls are used by the Pi3 HEVC hw-assisted
decoder as it needs finer grained flushing control than dma_ops
allow.
These cache calls are not present for ARM64, therefore disable
them. We are not actively supporting 64bit kernels at present,
and the use case of the HEVC decoder is fairly limited.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vcsm-cma: Rework to use dma APIs, not CMA
Due to a misunderstanding of the DMA mapping APIs, I made
the wrong decision on how to implement this.
Rework to use dma_alloc_coherent instead of the CMA
API. This also allows it to be built as a module easily.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc-sm-cma: Fix the few remaining coding style issues
Fix a few minor checkpatch complaints to make the driver clean
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
staging: vc04_services: fix compiling in separate directory
The vc04_services Makefiles do not respect the O=path argument
correctly: include paths in CFLAGS are given relatively to object path,
not source path. Compiling in a separate directory yields #include
errors.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
vc-sm-cma: Fix compatibility ioctl
This code path hasn't been used previously.
Fixed up after testing with kodi on 32-bit userland and 64-bit kernel
Signed-off-by: popcornmix <popcornmix@gmail.com>
Ethernet cables with faulty or missing pairs (specifically pairs C and
D) allow auto-negotiation to 1000Mbs, but do not support the successful
establishment of a link. Add a DT property, "microchip,downshift-after",
to configure the number of auto-negotiation failures after which it
falls back to 100Mbs. Valid values are 2, 3, 4, 5 and 0, where 0 means
never downshift.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
The "input" trigger makes the associated GPIO an input. This is to support
the Raspberry Pi PWR LED, which is driven by external hardware in normal use.
N.B. pwr_led is not available on Model A or B boards.
leds-gpio: Implement the brightness_get method
The power LED uses some clever logic that means it is driven
by a voltage measuring circuit when configured as input, otherwise
it is driven by the GPIO output value. This patch wires up the
brightness_get method for leds-gpio so that user-space can monitor
the LED value via /sys/class/gpio/led1/brightness. Using the input
trigger this returns an indication of the system power health,
otherwise it is just whatever value the trigger has written most
recently.
See: https://github.com/raspberrypi/linux/issues/1064
Signed-off-by: Luke Wren <wren6991@gmail.com>
MISC: bcm2835: smi: use clock manager and fix reload issues
Use clock manager instead of self-made clockmanager.
Also fix some error paths that showd up during development
(especially missing release of dma resources on rmmod)
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
bcm2835_smi: re-add dereference to fix DMA transfers
Add experimental support for the VideoCore shared memory service.
This allows user processes to allocate memory from VideoCore's
GPU relocatable heap and mmap the buffers. Additionally, the memory
handles can passed to other VideoCore services such as MMAL, OpenMax
and DispmanX
TODO
* This driver was originally released for BCM28155 which has a different
cache architecture to BCM2835. Consequently, in this release only
uncached mappings are supported. However, there's no fundamental
reason which cached mappings cannot be support or BCM2835
* More refactoring is required to remove the typedefs.
* Re-enable the some of the commented out debug-fs statistics which were
disabled when migrating code from proc-fs.
* There's a lot of code to support sharing of VCSM in order to support
Android. This could probably done more cleanly or perhaps just
removed.
Signed-off-by: Tim Gover <timgover@gmail.com>
config: Disable VC_SM for now to fix hang with cutdown kernel
vcsm: Use boolean as it cannot be built as module
On building the bcm_vc_sm as a module we get the following error:
v7_dma_flush_range and do_munmap are undefined in vc-sm.ko.
Fix by making it not an option to build as module
vcsm: Add ioctl for custom cache flushing
vc-sm: Move headers out of arch directory
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
vcsm: Treat EBUSY as success rather than SIGBUS
Currently if two cores access the same page concurrently one will return VM_FAULT_NOPAGE
and the other VM_FAULT_SIGBUS crashing the user code.
Also report when mapping fails.
Signed-off-by: popcornmix <popcornmix@gmail.com>
vcsm: Provide new ioctl to clean/invalidate a 2D block
vcsm: Convert to loading via device tree.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
VCSM: New option to import a DMABUF for VPU use
Takes a dmabuf, and then calls over to the VPU to wrap
it into a suitable handle.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
vcsm: fix multi-platform build
vcsm: add macros for cache functions
vcsm: use dma APIs for cache functions
* Will handle multi-platform builds
vcsm: Fix up macros to avoid breaking numbers used by existing apps
vcsm: Define cache operation constants in user header
Without this change, users have to use raw values (1, 2, 3) to specify
cache operation.
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Support for finding user/vc handle in memory pool
vmcs_sm_{usr,vc}_handle_from_pid_and_address() were failing to find
handle if specified user pointer is not exactly the one that the memory
locking call returned even if the pointer is in range of map/resource.
So fixed the functions to match the range.
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Unify cache manipulating functions
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Fix obscure conditions
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Fix memory leaking on clean_invalid2 ioctl handler
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Describe the use of cache operation constants
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Fix obscure conditions again
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Add no-op cache operation constant
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Revert to do page-table-walk-based cache manipulating on some ioctl calls
On FLUSH, INVALID, CLEAN_INVALID ioctl calls, cache operations based on
page table walk were used in case that the buffer of the cache is not
pinned. So reverted to do page-table-based cache manipulating.
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Define cache operation constants in user header
Without this change, users have to use raw values (1, 2, 3) to specify
cache operation.
Signed-off-by: Sugizaki Yukimasa <i.can.speak.c.and.basic@gmail.com>
vcsm: Updates for changed vchiq interface
vcsm: Fix an NULL dereference in the import_dmabuf error path
resource was dereferenced even though it was NULL.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
vcsm: Use struct service_creation
vcsm: Fix makefile include on out-of-tree builds
The vc_sm module tries to include the 'fs' directory from the
$(srctree). $(srctree) is already provided by the build system, and
causes the include path to be duplicated.
With -Werror this fails to compile.
Remove the unnecessary variable.
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
vcsm: Remove set but unused variable
The 'success' variable is set by the call to vchi_service_close() but never checked.
Remove it, keeping the call in place.
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
vcsm: Reduce scope of local functions
The functions:
vc_vchi_sm_send_msg
vc_sm_ioctl_alloc
vc_sm_ioctl_alloc_share
vc_sm_ioctl_import_dmabuf
Are declared without a prototype. They are not used outside of this
module, thus - convert them to static functions.
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
vc_sm: Let it support to build in the non-src folder
If we build the kernel with "-O=$non-src-folder", this driver will
introdcue a building error because of the header's location.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: popcornmix <popcornmix@gmail.com>
BCM270x: Move vc_mem
Make the vc_mem module available for ARCH_BCM2835 by moving it.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
char: vc_mem: Fix up compat ioctls for 64bit kernel
compat_ioctl wasn't defined, so 32bit user/64bit kernel
always failed.
VC_MEM_IOC_MEM_PHYS_ADDR was defined with parameter size
unsigned long, so the ioctl cmd changes between sizes.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
char: vc_mem: Fix all coding style issues.
Cleans up all checkpatch errors in vc_mem.c and vc_mem.h
No functional change to the code.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
mmc: Disable CMD23 transfers on all cards
Pending wire-level investigation of these types of transfers
and associated errors on bcm2835-mmc, disable for now. Fallback of
CMD18/CMD25 transfers will be used automatically by the MMC layer.
Reported/Tested-by: Gellert Weisz <gellert@raspberrypi.org>
mmc: bcm2835-mmc: enable DT support for all architectures
Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Enable Device Tree support for all architectures.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
mmc: bcm2835-mmc: fix probe error handling
Probe error handling is broken in several places.
Simplify error handling by using device managed functions.
Replace pr_{err,info} with dev_{err,info}.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2835-mmc: Add locks when accessing sdhost registers
bcm2835-mmc: Add range of debug options for slowing things down
bcm2835-mmc: Add option to disable some delays
bcm2835-mmc: Add option to disable MMC_QUIRK_BLK_NO_CMD23
bcm2835-mmc: Default to disabling MMC_QUIRK_BLK_NO_CMD23
bcm2835-mmc: Adding overclocking option
Allow a different clock speed to be substitued for a requested 50MHz.
This option is exposed using the "overclock_50" DT parameter.
Note that the mmc interface is restricted to EVEN integer divisions of
250MHz, and the highest sensible option is 63 (250/4 = 62.5), the
next being 125 (250/2) which is much too high.
Use at your own risk.
bcm2835-mmc: Round up the overclock, so 62 works for 62.5Mhz
Also only warn once for each overclock setting.
mmc: bcm2835-mmc: Make available on ARCH_BCM2835
Make the bcm2835-mmc driver available for use on ARCH_BCM2835.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270x_DT: add bcm2835-mmc entry
Add Device Tree entry for bcm2835-mmc.
In non-DT mode, don't add the device in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2835-mmc: Don't overwrite MMC capabilities from DT
bcm2835-mmc: Don't override bus width capabilities from devicetree
Take out the force setting of the MMC_CAP_4_BIT_DATA host capability
so that the result read from devicetree via mmc_of_parse() is
preserved.
bcm2835-mmc: Only claim one DMA channel
With both MMC controllers enabled there are few DMA channels left. The
bcm2835-mmc driver only uses DMA in one direction at a time, so it
doesn't need to claim two channels.
See: https://github.com/raspberrypi/linux/issues/1327
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-mmc: New timer API
mmc: bcm2835-mmc: Support underclocking
Support underclocking of the SD bus using the max-frequency DT property
(which currently has no DT parameter). The sd_overclock parameter
already provides another way to achieve the same thing which should be
equivalent in end result, but it is a bug not to support max-frequency
as well.
See: https://github.com/raspberrypi/linux/issues/2350
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
mmc/bcm2835: Recover from MMC_SEND_EXT_CSD
If the user issues an "mmc extcsd read", the SD controller receives
what it thinks is a SEND_IF_COND command with an unexpected data block.
The resulting operations leave the FSM stuck in READWAIT, a state which
persists until the MMC framework resets the controller, by which point
the root filesystem is likely to have been unmounted.
A less heavyweight solution is to detect the condition and nudge the
FSM by asserting the (self-clearing) FORCE_DATA_MODE bit.
N.B. This workaround was essentially discovered by accident and without
a full understanding the inner workings of the controller, so it is
fortunate that the "fix" only modifies error paths.
See: https://github.com/raspberrypi/linux/issues/2728
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
bcm2835-mmc: Fix DMA channel leak
The BCM2835 MMC host driver requests a DMA channel on probe but neglects
to release the channel in the probe error path and on driver unbind.
I'm seeing this happen on every boot of the Compute Module 3: On first
driver probe, DMA channel 2 is allocated and then leaked with a "could
not get clk, deferring probe" message. On second driver probe, channel 4
is allocated.
Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Fix struct mmc_host leak on probe
The BCM2835 MMC host driver requests the bus address of the host's
register map on probe. If that fails, the driver leaks the struct
mmc_host allocated earlier.
Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Fix duplicate free_irq() on remove
The BCM2835 MMC host driver requests its interrupt as a device-managed
resource, so the interrupt is automatically freed after the driver is
unbound.
However on driver unbind, bcm2835_mmc_remove() frees the interrupt
explicitly to avoid invocation of the interrupt handler after driver
structures have been torn down.
The interrupt is thus freed twice, leading to a WARN splat in
__free_irq(). Fix by not requesting the interrupt as a device-managed
resource.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Handle mmc_add_host() errors
The BCM2835 MMC host driver calls mmc_add_host() but doesn't check its
return value. Errors occurring in that function are therefore not
handled. Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835-mmc: Deduplicate reset of driver data on remove
The BCM2835 MMC host driver sets the device's driver data pointer to
NULL on ->remove() even though the driver core subsequently does the
same in __device_release_driver(). Drop the duplicate assignment.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
bcm2835_mmc: Remove vestigial threaded IRQ
With SDIO processing now managed by the MMC framework with a
workqueue, the bcm2835_mmc driver no longer needs a threaded
IRQ.
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Add missing dma_unmap_sg calls to free relevant swiotlb bounce buffers.
This prevents DMA leaks.
Signed-off-by: Yaroslav Rosomakho <yaroslavros@gmail.com>
Limit max_req_size under arm64 (or any other platform that uses swiotlb) to prevent potential buffer overflow due to bouncing.
Signed-off-by: Yaroslav Rosomakho <yaroslavros@gmail.com>
Add support for DMA controller of BCM2708 as used in the Raspberry Pi.
Currently it only supports cyclic DMA.
Signed-off-by: Florian Meier <florian.meier@koalo.de>
dmaengine: expand functionality by supporting scatter/gather transfers sdhci-bcm2708 and dma.c: fix for LITE channels
DMA: fix cyclic LITE length overflow bug
dmaengine: bcm2708: Remove chancnt affectations
Mirror bcm2835-dma.c commit 9eba5536a7:
chancnt is already filled by dma_async_device_register, which uses the channel
list to know how much channels there is.
Since it's already filled, we can safely remove it from the drivers' probe
function.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: overwrite dreq only if it is not set
dreq is set when the DMA channel is fetched from Device Tree.
slave_id is set using dmaengine_slave_config().
Only overwrite dreq with slave_id if it is not set.
dreq/slave_id in the cyclic DMA case is not touched, because I don't
have hardware to test with.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: do device registration in the board file
Don't register the device in the driver. Do it in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: don't restrict DT support to ARCH_BCM2835
Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now.
Add Device Tree support to the non ARCH_BCM2835 case.
Use the same driver name regardless of architecture.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
BCM270x_DT: add bcm2835-dma entry
Add Device Tree entry for bcm2835-dma.
The entry doesn't contain any resources since they are handled
by the arch/arm/mach-bcm270x/dma.c driver.
In non-DT mode, don't add the device in the board file.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2708-dmaengine: Add debug options
BCM270x: Add memory and irq resources to dmaengine device and DT
Prepare for merging of the legacy DMA API arch driver dma.c
with bcm2708-dmaengine by adding memory and irq resources both
to platform file device and Device Tree node.
Don't use BCM_DMAMAN_DRIVER_NAME so we don't have to include mach/dma.h
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Merge with arch dma.c driver and disable dma.c
Merge the legacy DMA API driver with bcm2708-dmaengine.
This is done so we can use bcm2708_fb on ARCH_BCM2835 (mailbox
driver is also needed).
Changes to the dma.c code:
- Use BIT() macro.
- Cutdown some comments to one line.
- Add mutex to vc_dmaman and use this, since the dev lock is locked
during probing of the engine part.
- Add global g_dmaman variable since drvdata is used by the engine part.
- Restructure for readability:
vc_dmaman_chan_alloc()
vc_dmaman_chan_free()
bcm_dma_chan_free()
- Restructure bcm_dma_chan_alloc() to simplify error handling.
- Use device irq resources instead of hardcoded bcm_dma_irqs table.
- Remove dev_dmaman_register() and code it directly.
- Remove dev_dmaman_deregister() and code it directly.
- Simplify bcm_dmaman_probe() using devm_* functions.
- Get dmachans from DT if available.
- Keep 'dma.dmachans' module argument name for backwards compatibility.
Make it available on ARCH_BCM2835 as well.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: set residue_granularity field
bcm2708-dmaengine supports residue reporting at burst level
but didn't report this via the residue_granularity field.
Without this field set properly we get playback issues with I2S cards.
dmaengine: bcm2708-dmaengine: Fix memory leak when stopping a running transfer
bcm2708-dmaengine: Use more DMA channels (but not 12)
1) Only the bcm2708_fb drivers uses the legacy DMA API, and
it requires a BULK-capable channel, so all other types
(FAST, NORMAL and LITE) can be made available to the regular
DMA API.
2) DMA channels 11-14 share an interrupt. The driver can't
handle this, so don't use channels 12-14 (12 was used, probably
because it appears to have an interrupt, but in reality that
interrupt is for activity on ANY channel). This may explain
a lockup encountered when running out of DMA channels.
The combined effect of this patch is to leave 7 DMA channels
available + channel 0 for bcm2708_fb via the legacy API.
See: https://github.com/raspberrypi/linux/issues/1110https://github.com/raspberrypi/linux/issues/1108
dmaengine: bcm2708: Make legacy API available for bcm2835-dma
bcm2708_fb uses the legacy DMA API, so in order to start using
bcm2835-dma, bcm2835-dma has to support the legacy API. Make this
possible by exporting bcm_dmaman_probe() and bcm_dmaman_remove().
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Change DT compatible string
Both bcm2835-dma and bcm2708-dmaengine have the same compatible string.
So change compatible to "brcm,bcm2708-dma".
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
dmaengine: bcm2708: Remove driver but keep legacy API
Dropping non-DT support means we don't need this driver,
but we still need the legacy DMA API.
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
bcm2708-dmaengine - Fix arm64 portability/build issues
dma-bcm2708: Fix module compilation of CONFIG_DMA_BCM2708
bcm2708-dmaengine.c defines functions like bcm_dma_start which are
defined as well in dma-bcm2708.h as inline versions when
CONFIG_DMA_BCM2708 is not defined. This works fine when
CONFIG_DMA_BCM2708 is built in, but when it is selected as module build
fails with redefinition errors because in the build system when
CONFIG_DMA_BCM2708 is selected as module, the macro becomes
CONFIG_DMA_BCM2708_MODULE.
This patch makes the header use CONFIG_DMA_BCM2708_MODULE too when
available.
Fixes https://github.com/raspberrypi/linux/issues/2056
Signed-off-by: Andrei Gherzan <andrei@gherzan.com>
commit 6b61d49a55 upstream.
Commit 8234f6734c ("PM-runtime: Switch autosuspend over to using
hrtimers") switched PM runtime autosuspend to use hrtimers and all
related time accounting in ns, but missed to update the timer_expires
data type in struct dev_pm_info to u64.
This causes the timer_expires value to be truncated on 32-bit
architectures when assignment is done from u64 values:
rpm_suspend()
|- dev->power.timer_expires = expires;
Fix it by changing the timer_expires type to u64.
Fixes: 8234f6734c ("PM-runtime: Switch autosuspend over to using hrtimers")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9ca43d42e upstream.
For QUP IP versions 2.5 and above the oversampling rate is
halved from 32 to 16.
Commit ce73460054 ("tty: serial: qcom_geni_serial: Update
the oversampling rate") is pushed to handle this scenario.
But the existing logic is failing to classify QUP Version 3.0
into the correct group ( 2.5 and above).
As result Serial Engine clocks are not configured properly for
baud rate and garbage data is sampled to FIFOs from the line.
So, fix the logic to detect QUP with versions 2.5 and above.
Fixes: ce73460054 ("tty: serial: qcom_geni_serial: Update the oversampling rate")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Paras Sharma <parashar@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Link: https://lore.kernel.org/r/1601445926-23673-1-git-send-email-parashar@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec6347bb43 upstream.
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.
Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:
On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.
Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().
Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.
[ bp: Massage a bit. ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f2122045b upstream.
Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.
With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.
Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8ee6c8cb6 ]
DW DMA IP-core provides a way to synthesize the DMA controller with
channels having different parameters like maximum burst-length,
multi-block support, maximum data width, etc. Those parameters both
explicitly and implicitly affect the channels performance. Since DMA slave
devices might be very demanding to the DMA performance, let's provide a
functionality for the slaves to be assigned with DW DMA channels, which
performance according to the platform engineer fulfill their requirements.
After this patch is applied it can be done by passing the mask of suitable
DMA-channels either directly in the dw_dma_slave structure instance or as
a fifth cell of the DMA DT-property. If mask is zero or not provided, then
there is no limitation on the channels allocation.
For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first
two channels are synthesized with max burst length of 16, while the rest
of the channels have been created with max-burst-len=4. It would seem that
the first two channels must be faster than the others and should be more
preferable for the time-critical DMA slave devices. In practice it turned
out that the situation is quite the opposite. The channels with
max-burst-len=4 demonstrated a better performance than the channels with
max-burst-len=16 even when they both had been initialized with the same
settings. The performance drop of the first two DMA-channels made them
unsuitable for the DW APB SSI slave device. No matter what settings they
are configured with, full-duplex SPI transfers occasionally experience the
Rx FIFO overflow. It means that the DMA-engine doesn't keep up with
incoming data pace even though the SPI-bus is enabled with speed of 25MHz
while the DW DMA controller is clocked with 50MHz signal. There is no such
problem has been noticed for the channels synthesized with
max-burst-len=4.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20200731200826.9292-6-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f6e4312e1 ]
Protect against potential stack overflow that might happen when bpf2bpf
calls get combined with tailcalls. Limit the caller's stack depth for
such case down to 256 so that the worst case scenario would result in 8k
stack size (32 which is tailcall limit * 256 = 8k).
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 12856e7acd ]
For VFs, the Memory Space Enable bit in the Command Register is
hard-wired to 0.
Add a new bit to signify devices where the Command Register Memory
Space Enable bit does not control the device's response to MMIO
accesses.
Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4947e84f2 ]
The various array_size functions use SIZE_MAX define, but missed limits.h
causes to failure to compile code that needs overflow.h.
In file included from drivers/infiniband/core/uverbs_std_types_device.c:6:
./include/linux/overflow.h: In function 'array_size':
./include/linux/overflow.h:258:10: error: 'SIZE_MAX' undeclared (first use in this function)
258 | return SIZE_MAX;
| ^~~~~~~~
Fixes: 610b15c50e ("overflow.h: Add allocation size calculation helpers")
Link: https://lore.kernel.org/r/20200913102928.134985-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67197a4f28 ]
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm. This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.
Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important. Such operation can happen frequently. We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression. Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us. Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.
Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes. To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex. Its scope is changed to global.
The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork(). To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified. Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare. Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.
With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com
Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50b2412b7e ]
Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.
Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.
Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().
Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.
For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.
In addition, gather the entry index free and the entry free into a one
flow for all command types release.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a018eb55e ]
Calling pipe2() with O_NOTIFICATION_PIPE could results in memory
leaks unless watch_queue_init() is successful.
In case of watch_queue_init() failure in pipe2() we are left
with inode and pipe_inode_info instances that need to be freed. That
failure exit has been introduced in commit c73be61ced ("pipe: Add
general notification queue support") and its handling should've been
identical to nearby treatment of alloc_file_pseudo() failures - it
is dealing with the same situation. As it is, the mainline kernel
leaks in that case.
Another problem is that CONFIG_WATCH_QUEUE and !CONFIG_WATCH_QUEUE
cases are treated differently (and the former leaks just pipe_inode_info,
the latter - both pipe_inode_info and inode).
Fixed by providing a dummy wacth_queue_init() in !CONFIG_WATCH_QUEUE
case and by having failures of wacth_queue_init() handled the same way
we handle alloc_file_pseudo() ones.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Signed-off-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit c381b07941 upstream.
The original problem was from nvme-over-tcp code, who mistakenly uses
kernel_sendpage() to send pages allocated by __get_free_pages() without
__GFP_COMP flag. Such pages don't have refcount (page_count is 0) on
tail pages, sending them by kernel_sendpage() may trigger a kernel panic
from a corrupted kernel heap, because these pages are incorrectly freed
in network stack as page_count 0 pages.
This patch introduces a helper sendpage_ok(), it returns true if the
checking page,
- is not slab page: PageSlab(page) is false.
- has page refcount: page_count(page) is not zero
All drivers who want to send page to remote end by kernel_sendpage()
may use this helper to check whether the page is OK. If the helper does
not return true, the driver should try other non sendpage method (e.g.
sock_no_sendpage()) to handle the page.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Vlastimil Babka <vbabka@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6735b4632d upstream.
syzbot has reported an issue in the framebuffer layer, where a malicious
user may overflow our built-in font data buffers.
In order to perform a reliable range check, subsystems need to know
`FONTDATAMAX` for each built-in font. Unfortunately, our font descriptor,
`struct console_font` does not contain `FONTDATAMAX`, and is part of the
UAPI, making it infeasible to modify it.
For user-provided fonts, the framebuffer layer resolves this issue by
reserving four extra words at the beginning of data buffers. Later,
whenever a function needs to access them, it simply uses the following
macros:
Recently we have gathered all the above macros to <linux/font.h>. Let us
do the same thing for built-in fonts, prepend four extra words (including
`FONTDATAMAX`) to their data buffers, so that subsystems can use these
macros for all fonts, no matter built-in or user-provided.
This patch depends on patch "fbdev, newport_con: Move FONT_EXTRA_WORDS
macros into linux/font.h".
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=08b8be45afea11888776f897895aef9ad1c3ecfd
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/ef18af00c35fb3cc826048a5f70924ed6ddce95b.1600953813.git.yepeilin.cs@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27ba3e8ff3 upstream.
When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as
regular disks. In this case, ensure that command completion is correctly
executed by changing sd_zbc_complete() to return good_bytes instead of 0
and causing a hang during device probe (endless retries).
When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to
have partitions, it will be used as a regular disk. In this case, make sure
to not do anything in sd_zbc_revalidate_zones() as that triggers warnings.
Since all these different cases result in subtle settings of the disk queue
zoned model, introduce the block layer helper function
blk_queue_set_zoned() to generically implement setting up the effective
zoned model according to the disk type, the presence of partitions on the
disk and CONFIG_BLK_DEV_ZONED configuration.
Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b72053072c ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 472e5b056f ]
The pipe splice code still used the old model of waiting for pipe IO by
using a non-specific "pipe_wait()" that waited for any pipe event to
happen, which depended on all pipe IO being entirely serialized by the
pipe lock. So by checking the state you were waiting for, and then
adding yourself to the wait queue before dropping the lock, you were
guaranteed to see all the wakeups.
Strictly speaking, the actual wakeups were not done under the lock, but
the pipe_wait() model still worked, because since the waiter held the
lock when checking whether it should sleep, it would always see the
current state, and the wakeup was always done after updating the state.
However, commit 0ddad21d3e ("pipe: use exclusive waits when reading or
writing") split the single wait-queue into two, and in the process also
made the "wait for event" code wait for _two_ wait queues, and that then
showed a race with the wakers that were not serialized by the pipe lock.
It's only splice that used that "pipe_wait()" model, so the problem
wasn't obvious, but Josef Bacik reports:
"I hit a hang with fstest btrfs/187, which does a btrfs send into
/dev/null. This works by creating a pipe, the write side is given to
the kernel to write into, and the read side is handed to a thread that
splices into a file, in this case /dev/null.
The box that was hung had the write side stuck here [pipe_write] and
the read side stuck here [splice_from_pipe_next -> pipe_wait].
[ more details about pipe_wait() scenario ]
The problem is we're doing the prepare_to_wait, which sets our state
each time, however we can be woken up either with reads or writes. In
the case above we race with the WRITER waking us up, and re-set our
state to INTERRUPTIBLE, and thus never break out of schedule"
Josef had a patch that avoided the issue in pipe_wait() by just making
it set the state only once, but the deeper problem is that pipe_wait()
depends on a level of synchonization by the pipe mutex that it really
shouldn't. And the whole "wait for any pipe state change" model really
isn't very good to begin with.
So rather than trying to work around things in pipe_wait(), remove that
legacy model of "wait for arbitrary pipe event" entirely, and actually
create functions that wait for the pipe actually being readable or
writable, and can do so without depending on the pipe lock serializing
everything.
Fixes: 0ddad21d3e ("pipe: use exclusive waits when reading or writing")
Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f85086f95f upstream.
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation. Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state. In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1]. So checking against the system state is not
enough.
The consequence is that on system with interleaved node's ranges like this:
Early memory node ranges
node 1: [mem 0x0000000000000000-0x000000011fffffff]
node 2: [mem 0x0000000120000000-0x000000014fffffff]
node 1: [mem 0x0000000150000000-0x00000001ffffffff]
node 0: [mem 0x0000000200000000-0x000000048fffffff]
node 2: [mem 0x0000000490000000-0x00000007ffffffff]
This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done. At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:
$ ls -l /sys/devices/system/memory/memory21/node*
total 0
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2
In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():
kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
Call Trace:
add_memory_resource+0x23c/0x340 (unreliable)
__add_memory+0x5c/0xf0
dlpar_add_lmb+0x1b4/0x500
dlpar_memory+0x1f8/0xb80
handle_dlpar_errorlog+0xc0/0x190
dlpar_store+0x198/0x4a0
kobj_attr_store+0x30/0x50
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1b0/0x290
vfs_write+0xe8/0x290
ksys_write+0xdc/0x130
system_call_exception+0x160/0x270
system_call_common+0xf0/0x27c
This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation. An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.
[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:
$QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
-m size=$MEM,slots=255,maxmem=4294967296k \
-numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
-object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
-object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
-object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
-object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
-object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
-object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
-object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \
Fixes: 4fbce63391 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1d0da8335 upstream.
Patch series "mm: fix memory to node bad links in sysfs", v3.
Sometimes, firmware may expose interleaved memory layout like this:
Early memory node ranges
node 1: [mem 0x0000000000000000-0x000000011fffffff]
node 2: [mem 0x0000000120000000-0x000000014fffffff]
node 1: [mem 0x0000000150000000-0x00000001ffffffff]
node 0: [mem 0x0000000200000000-0x000000048fffffff]
node 2: [mem 0x0000000490000000-0x00000007ffffffff]
In that case, we can see memory blocks assigned to multiple nodes in
sysfs:
$ ls -l /sys/devices/system/memory/memory21
total 0
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2
-rw-r--r-- 1 root root 65536 Aug 24 05:27 online
-r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device
-r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index
drwxr-xr-x 2 root root 0 Aug 24 05:27 power
-r--r--r-- 1 root root 65536 Aug 24 05:27 removable
-rw-r--r-- 1 root root 65536 Aug 24 05:27 state
lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory
-rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent
-r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones
The same applies in the node's directory with a memory21 link in both
the node1 and node2's directory.
This is wrong but doesn't prevent the system to run. However when
later, one of these memory blocks is hot-unplugged and then hot-plugged,
the system is detecting an inconsistency in the sysfs layout and a
BUG_ON() is raised:
kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
Call Trace:
add_memory_resource+0x23c/0x340 (unreliable)
__add_memory+0x5c/0xf0
dlpar_add_lmb+0x1b4/0x500
dlpar_memory+0x1f8/0xb80
handle_dlpar_errorlog+0xc0/0x190
dlpar_store+0x198/0x4a0
kobj_attr_store+0x30/0x50
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1b0/0x290
vfs_write+0xe8/0x290
ksys_write+0xdc/0x130
system_call_exception+0x160/0x270
system_call_common+0xf0/0x27c
This has been seen on PowerPC LPAR.
The root cause of this issue is that when node's memory is registered,
the range used can overlap another node's range, thus the memory block
is registered to multiple nodes in sysfs.
There are two issues here:
(a) The sysfs memory and node's layouts are broken due to these
multiple links
(b) The link errors in link_mem_sections() should not lead to a system
panic.
To address (a) register_mem_sect_under_node should not rely on the
system state to detect whether the link operation is triggered by a hot
plug operation or not. This is addressed by the patches 1 and 2 of this
series.
Issue (b) will be addressed separately.
This patch (of 2):
The memmap_context enum is used to detect whether a memory operation is
due to a hot-add operation or happening at boot time.
Make it general to the hotplug operation and rename it as
meminit_context.
There is no functional change introduced by this patch
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com
Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3f7b1bb20 upstream.
Currently to make sure that every page table entry is read just once
gup_fast walks perform READ_ONCE and pass pXd value down to the next
gup_pXd_range function by value e.g.:
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
unsigned int flags, struct page **pages, int *nr)
...
pudp = pud_offset(&p4d, addr);
This function passes a reference on that local value copy to pXd_offset,
and might get the very same pointer in return. This happens when the
level is folded (on most arches), and that pointer should not be
iterated.
On s390 due to the fact that each task might have different 5,4 or
3-level address translation and hence different levels folded the logic
is more complex and non-iteratable pointer to a local copy leads to
severe problems.
Here is an example of what happens with gup_fast on s390, for a task
with 3-level paging, crossing a 2 GB pud boundary:
// addr = 0x1007ffff000, end = 0x10080001000
static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
unsigned int flags, struct page **pages, int *nr)
{
unsigned long next;
pud_t *pudp;
// pud_offset returns &p4d itself (a pointer to a value on stack)
pudp = pud_offset(&p4d, addr);
do {
// on second iteratation reading "random" stack value
pud_t pud = READ_ONCE(*pudp);
// next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
next = pud_addr_end(addr, end);
...
} while (pudp++, addr = next, addr != end); // pudp++ iterating over stack
return 1;
}
This happens since s390 moved to common gup code with commit
d1874a0c28 ("s390/mm: make the pxd_offset functions more robust") and
commit 1a42010cdc ("s390/mm: convert to the generic
get_user_pages_fast code").
s390 tried to mimic static level folding by changing pXd_offset
primitives to always calculate top level page table offset in pgd_offset
and just return the value passed when pXd_offset has to act as folded.
What is crucial for gup_fast and what has been overlooked is that
PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly.
And the latter is not possible with dynamic folding.
To fix the issue in addition to pXd values pass original pXdp pointers
down to gup_pXd_range functions. And introduce pXd_offset_lockless
helpers, which take an additional pXd entry value parameter. This has
already been discussed in
https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
Fixes: 1a42010cdc ("s390/mm: convert to the generic get_user_pages_fast code")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.2+]
Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2ec512825 upstream.
DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:
dm-3: error: dax access failed (-95)
when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.
Fixes: 7bf7eac8d6 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75df529bec upstream.
Steal time initialization requires mapping a memory region which
invokes a memory allocation. Doing this at CPU starting time results
in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:
BUG: sleeping function called from invalid context at mm/slab.h:498
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
Call trace:
dump_backtrace+0x0/0x208
show_stack+0x1c/0x28
dump_stack+0xc4/0x11c
___might_sleep+0xf8/0x130
__might_sleep+0x58/0x90
slab_pre_alloc_hook.constprop.101+0xd0/0x118
kmem_cache_alloc_node_trace+0x84/0x270
__get_vm_area_node+0x88/0x210
get_vm_area_caller+0x38/0x40
__ioremap_caller+0x70/0xf8
ioremap_cache+0x78/0xb0
memremap+0x9c/0x1a8
init_stolen_time_cpu+0x54/0xf0
cpuhp_invoke_callback+0xa8/0x720
notify_cpu_starting+0xc8/0xd8
secondary_start_kernel+0x114/0x180
CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]
However we don't need to initialize steal time at CPU starting time.
We can simply wait until CPU online time, just sacrificing a bit of
accuracy by returning zero for steal time until we know better.
While at it, add __init to the functions that are only called by
pv_time_init() which is __init.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Fixes: e0685fa228 ("arm64: Retrieve stolen time as paravirtualized guest")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0830dbf71 upstream.
Fix the port-lock initialisation regression introduced by commit
a3cb39d258 ("serial: core: Allow detach and attach serial device for
console") by making sure that the lock is again initialised during
console setup.
The console may be registered before the serial controller has been
probed in which case the port lock needs to be initialised during
console setup by a call to uart_set_options(). The console-detach
changes introduced a regression in several drivers by effectively
removing that initialisation by not initialising the lock when the port
is used as a console (which is always the case during console setup).
Add back the early lock initialisation and instead use a new
console-reinit flag to handle the case where a console is being
re-attached through sysfs.
The question whether the console-detach interface should have been added
in the first place is left for another discussion.
Note that the console-enabled check in uart_set_options() is not
redundant because of kgdboc, which can end up reinitialising an already
enabled console (see commit 42b6a1baa3 ("serial_core: Don't
re-initialize a previously initialized spinlock.")).
Fixes: a3cb39d258 ("serial: core: Allow detach and attach serial device for console")
Cc: stable <stable@vger.kernel.org> # 5.7
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200909143101.15389-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e6b1a44ecc ]
The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
that percpu-rwsem is a blocking primitive, should be just fine.
However, file_end_write() is used from IRQ context and will cause
load-store issues on architectures where the per-cpu accessors are not
natively irq-safe.
Fix it by using the IRQ-safe this_cpu_*() for operations on
read_count. This will generate more expensive code on a number of
platforms, which might cause a performance regression for some of the
other percpu-rwsem users.
If any such is reported, we can consider alternative solutions.
Fixes: 70fe2f4815 ("aio: fix freeze protection of aio writes")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0a355aeb24 ]
If something goes wrong (such as the SCL being stuck low) then we need
to reset the PCA chip. The issue with this is that on reset we lose all
config settings and the chip ends up in a disabled state which results
in a lock up/high CPU usage. We need to re-apply any configuration that
had previously been set and re-enable the chip.
Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc5453a5b7 ]
If an sctp connection gets re-used, heartbeats are flagged as invalid
because their vtag doesn't match.
Handle this in a similar way as TCP conntrack when it suspects that the
endpoints and conntrack are out-of-sync.
When a HEARTBEAT request fails its vtag validation, flag this in the
conntrack state and accept the packet.
When a HEARTBEAT_ACK is received with an invalid vtag in the reverse
direction after we allowed such a HEARTBEAT through, assume we are
out-of-sync and re-set the vtag info.
v2: remove left-over snippet from an older incarnation that moved
new_state/old_state assignments, thats not needed so keep that
as-is.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3b5455636f upstream.
All three generations of Sandisk SSDs lock up hard intermittently.
Experiments showed that disabling NCQ lowered the failure rate significantly
and the kernel has been disabling NCQ for some models of SD7's and 8's,
which is obviously undesirable.
Karthik worked with Sandisk to root cause the hard lockups to trim commands
larger than 128M. This patch implements ATA_HORKAGE_MAX_TRIM_128M which
limits max trim size to 128M and applies it to all three generations of
Sandisk SSDs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Karthik Shivaram <karthikgs@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>